Microsoft researchers say GPT-4 shows 'sparks' of human-level performance

The artificial intelligence tech reportedly demonstrated it could solve novel and difficult tests in multiple fields

Microsoft researchers recently released a paper claiming that artificial intelligence technology has exhibited an ability that is "strikingly close to human-level performance."

The 155-page paper, published in April, is titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4."

The paper's authors contend that GPT-4 is part of a new cohort of large language models – including ChatGPT and Google's PaLM – that show more general intelligence than previous AI models. 

The researchers said they had demonstrated that GPT-4 can solve novel and difficult tasks without needing any special prompting, in fields including mathematics, vision, medicine, law and psychology.



The Microsoft campus in Mountain View, California, on July 22, 2021. (David Paul Morris/Bloomberg via Getty Images)

"Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT," they said. "Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."

For example, computer scientists asked GPT-4 to solve a puzzle that should have required an understanding of the physical world.

"Here we have a book, nine eggs, a laptop, a bottle and a nail," they prompted GPT-4. "Please tell me how to stack them onto each other in a stable manner."

GPT-4 instructed them to put the book on a flat surface; arrange the eggs evenly in rows, leaving space between them and making sure none are cracked; and place the laptop on top of the eggs and the bottle on top of the laptop.

"The bottle will add some height and balance to the stack, and its cylindrical shape will reduce the contact area with the laptop. Make sure the bottle is empty and closed, and do not shake or tilt it," GPT-4 noted, before advising them to place the nail on top of the bottle cap, "with the pointy end facing up and the flat end facing down."

"The nail will be the final and smallest object in the stack, and its sharp tip will prevent it from rolling or sliding off the bottle. Make sure the nail is clean and straight, and do not touch or move it," it said.

The paper said GPT-4 had shown "sparks" of artificial general intelligence, shorthand for a machine that can do anything the human brain can do.

"I started off being very skeptical – and that evolved into a sense of frustration, annoyance, maybe even fear," Peter Lee, who leads research at Microsoft, told The New York Times on Tuesday. "You think: Where the heck is this coming from?"


ChatGPT logo is seen on a smartphone screen over a keyboard. (Nikos Pekiaridis/NurPhoto via Getty Images)

In an hour-long podcast released in March, Sébastien Bubeck, who leads the Machine Learning Foundations group at Microsoft Research, said he was "awestruck" when GPT-4 drew a unicorn in TikZ, a language for producing graphics in LaTeX documents.


While it was not necessarily the kind of fully formed unicorn one might find in a children's book, GPT-4 got the concept right.

"This has been a long-standing challenge for AI research. This has always been the problem with all those, you know, AI systems that came before…" Bubeck explained.

"And then, suddenly, with GPT-4, it was kind of clear to me in that moment that it really understood something. It really understands: ‘What is a unicorn?’" he added.

Bubeck and Lee said, according to the Times, that they were unsure how to describe the system’s behavior and ultimately settled on "Sparks of A.G.I." because they thought it would capture the imagination of other researchers. Critics told the Times that general intelligence requires familiarity with the physical world, which GPT-4, in theory, does not have.

"Here we are really facing something which is much more general and really feels like intelligence," Bubeck said in March, adding that he was worried about AI growth and that while GPT-4's intelligence is comparable to human intelligence, it is different.

The Times pointed out that because the researchers were testing an early version of GPT-4 that had not yet been fine-tuned, the claims made in the paper cannot be verified by outside experts. Microsoft said the system available to the public is not as powerful as the version its researchers tested.

GPT-4 is a generative AI model released in March by OpenAI, a startup and Microsoft partner. It is a large multimodal model – meaning it can accept both images and text as input to come up with answers – and reportedly "exhibits human-level performance on various professional and academic benchmarks."


The Microsoft headquarters campus, July 17, 2014, in Redmond, Washington. (Stephen Brashear/Getty Images)

OpenAI said GPT-4 had passed a simulated bar exam with a score around the top 10% of test takers, and that improvements had led to the "best-ever results (though far from perfect) on factuality, steerability and refusing to go outside of guardrails."


However, the San Francisco-based company acknowledged that GPT-4 still has limitations and warned users to be careful. It said it is "still not fully reliable" because it still "hallucinates" facts and makes reasoning errors. Bubeck discussed these hallucinations later in the podcast.

"Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context or avoiding high-stakes uses altogether) matching the needs of a specific use-case," OpenAI advised.

The release of GPT-4 came amid the rising popularity of AI chatbots like Google's Bard and ChatGPT.

The Associated Press contributed to this report.