Here’s why people are saying GPT-4 is getting ‘lazy’

OpenAI and its technologies have been in the midst of scandal for most of November. Between the swift firing and rehiring of CEO Sam Altman and the curious case of the halted ChatGPT Plus paid subscriptions, OpenAI has kept the artificial intelligence industry in the news for weeks.

Now, AI enthusiasts have revived an issue that has many wondering whether GPT-4 is getting “lazier” as the language model continues to be trained. Many who use it to speed up more intensive tasks have taken to X (formerly Twitter) to air their grievances about the perceived changes.

OpenAI has safety-ed GPT-4 sufficiently that its become lazy and incompetent.

Convert this file? Too long. Write a table? Here's the first three lines. Read this link? Sorry can't. Read this py file? Oops not allowed.

So frustrating.

— rohit (@krishnanrohit) November 28, 2023

Rohit Krishnan on X detailed several of the mishaps he experienced while using GPT-4, the language model behind ChatGPT Plus, the paid version of ChatGPT. He explained that the chatbot has refused several of his queries or given him truncated responses to requests that previously produced detailed answers. He also noted that the language model will use tools other than the one it has been instructed to use, such as Dall-E when a prompt asks for the code interpreter. Krishnan also sarcastically added that “error analyzing” is the language model’s way of saying “AFK [away from keyboard], be back in a couple of hours.”

Matt Wensing on X detailed his experiment, in which he asked ChatGPT Plus to make a list of dates between now and May 5, 2024. The chatbot asked for additional information, such as the number of weeks between those dates, before it was able to complete the initial task.
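For context, the task Wensing describes is mechanical and needs no extra input beyond a start date, an end date, and an interval. Below is a minimal Python sketch of it. The start date of November 28, 2023, the weekly interval, and the function name `dates_between` are assumptions for illustration; the post as described here does not specify them.

```python
from datetime import date, timedelta


def dates_between(start: date, end: date, step_days: int = 7) -> list[date]:
    """Return the dates from start through end (inclusive), step_days apart."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=step_days)
    return result


# Assumed start date; the exact date of the original prompt is not given.
start = date(2023, 11, 28)
end = date(2024, 5, 5)

weekly = dates_between(start, end)              # one date per week
daily = dates_between(start, end, step_days=1)  # every calendar day

for d in weekly:
    print(d.isoformat())
```

The number of weeks between the two dates falls out of the loop itself, so it does not have to be supplied up front.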

Wharton professor Ethan Mollick also shared his observations of GPT-4 after comparing a sequence of analyses he ran with the code interpreter in July against more recent queries from Tuesday. He concluded that GPT-4 is still knowledgeable, but noted that it explained to him how to fix his code as opposed to actually fixing the code. In essence, he would have to do the work he was asking GPT-4 to do. Though Mollick did not intend to critique the language model, his observations fall in step with what others have described as “back talk” from GPT-4.

ChatGPT is known to hallucinate answers for information that it does not know, but these errors appear to go far beyond common missteps of the AI chatbot. GPT-4 was introduced in March 2023, but as early as July, reports of the language model getting “dumber” began to surface. A study by researchers at Stanford University and the University of California, Berkeley observed that GPT-4’s accuracy at identifying prime numbers dropped from 97.6% to 2.4% between March and June alone. It detailed that the paid version of ChatGPT was unable to provide the correct answer to a mathematical problem with a detailed explanation, while the unpaid version, which still runs the older GPT-3.5 model, gave the correct answer and a detailed explanation of the mathematical process.

At the time, Peter Welinder, OpenAI’s vice president of product, suggested that heavy users might experience a psychological phenomenon in which the quality of answers appears to degrade over time when the language model is actually becoming more efficient.

There has been discussion if GPT-4 has become "lazy" recently. My anecdotal testing suggests it may be true.

I repeated a sequence of old analyses I did with Code Interpreter. GPT-4 still knows what to do, but keeps telling me to do the work. One step is now many & some are odd. pic.twitter.com/OhGAMtd3Zq

— Ethan Mollick (@emollick) November 28, 2023

According to Mollick, the current issues might similarly be temporary and due to a system overload or a change in prompt style that hasn’t been made apparent to users. Notably, OpenAI cited a system overload as a reason for the ChatGPT Plus sign-up shutdown following the spike in interest in the service after its inaugural DevDay developers’ conference introduced a host of new functions for the paid version of the AI chatbot. There is still a waitlist in place for ChatGPT Plus. The professor also added that ChatGPT on mobile uses a different prompt style, which results in “shorter and more to-the-point answers.”

Yacine on X said that the unreliability of the latest GPT-4 model, caused by a drop in instruction adherence, has pushed them back to traditional coding, adding that they plan on creating a local code LLM to regain control of the model’s parameters. Other users have mentioned opting for open-source options in the midst of the language model’s decline.

Similarly, Reddit user Mindless-Ad8595 explained that more recent updates to GPT-4 have made it too smart for its own good. “It doesn’t come with a predefined ‘path’ that guides its behavior, making it incredibly versatile, but also somewhat directionless by default,” he said.

The programmer recommends users create custom GPTs that are specialized by task or application to increase the efficiency of the model output. He doesn’t provide any practical solutions for users remaining within OpenAI’s ecosystem.

App developer Nick Dobos shared his experience with GPT-4 mishaps, noting that when he prompted ChatGPT to write Pong in SwiftUI, he discovered various placeholders and to-dos within the code. He added that the chatbot would ignore commands and continue inserting these placeholders and to-dos into the code even when instructed to do otherwise. Several X users confirmed similar experiences, sharing their own examples of code featuring placeholders and to-dos. Dobos’ post got the attention of an OpenAI employee, who said they would forward examples to the company’s development team for a fix and promised to share any updates in the interim.

Overall, there is no clear explanation as to why GPT-4 is currently experiencing complications. Users discussing their experiences online have suggested many ideas. These range from OpenAI merging models to a continued server overload from running both GPT-4 and GPT-4 Turbo to the company attempting to save money by limiting results, among others.

It is well known that OpenAI runs an extremely expensive operation. In April 2023, researchers indicated it took $700,000 per day, or 0.36 cents per query, to keep ChatGPT running. Industry analysts detailed at that time that OpenAI would have to expand its GPU fleet by 30,000 units to maintain its commercial performance for the remainder of the year. This would entail support of ChatGPT processes, in addition to the computing for all of its partners.

While waiting for GPT-4 performance to stabilize, users exchanged several quips, making light of the situation on X.

“The next thing you know it will be calling in sick,” Southrye said.

“So many responses with ‘and you do the rest.’ No YOU do the rest,” MrGarnett said.

The number of replies and posts about the problem is definitely hard to ignore. We’ll have to wait and see if OpenAI can tackle the problem head-on in a future update.

Fionna Agomuoh
Fionna Agomuoh is a Computing Writer at Digital Trends. She covers a range of topics in the computing space, including…