We all saw it coming, and the day is finally here: ChatGPT is slowly morphing into your friendly neighborhood AI, complete with the ability to creepily laugh alongside you if you say something funny, or go “aww” if you’re being nice. And that’s just scratching the surface of today’s announcements. OpenAI just held a special Spring Update Event, during which it unveiled its latest large language model (LLM), GPT-4o. With this update, ChatGPT gets a desktop app and becomes better and faster, but most of all, it becomes fully multimodal.
The event started with an introduction by Mira Murati, OpenAI’s CTO, who revealed that today’s updates aren’t going to be just for the paid users — GPT-4o is launching across the platform for both free users and paid subscribers. “The special thing about GPT-4o is that it brings GPT-4 level intelligence to everyone, including our free users,” Murati said.
GPT-4o is said to be much faster, but the impressive part is that it takes the model’s capabilities up a few notches across text, vision, and audio. It will also be available to developers through OpenAI’s API, where it’s said to be up to two times faster and 50% cheaper than GPT-4 Turbo, with a rate limit that’s five times higher.
Alongside the new model, OpenAI is launching the ChatGPT desktop app as well as a refresh of the user interface on the website. The goal is to make the chatbot as easy to communicate with as possible. “We’re looking at the future of interaction between ourselves and the machines, and we think that GPT-4o is really shifting that paradigm into the future of collaboration — where the interaction becomes much more natural,” Murati said.
To that end, the new improvements, which Murati showcased with the help of OpenAI’s Mark Chen and Barret Zoph, really do appear to make the interaction much more seamless. GPT-4o is now able to analyze videos, images, and speech in real time, and it can accurately pinpoint emotion in all three. This is especially impressive in ChatGPT Voice, which has become so humanlike that it skirts the edge of the uncanny valley.
Saying “hi” to ChatGPT elicits an enthusiastic, friendly response that has just the slightest hint of a robotic undertone. When Mark Chen told the AI that he was holding a live demo and needed help calming down, it sounded suitably impressed and jumped in with the idea that he should take a few deep breaths. It also noticed when those breaths were far too quick (more like panting, really) and walked Chen through the right way to breathe, first making a small joke: “You’re not a vacuum cleaner.”
The conversation flows naturally, as you can now interrupt ChatGPT and don’t have to wait for it to finish, and the responses come quickly with no awkward pauses. When asked to tell a bedtime story, it responded to requests regarding its tone of voice, going from enthusiastic, to dramatic, to robotic. The second half of the demo showed off ChatGPT’s ability to accurately read code, help with math problems via video, and read and describe the content of the screen.
The demo wasn’t perfect — the bot appeared to cut off at times, and it was hard to tell whether this was due to someone else talking or because of latency. However, it sounded just about as lifelike as can be expected from a chatbot, and its ability to read human emotion and respond in kind is equal parts thrilling and anxiety-inducing. Hearing ChatGPT laugh wasn’t on my list of things I thought I’d hear this week, but here we are.
GPT-4o, with its multimodal design, as well as the desktop app, will be gradually launching over the next few weeks. Last year, Bing Chat told us that it wanted to be human, but now, we’re about to get a version of ChatGPT that might be as close to human as anything we’ve seen since the start of the AI boom.