Ahead of Google I/O 2024, there was little doubt that Google would talk about AI. The event started on a fittingly rowdy note. YouTube sensation Marc Rebillet started the show adorned in a bathrobe after popping up from a giant cup.
The social media star set the tone for the rest of the event by asking audience members for wild musical ideas that came to life via Google’s AI DJ software. The host couldn’t have asked for a better start. According to CEO Sundar Pichai, Google executives uttered the word “AI” 121 times.
By the time the event concluded, I was left with two haunting questions. One: Is Google trying to solve problems that don’t even exist in an average person’s life by force-feeding them the Gemini gelato? Two: Is there a market for specialized AI hardware worth a few hundred dollars when AI on phones is gaining a mind-bending set of superpowers?
The status of AI trinkets
So far, we’ve got cute orange AI gadgets like the Rabbit R1, as well as something as fine as the Humane AI Pin. One brand is even making an AI pendant. Some of them only listen. Others talk, record videos, make calls, tap into chatty AI bots, and even try to make sense of the world around you.
Now, I am not going to discuss just how poorly these devices have fared so far. But Digital Trends Mobile Section Editor Joe Maring says the Rabbit R1 is one of the worst gadgets he’s ever used. The story of Humane AI Pin hasn’t been too different either. Ouch! Alright, these are all first-generation devices of their kind, so let’s cut them some slack.
But here’s the reality. Their future doesn’t seem bright, easy on the pockets, or even convenient. In a span of two days, two AI heavyweights — OpenAI and Google — have made that point almost conclusively.
AI is now aware of the world
Let’s start with vision, a power that allows an AI to see the world through a camera lens and talk about what it sees. Google showcased something called Gemini Live at I/O 2024. A day prior to that, OpenAI revealed GPT-4o, where “o” stands for “omni.” That’s just a fancy way of saying multimodal, which means your AI pal can handle text, audio, and visuals for input and output. But the ultimate objective is identical across both products.
You launch the AI of your choice, point the camera at virtually anything, and the AI will answer your contextual questions. You can fire up the front camera and ask the AI to provide commentary as it watches you playing Rock, Paper, Scissors with a friend. It can tell you whether your pink shirt is the best attire for a job interview.
When needed, it can look at objects and explain them in Portuguese, identify buildings like a trusty tour guide, and sense a special occasion by looking at the confetti spread on a table. Point it at code, and the AI will explain the code’s purpose. And if the AI has seen your car keys at any point, it will tell you where exactly you left them.
Now, all the aforementioned capabilities are not uniform across ChatGPT (high on GPT-4o juice) and Gemini Live (with Google’s Project Astra tech behind it). But the fundamentals are shared. This is also a crucial juncture where the fault lines between the AI experience on phones and on dedicated hardware widen.
The hardware conundrum
The Rabbit R1 and Humane AI Pin have 8-megapixel and 12MP cameras, respectively. Yes, they can see the world and make sense of it, but they can’t match the visual chops of the optically stabilized high-resolution cameras on a half-decent current-gen smartphone.
In a nutshell, an average smartphone will feed more healthy visual data points to an AI engine, local or cloud-based, which directly translates to better comprehension. Think of it as shooting the same vlog in challenging light on a budget phone and on a flagship phone, then asking your friends to describe everything they see in each. Of course, a blurry or blown-out clip won’t be of much help here.
Then there’s the computing part. Between them, 2024’s buzziest AI gadgets run on low to mid-tier MediaTek and Qualcomm silicon. These devices are not burdened by the weight of an entire OS on them, but from what we’ve seen so far, even a half-decent smartphone can execute AI chores at a dramatically quicker pace compared to the R1 or Humane’s Pin.
I don’t want my AI gadget to take 15 seconds to process a request when even good old Siri can do a better job. That’s a poor benchmark, but that’s where the R1 stands. Now that we’re talking silicon, let’s discuss how processing plays a key role here. Generative AI tricks come to life in two ways. Most of the solutions take the queries to a cloud server, which means they need an internet connection.
The second option is offline processing, which is how Google’s Gemini Nano model works on the Pixel 8 series and Samsung phones, among others. The biggest advantage is that you don’t need an internet connection in this scenario. There is currently no AI thingamajig out there that can work without an internet connection.
On-device AI is a real gem
With on-device processing, the Recorder app on Pixel phones can transcribe and summarize audio recordings. Magic Compose will level up your texting game without asking for Wi-Fi or cellular connections. The same is true for translations and transcription. In fact, Google laid the foundations of reliable offline translations all the way back in 2018 with its Neural Machine Translation tech.
But that’s just the tip of the iceberg. Later this year, Google will release Gemini Nano with Multimodality. That means you won’t need an internet connection for Gemini Live to see, understand, and provide contextual answers for what it sees and hears through your phone’s camera, screen, and mic.
Google is even supercharging the TalkBack accessibility feature with Gemini. That’s a huge win for folks living with speech and vision challenges who need a reliable TalkBack companion with multimodal capabilities but don’t have access to an internet connection.
Also, did I tell you that on-device AI processing is faster, and that it is dramatically safer because no data leaves your phone? More importantly, it ultimately lowers the cost of serving generative AI features.
Cost to consumers is currently one of the biggest uncertainties when it comes to the whole AI-phone marketing blitz. On-device AI comes as a huge sigh of relief in this chaos, as you at least have an idea of the bare minimum that your phone can do without worrying too much about feature compatibility in the years to come.
Gemini is doing it right
Finally, we have the all-too-crucial question of interplay. My life revolves around Gmail, Docs, Drive, Maps, Photos, and Search, among others. Google has created Gems, aka custom Gemini-based assistants for handling specific tasks that knit tightly with other ecosystem products.
For example, when you ask Gemini to plan a trip for you, it will peek at your Gmail inbox for ticket scheduling and then combine the data in your voice/text prompt with relevant Google Search information to create a fully fleshed-out travel plan.
For those willing to pay for Gemini Advanced, there are even more productivity superpowers in tow. It can process PDFs up to 1,500 pages, 30,000 lines of code, an hourlong video, or a mix of various file formats.
Gemini will process all that input and will then serve you summarized versions, identify crucial aspects, and even double as a teacher after ingesting all that material. It can even take mundane spreadsheets and create a detailed financial report with a clear understanding of profits and related insights.
The AI will even listen to calls and alert users if the caller is a scammer. In fact, Gemini won’t even take you to another app. When you need it, the Gemini interface will simply hover over the app you are using at the moment, do its job, and vanish.
It’s hard to beat a smartphone
The point I want to make here is that an AI should serve as an assistant, but it needs to strike the right balance between functional versatility and practical convenience. It can only do so when it has access to data that matters to me, personally and professionally. And I want all those smarts to be served in the best way possible without any extra financial overhead.
Right now, the likes of the Rabbit R1 or Humane AI Pin can barely scratch the surface of such deep product interconnection. Plus, the hardware itself holds the AI back from reaching its full potential. I can’t imagine Google licensing Gemini Nano for something like the Rabbit R1, and even if it happens, the experience will be hobbled by the hardware.
So, why pay extra and settle for a subpar experience when the phone in your pocket can do a killer job? The AI phone is here. And it’s here to stay. Orange and shiny AI trinkets, on the other hand, are as good as dead.