How do you train an artificial intelligence program to choose when to take a picture? By hiring professional photographers, of course. That’s part of the approach Google engineers took while developing Clips, the wearable camera that presses the shutter button for you (at least some of the time, anyway). On the Google Design Blog, the company recently shared a glimpse of how the A.I. inside the upcoming camera was built.
While many A.I. programs are so large that they have to rely on the cloud, Google Clips houses the entire program inside the camera itself, a security measure that keeps content offline until the user decides to upload it. Google engineers have spent three years building the camera, including its software.
Google says the approach to the Clips A.I. is human-centered. Google designer Josh Lovejoy put out a job ad for photographers and used the responses to build a team that included a documentary filmmaker, a photojournalist, and a fine art photographer. With footage from all of those creatives gathered, the team asked, “What makes a memorable moment?”
While the group started with big ideas about the rule of thirds, lighting, and depth of field, the programmers soon realized they needed to simplify the list in order to teach those ideas to a computer; as Lovejoy puts it, it was more like teaching Go, Dog. Go! than starting with Shakespeare.
With a revised approach, the engineers began teaching the software using consistent examples, with each image fed into the system chosen to teach one specific concept. Many of those concepts centered not on what to look for, but on what not to photograph: Clips was trained to ignore footage of the camera bouncing around inside a purse, fingers over the lens, shaky movement, and blur, for example.
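Google hasn’t published how Clips actually implements that filtering, but the idea of screening out obviously bad frames can be sketched with simple image heuristics. The Python example below uses OpenCV; the function name and thresholds are hypothetical choices for illustration only, and a shipping product would likely rely on a learned model rather than hand-tuned cutoffs.

```python
import cv2

# Hypothetical thresholds chosen for illustration; Google has not published
# the heuristics or models Clips actually uses.
BLUR_THRESHOLD = 100.0       # variance of the Laplacian below this => too blurry
BRIGHTNESS_THRESHOLD = 30.0  # mean gray level below this => likely inside a bag


def frame_looks_usable(frame_bgr) -> bool:
    """Reject frames that are blurry, shaky, or too dark to be worth keeping."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # A low variance of the Laplacian means few sharp edges, which is what
    # motion blur or a smeared, shaky frame tends to look like.
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    if sharpness < BLUR_THRESHOLD:
        return False

    # A very dark frame is probably the lens buried in a purse or pocket,
    # or covered by a finger.
    if gray.mean() < BRIGHTNESS_THRESHOLD:
        return False

    return True
```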
So besides what not to do, how did the system learn which moments to photograph? Clips is also trained to favor diversity: the camera is more likely to take a picture when the environment changes, for example. Clips looks for visual changes using color, and it also avoids letting too much time pass before taking another picture. As Google shared when it announced the life-logging camera, the A.I. is also trained to learn which faces are familiar and which belong to strangers.
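Again, Google hasn’t detailed the mechanism, but a color-based scene-change trigger paired with a time limit can be sketched roughly as follows. The class name, the Bhattacharyya comparison of HSV color histograms, and the thresholds below are illustrative assumptions, not anything Clips is confirmed to use.

```python
import time

import cv2
import numpy as np

# Illustrative parameters only; Google has not published the real values.
SCENE_CHANGE_DISTANCE = 0.4      # histogram distance treated as a "new scene"
MAX_SECONDS_BETWEEN_SHOTS = 90   # don't let too much time pass without a capture


def color_histogram(frame_bgr: np.ndarray) -> np.ndarray:
    """Coarse HSV color histogram used as a cheap fingerprint of the scene."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist


class DiversityTrigger:
    """Decide whether a frame looks different enough to be worth capturing."""

    def __init__(self):
        self.last_hist = None
        self.last_capture_time = 0.0

    def should_capture(self, frame_bgr: np.ndarray) -> bool:
        now = time.monotonic()
        hist = color_histogram(frame_bgr)

        if self.last_hist is None:
            changed = True  # the first frame always counts as a new scene
        else:
            # A larger Bhattacharyya distance means the color content shifted.
            distance = cv2.compareHist(self.last_hist, hist,
                                       cv2.HISTCMP_BHATTACHARYYA)
            changed = distance > SCENE_CHANGE_DISTANCE

        waited_too_long = (now - self.last_capture_time) > MAX_SECONDS_BETWEEN_SHOTS

        if changed or waited_too_long:
            self.last_hist = hist
            self.last_capture_time = now
            return True
        return False
```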
In the end, the blog post suggests, simplifying the program produced the best results, along with slightly overshooting on captures and giving the user the final say over which images to keep. Google also added a physical shutter button and a software viewfinder so users can snap shots themselves.
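As a rough illustration of that “overshoot, then let the user curate” flow, a scoring system might keep many candidate clips and surface only the highest-scoring few for the user’s keep-or-delete decision. The data structure and scores below are purely hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateClip:
    path: str      # where the short clip is stored on-device
    score: float   # hypothetical "interestingness" score from the model


def suggest_clips(candidates: List[CandidateClip], keep: int = 5) -> List[CandidateClip]:
    """Capture more clips than needed, then surface only the best-scoring few;
    the user makes the final keep-or-delete decision on those suggestions."""
    return sorted(candidates, key=lambda clip: clip.score, reverse=True)[:keep]
```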
“In the context of subjectivity and personalization, perfection simply isn’t possible, and it really shouldn’t even be a goal,” Lovejoy said. “Unlike traditional software development, [machine learning] systems will never be ‘bug-free’ because prediction is an innately fuzzy science. But it’s precisely this fuzziness that makes ML so useful … success with Clips isn’t just about keeps, deletes, clicks, and edits (though those are important), it’s about authorship, co-learning, and adaptation over time.”
Google hasn’t yet shared an official launch date for Clips, though recent Federal Communications Commission approval suggests the release could be coming soon. Google has a sign-up list for updates on availability of the $250 life-logging camera.