
Shutterstock’s visual search engine could make browsing photos less of a chore


Perhaps somewhat ironically, we are used to searching for photos through text. Looking for a photo of a cat? Just type “cat” into the Google image search bar, and it will return relevant photos (provided they were tagged as such, of course). Keywords do the job at the most basic level, but what if you are looking for a specific type of photo? You could type “yellow cat” or some other generic description, but things become difficult as the description becomes more complex.

To address this, photo agency Shutterstock has just launched a new tool, called Reverse Image Search, that allows customers to upload a photo (up to 5MB) and find images that are similar. Using computer vision, Shutterstock says the tool breaks through the limiting ceiling of metadata.

Besides keywords, “the technology now relies instead on pixel data within images,” wrote Kevin Lester, who is Shutterstock’s vice president of engineering, in a blog post. “It has studied our 70 million images and 4 million video clips, broken them down into their principal features, and now recognizes what’s inside each and every image, including shapes, colors, and the smallest of details; this visual and conceptual data is represented numerically.”
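Lester’s description maps onto a standard pipeline: reduce each image’s pixel data to a fixed-length vector of numbers that captures its visual features. Shutterstock has not published its model, so the toy feature extractor below, a coarse color histogram over raw pixels, is only a rough illustration of the pixels-to-numbers idea; production systems derive these vectors from deep neural networks rather than histograms.

```python
# Toy "feature extractor": reduce an image to a numeric vector.
# A coarse color histogram stands in for the learned features a
# real computer-vision model would produce.

def color_histogram(pixels, bins=4):
    """Map a list of (r, g, b) pixels (values 0-255) to a normalized histogram."""
    hist = [0.0] * (bins * 3)
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            # Bucket each channel value into one of `bins` coarse ranges.
            idx = channel * bins + min(value * bins // 256, bins - 1)
            hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]

# Two mostly-yellow "images" yield similar vectors; a blue one does not.
yellow_a = [(250, 240, 10)] * 8 + [(200, 180, 30)] * 2
yellow_b = [(240, 230, 20)] * 10
blue = [(10, 20, 250)] * 10
```

Once every image is a vector like this, “visually similar” becomes a geometric question: vectors that lie close together describe images that look alike.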

Although this kind of computer vision-based search technology has existed for years (Google’s image search lets you do the same thing), Shutterstock says its technology is the most refined among the similar tools offered by stock photo services.

“It isn’t the first, but it’s the best on the market,” says Lawrence Lazare, Shutterstock’s product director for search and discovery.

The big benefits of using computer vision are accuracy and speed; for Shutterstock’s customer base, it solves a major problem with search by cutting down the time spent looking for an image. If you are looking for inspiration or something generic, metadata (keywords) is easier, Lazare says. But if you are creating something specific, such as an ad campaign, and you have specific requirements for what needs to be in that photo, then words aren’t as successful.

“Words are fallible — some pictures are hard to describe,” Lazare adds. “Some photos would require a short story to describe, and people don’t search like that.”

For example, typing “sunset” into the search bar returns 14,394 pages encompassing 1,439,383 photos, illustrations, and vector art that depict a sunset. The results also depend on whether the photographer added keywords properly (sometimes a photographer will use a bunch of keywords to tag a batch of photos, say a wedding, but then include photos that aren’t related).

The image on the left shows varied results when searching by keywords. The image on the right shows visually similar results when using visual search.

You could narrow down the search results by adding additional keywords, like “city and architecture,” but, as it turns out, you’ll still have 140,330 options to browse through. It’s even more difficult when the photo in your head has nuances like the angle of a building or the color of a sunset.

That is why visual similarity is more useful than keyword similarity, Lazare says, but this type of search requires a significant amount of machine learning, and it is not an easy task. When an image is uploaded, the computer breaks it down numerically, into a form it can understand, so that it can compare and contrast the important aspects of the image. The computer has to compare it against the millions of photos in Shutterstock’s archive, and do so incredibly quickly; the algorithms take less than 20 milliseconds to compare and contrast 70 million images in real time. Some photos are easier for the computer to decipher, but abstract art or fields of color are harder, and the computer is more likely to return “false positives.”
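As a rough sketch of that comparison step: once images are numeric vectors, finding visually similar ones reduces to nearest-neighbor search. Shutterstock’s actual embedding model, vector dimensions, and index structure are not public, so the 64-dimensional random “library” below is purely illustrative, and a production system would use an optimized index rather than this brute-force scan.

```python
# Minimal sketch of similarity search over numeric image features.
# Each "image" is assumed to already be an embedding vector; the random
# library below merely stands in for a catalog of pre-computed vectors.
import math
import random

random.seed(42)
DIM = 64
library = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(10_000)]

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, images, top_k=3):
    """Rank the whole library by similarity to the query and keep the top_k."""
    ranked = sorted(range(len(images)),
                    key=lambda i: cosine(query, images[i]),
                    reverse=True)
    return ranked[:top_k]

# Perturb one library image slightly; the search should surface it first.
query = [v + random.uniform(-0.01, 0.01) for v in library[1234]]
top_matches = most_similar(query, library)
```

The brute-force scan here is O(library size); hitting 70 million images in under 20 milliseconds requires approximate nearest-neighbor indexes, but the geometry of the comparison is the same.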

To achieve its success rates, the neural network Shutterstock uses required a lot of training. The first attempts weren’t good, but over time the responses, reflecting the learning the network was doing on its own, improved. Lester, who oversees search as well as the computer vision team, told us that in about a year the company went from having nothing to having something that works well.

From our own experiments (the feature is live, and anyone can try it out by uploading an image), we can say the visual search tool is pretty good: it has trouble with complicated photos but is more successful with simpler ones. Shutterstock, of course, isn’t the only company to develop a visual search engine. We noticed equally good results via Google’s image search, and many of Shutterstock’s competitors offer visual search as well, although Shutterstock showed us similar technology from competitors and claims it isn’t as successful, which is one reason the company decided to build its own from scratch.

We uploaded a fairly complicated photo and threw the computer off. However, it does recognize that it’s some type of architecture.

This all shows just how far computer vision and machine learning have come in a relatively short time. And it’s only going to get better: Shutterstock is adding new tools that will allow users to give its computers feedback about the quality of the search results, and will soon unveil visual search for its 4 million video footage assets, an even greater challenge than static photos.

Les Shu
Former Digital Trends Contributor