Skip to main content

Microsoft’s new bot can draw a photo-realistic bird based on text descriptions

Microsoft

Microsoft’s research labs created a new artificial intelligence, or bot, that can draw any image you want based on simple descriptions. The company says this bot can draw anything in pixel form stemming from caption-like text descriptions you provide. And although text-to-image creation isn’t anything new, Microsoft’s “drawing bot” focuses on captions as image descriptors to produce an image quality that is claimed to be three times better than other state-of-the-art technologies.  

“The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus,” Microsoft states. “Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination.” 

Microsoft’s drawing bot merges two components of artificial intelligence: Natural-language processing and computer vision. The research project started with a bot that could generate text captions from photos. The researchers then advanced the project to answer human-generated questions about images, such as identifying a location, the object in focus, and so on. 

But actually drawing an image is a huge step. While the bot can generate components based on text descriptors, it must “imagine” all the other missing pieces of the picture. Thus, if you tell the bot to draw a yellow bird with black wings, it has four descriptors, but must pull the remaining parts from data it acquired from previous drawings, photos, and more. In other words, knowledge obtained through machine-based learning. 

Microsoft’s bot relies on a generative adversarial network (GAN). Just imagine two teams of computers: One side must render an image to fool the other team into believing it’s an actual photograph. Both teams go back and forth, with the first saying the image is real, and the second saying “nuh-uh,” disproving the claim. The goal, obviously, is to render an image that finally fools the second team. 

In this case, the first team renders an image derived from text-based descriptions and the second team will disprove its “authenticity” as an actual photograph until the first team correctly renders the image. Microsoft first fed its GAN with paired images and captions so that it could understand that it needs to draw a bird based on that single word. 

From there, Microsoft continued to build the knowledge base with paired images and captions consisting of multiple traits, such as black wings and a red belly. But Microsoft says it’s not using just any GAN, but one that targets tiny details so the bot can produce photo-realistic results. Microsoft dubs it as an attentional GAN, or AttnGAN. 

“As humans draw, we repeatedly refer to the text and pay close attention to the words that describe the region of the image we are drawing,” the company says. “[AttnGAN] does this by breaking up the input text into individual words and matching those words to specific regions of the image.” 

You can read Microsoft’s research paper describing its AttnGAN here. 

Kevin Parrish
Former Digital Trends Contributor
Kevin started taking PCs apart in the 90s when Quake was on the way and his PC lacked the required components. Since then…
Trying to buy a GPU in 2023 almost makes me miss the shortage
Two AMD Radeon RX 7000 graphics cards on a pink surface.

The days of the GPU shortage are long over, but somehow, buying a GPU is harder than ever -- and that sentiment has very little to do with stock levels. It's just that there are no obvious candidates when shopping anymore.

In a generation where no single GPU stands out as the single best graphics card, it's hard to jump on board with the latest from AMD and Nvidia. I don't want to see another GPU shortage, but the state of the graphics card market is far from where it should be.
This generation is all over the place

Read more
HP printers are heavily discounted in Best Buy’s flash sale
The HP - OfficeJet Pro 8034e Wireless All-In-One Inkjet Printer on a desk with a smartphone.

There’s good news in store if you’re looking to land a new printer at a discount this weekend. Best Buy is having a 48-hour flash sale on HP printers, with several that can compete with the best printers seeing some good prices. HP is almost always one of the best laptop brands, and it’s one of the same when it comes to printers. So if you’re looking for a new home or office printer, read onward on how to save on an HP printer at Best Buy.
HP DeskJet 2755e — $60, was $85

The HP DeskJet 2755e is a good entry-level printer. It’s got you covered if your printing needs are pretty basic, or if you don’t need to print in mass. This is a color InkJet printer, which makes it good for almost all uses. It can also make copies and scan in color, and it has mobile and wireless printing functionality. You can get set up quickly and easily with the HP Smart app that guides you through the setup process, and you can also use this app to print, scan and copy documents from your phone.

Read more
This tiny ThinkPad can’t quite keep up with the MacBook Air M2
Lenovo ThinkPad X1 Nano Gen 3 rear view showing lid and logo.

While the laptop industry continues to move toward 14-inch laptops and larger, the 13-inch laptop remains an important category. One of the best is the Apple MacBook Air M2, with an extremely thin and well-built chassis, great performance, and incredibly long battery life.

Lenovo has recently introduced the third generation of its ThinkPad X1 Nano, one of the lightest laptops we've tested and a good performer as well. It's stiff competition, but which of these two diminutive laptops stands apart?
Specs and configurations

Read more