Microsoft’s new bot can draw a photo-realistic bird based on text descriptions

Microsoft’s research labs created a new artificial intelligence, or bot, that the company says can draw anything in pixel form from the caption-like text descriptions you provide. Text-to-image creation isn’t new, but Microsoft’s “drawing bot” treats captions as image descriptors to produce an image quality that Microsoft claims is three times better than other state-of-the-art technologies.

“The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus,” Microsoft states. “Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination.” 

Microsoft’s drawing bot merges two components of artificial intelligence: Natural-language processing and computer vision. The research project started with a bot that could generate text captions from photos. The researchers then advanced the project to answer human-generated questions about images, such as identifying a location, the object in focus, and so on. 

But actually drawing an image is a huge step. While the bot can generate components based on text descriptors, it must “imagine” all the other missing pieces of the picture. Thus, if you tell the bot to draw a yellow bird with black wings, it has four descriptors, but must pull the remaining parts from data it acquired from previous drawings, photos, and more. In other words, it fills the gaps with knowledge obtained through machine learning.

Microsoft’s bot relies on a generative adversarial network (GAN). Just imagine two teams of computers: The first team must render an image that fools the second team into believing it’s an actual photograph. The teams go back and forth, with the first presenting each image as real and the second saying “nuh-uh” whenever it spots a fake. The goal is for the first team to render an image that finally fools the second.
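The back-and-forth between the two teams is usually expressed as a pair of opposing loss functions. The snippet below is a minimal sketch of the standard GAN losses, not Microsoft’s code: the first team is the generator, the second is the discriminator, and the function names and probability values are assumptions chosen for illustration.

```python
import math

# p_real: the discriminator's probability that a genuine photo is real.
# p_fake: the discriminator's probability that a generated image is real.
# Both are assumed to lie strictly between 0 and 1.

def discriminator_loss(p_real, p_fake):
    """Low when the second team accepts real photos and rejects fakes."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Low when the first team's image fools the second team."""
    return -math.log(p_fake)

# A confident, correct discriminator has a small loss...
sharp_critic = discriminator_loss(p_real=0.9, p_fake=0.1)
# ...while one that is only guessing has a larger loss.
guessing_critic = discriminator_loss(p_real=0.5, p_fake=0.5)

# The generator's loss shrinks as its image becomes more convincing.
caught = generator_loss(p_fake=0.1)
fooled = generator_loss(p_fake=0.9)
```

Training alternates between lowering each loss in turn, which is why an improvement on one side forces the other side to improve as well.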

In this case, the first team renders an image derived from text-based descriptions, and the second team keeps rejecting it as an actual photograph until the first team renders it convincingly. Microsoft first trained its GAN on paired images and captions so that it could learn, for example, to draw a bird from that single word.

From there, Microsoft continued to build the knowledge base with paired images and captions consisting of multiple traits, such as black wings and a red belly. But Microsoft says it’s not using just any GAN, but one that targets tiny details so the bot can produce photo-realistic results. Microsoft calls it an attentional GAN, or AttnGAN.

“As humans draw, we repeatedly refer to the text and pay close attention to the words that describe the region of the image we are drawing,” the company says. “[AttnGAN] does this by breaking up the input text into individual words and matching those words to specific regions of the image.” 
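Matching words to image regions can be illustrated with a small attention calculation. The sketch below assumes dot-product scores followed by a softmax, which is a common way to build attention and is not confirmed by the article as Microsoft’s exact method. The three-number word and region vectors are made-up toy values, and `attend` and `best_word` are hypothetical helper names.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region, word_vectors):
    """Weight every word by how well it matches one image region."""
    scores = [sum(r * w for r, w in zip(region, vec)) for vec in word_vectors]
    weights = softmax(scores)
    # The context vector is the weighted mix of word vectors for this region.
    context = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(len(region))
    ]
    return weights, context

# Toy data (assumed values): the caption "yellow bird black wings".
words = ["yellow", "bird", "black", "wings"]
word_vectors = [
    [1.0, 0.0, 0.0],  # yellow
    [0.5, 0.5, 0.0],  # bird
    [0.0, 0.0, 1.0],  # black
    [0.0, 0.5, 0.5],  # wings
]
regions = {
    "body": [2.0, 0.5, 0.0],
    "wing": [0.0, 1.0, 2.0],
}

def best_word(region_name):
    """Return the caption word that a region attends to most strongly."""
    weights, _ = attend(regions[region_name], word_vectors)
    return words[weights.index(max(weights))]
```

With these toy vectors, the body region leans most on “yellow” and the wing region on “black,” which mirrors the idea of consulting different words while drawing different parts of the picture.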

You can read Microsoft’s research paper describing its AttnGAN here. 

Kevin Parrish
Former Digital Trends Contributor
Kevin started taking PCs apart in the 90s when Quake was on the way and his PC lacked the required components. Since then…