New corruption tool spells trouble for AI text-to-image tech

Professional artists and photographers annoyed at generative AI firms using their work to train their technology may soon have an effective way to respond that doesn’t involve going to the courts.

Generative AI burst onto the scene with the launch of OpenAI’s ChatGPT chatbot almost a year ago. The tool is extremely adept at conversing in a very natural, human-like way, but to gain that ability it had to be trained on masses of data scraped from the web.

Recommended Videos

Similar generative AI tools are also capable of producing images from text prompts, but like ChatGPT, they’re trained by scraping images published on the web.

It means artists and photographers are having their work used — without consent or compensation — by tech firms to build out their generative AI tools.

To fight this, a team of researchers has developed a tool called Nightshade that’s capable of confusing the training model, causing it to spit out erroneous images in response to prompts.

Outlined recently in an article by MIT Technology Review, Nightshade “poisons” the training data by adding invisible pixels to a piece of art before it’s uploaded to the web.

“Using it to ‘poison’ this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless — dogs become cats, cars become cows, and so forth,” MIT’s report said, adding that the research behind Nightshade has been submitted for peer review.

While the image-generating tools are already impressive and are continuing to improve, the way they’re trained has proved controversial, with many of the tools’ creators currently facing lawsuits from artists claiming that their work has been used without permission or payment.

University of Chicago professor Ben Zhao, who led the research team behind Nightshade, said that such a tool could help shift the balance of power back to artists, firing a warning shot at tech firms that ignore copyright and intellectual property.

“The data sets for large AI models can consist of billions of images, so the more poisoned images can be scraped into the model, the more damage the technique will cause,” MIT Technology Review said in its report.

When it releases Nightshade, the team is planning to make it open source so that others can refine it and make it more effective.

Aware of its potential to disrupt, the team behind Nightshade said it should be used as “a last defense for content creators against web scrapers” that disrespect their rights.

In a bid to deal with the issue, DALL-E creator OpenAI recently began allowing artists to remove their work from its training data, but the process has been described as extremely onerous as it requires the artist to send a copy of every single image they want removed, together with a description of that image, with each request requiring its own application.

Making the removal process considerably easier might go some way to discouraging artists from opting to use a tool like Nightshade, which could cause many more issues for OpenAI and others in the long run.

Editors’ Recommendations