How do you prove you’re human when communicating on the internet? It’s a tough question, but for years the answer has been your ability to read a string of distorted characters that machines can’t recognize. Called CAPTCHAs (“Completely Automated Public Turing test to tell Computers and Humans Apart”), this security tool is used for everything from blocking automated spammers to stopping bots from creating fraudulent profiles on social media sites. And for the past 20-odd years, it’s worked. Possibly until now, that is.
Computer scientists from the U.K.’s Lancaster University and China’s Northwest University and Peking University have jointly developed an artificial intelligence capable of cracking text CAPTCHA systems in as little as 0.5 seconds. It was successfully tested on 33 different CAPTCHA schemes, 11 of which came from the world’s most popular websites, including eBay and Wikipedia.
“We think our research probably has pronounced a death sentence for text CAPTCHA,” Zheng Wang, associate professor in the School of Computing and Communications at Lancaster University, told Digital Trends.
The attack developed by the researchers is built on a deep neural network image classifier. Deep neural networks have demonstrated impressive performance in image recognition, but successful models typically need millions of manually labeled images to learn from. The novelty of this work is that it uses a generative adversarial network (GAN) to create that training data. Instead of collecting and labeling millions of CAPTCHA examples, the system needs as few as 500 real ones. From these, it can generate millions or even billions of synthetic training examples, which are then used to train its image classifier. The result? Higher accuracy than any CAPTCHA recognizer seen to date.
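The core idea here, turning a small labeled set into a large synthetic training set and then training a classifier on it, can be illustrated with a deliberately tiny Python sketch. It is not the researchers’ method: instead of a GAN, it uses a hypothetical noise-based generator that simply re-distorts a handful of labeled samples, and instead of a deep neural network it trains a nearest-centroid classifier. The 3x3 glyphs, the function names (`distort`, `synthesize`, `train_centroids`, `predict`) and the noise rates are all illustrative assumptions.

```python
import random

# Hypothetical 3x3 binary "glyphs" standing in for CAPTCHA characters.
TEMPLATES = {
    "I": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}


def distort(glyph, flip_prob, rng):
    """Flip each pixel independently with probability flip_prob (toy 'noise')."""
    return tuple(1 - p if rng.random() < flip_prob else p for p in glyph)


def synthesize(real_samples, n_per_label, flip_prob, rng):
    """Amplify a small labeled set by re-distorting its samples many times.

    Toy stand-in for a GAN-based generator; not the researchers' model.
    """
    by_label = {}
    for glyph, label in real_samples:
        by_label.setdefault(label, []).append(glyph)
    synthetic = []
    for label, glyphs in by_label.items():
        for _ in range(n_per_label):
            synthetic.append((distort(rng.choice(glyphs), flip_prob, rng), label))
    return synthetic


def train_centroids(data):
    """Nearest-centroid 'classifier': average pixel values per label."""
    sums, counts = {}, {}
    for glyph, label in data:
        acc = sums.setdefault(label, [0.0] * len(glyph))
        for i, p in enumerate(glyph):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, glyph):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((c - p) ** 2 for c, p in zip(centroids[label], glyph)),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # A handful of "real" labeled examples: 5 noisy copies per character.
    real = [(distort(TEMPLATES[ch], 0.05, rng), ch) for ch in TEMPLATES for _ in range(5)]
    # Amplify 15 real samples into 600 synthetic ones, then train only on those.
    synthetic = synthesize(real, 200, 0.1, rng)
    model = train_centroids(synthetic)
    for ch, glyph in TEMPLATES.items():
        print(ch, "->", predict(model, glyph))
```

With these settings each clean template should be classified as its own letter. The point of the sketch is only the workflow: the classifier never sees more than the 15 "real" samples directly, yet it trains on hundreds of generated ones. A real attack swaps both toy pieces for far more capable models.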
This approach could be useful for any image recognition task that requires masses of training data. CAPTCHAs, however, are unusual in that they keep evolving. Early text-based CAPTCHAs (as seen in the thumbnail picture for this article) were the first iteration of the technology, but by now you’re probably more used to the widely deployed image-based CAPTCHAs that ask you to pick out traffic signs. This constant shifting (compared with, say, learning to recognize a dog, which looks broadly the same from one generation to the next) makes collecting training data a pain.
“[It] means that by the time the attacker has collected enough training data, the CAPTCHA scheme would have already changed, which will invalidate the efforts,” Wang said. “Our work presents a new way to generate CAPTCHA recognizer at a much lower cost. As a result, it poses a real threat to CAPTCHA schemes as it can learn a CAPTCHA solver much quicker.”