
Researchers teach computer to understand dialects by reading Twitter

Computers don’t harbor the more overt prejudices that unfortunately persist in parts of society, but that isn’t to say they’re without faults of their own. One task machines frequently struggle with is understanding dialects, such as a variety of English considered to have originated in some African-American communities. (Researchers term the dialect “African-American English,” a label we realize some African Americans who don’t speak it may regard as inaccurate.) Now, researchers are training AI to recognize and use this dialect.

There is a logical reason why computers understand some dialects better than others: the computer scientists who have spent the past 30 years teaching machines to read have frequently relied on readily available data, such as back issues of the Wall Street Journal, to carry out the training. Training on such formal written language has left many natural language processing (NLP) systems less adept at understanding language that doesn’t conform to that narrow register.

“If you think about traditional media that have existed for a long time — things like books or, more recently, newspapers — you’re seeing a very standardized dialect of language, associated with elite education and the like,” Brendan O’Connor, a natural language processing expert at the University of Massachusetts Amherst, told Digital Trends. “That’s not specific to English: you see it in every language in the world.”

As O’Connor noted, this no longer has to be the case. The internet — and particularly social media — has opened up a rich data-stream of different dialects which can be used to train the next wave of NLP systems. In a new paper, O’Connor and other researchers created the largest dataset for studying African-American English from online communication, composed of 59 million tweets from 2.8 million users.

“The African-American English dialect has … millions of speakers and is different from standard English in several interesting ways,” O’Connor said. “It’s different enough that our artificial intelligence tools — which are designed for standardized English — perform worse with them; they’re less intelligent at understanding that dialect. African-American English is often incorrectly characterized as ‘not English’ by current classifiers.”
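The mismatch O’Connor describes can be illustrated with a toy sketch — emphatically not the researchers’ actual method, and with all sentences invented for illustration. A crude character-trigram “model” built only from formal, newswire-style text covers a formal sentence well but covers informal, dialectal spellings far less, which is roughly how a classifier trained on one register ends up scoring another register as unfamiliar:

```python
from collections import Counter

def trigrams(text):
    """Count character trigrams of a lowercased string."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def overlap_score(model, text):
    """Fraction of the text's trigrams seen during 'training' --
    a crude stand-in for a language-ID confidence score."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    hits = sum(count for gram, count in grams.items() if gram in model)
    return hits / sum(grams.values())

# "Train" only on formal, newswire-style English -- the WSJ-style
# bias the article describes.
formal_corpus = (
    "the committee approved the measure on tuesday. "
    "shares of the company rose after the announcement. "
    "officials said the plan would take effect next year."
)
model = trigrams(formal_corpus)

formal_sentence = "the company said the plan would take effect"
informal_sentence = "iont even kno wat u talmbout lol"

print(overlap_score(model, formal_sentence))    # high coverage
print(overlap_score(model, informal_sentence))  # much lower coverage
```

A real NLP system is vastly more sophisticated than this, but the failure mode is analogous: text that looks nothing like the training data gets a low score, and a language-ID classifier built that way can end up labeling a legitimate English dialect as “not English.”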

For their paper, O’Connor and his colleagues showed that properly fine-tuned NLP systems are capable of understanding African-American English. The authors plan to release their new model in the next year to better identify English written in this dialect.

“The future next step is to make systems that can do deeper analysis of sentences that are written in different types of English dialects,” he said. “Embracing linguistic diversity is certainly something that needs to be focused on. We highlight the importance of engineering systems that are better at handling different forms of dialect.”

Because, ultimately, making AI systems that can understand everyone equally will be the best possible outcome for all.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…