Named after computer scientist Alan Turing, the Turing test tries to determine if a machine can act like a human being well enough to fool the person taking the test. An online game called Human or Not offered people a similar challenge, and now the results are in.
Launched around a month ago, Human or Not asked you to chat with someone (or something) for two minutes and then try to figure out if it was another human being or an AI bot. In accepting the challenge, you were able to ask any questions or offer any responses you wanted. But once the two minutes were up, you had to guess who or what was on the other end.
After generating millions of conversations in one of the largest Turing tests ever recorded, developer AI21 Labs found that 32% of the people who played the game guessed incorrectly whether they were talking to a human or a bot, while 68% got it right.
Overall, people had an easier time identifying another person. When chatting with a human being, participants got the right answer 73% of the time. But when speaking with a bot, they guessed correctly just 60% of the time.
Across 17 different countries, France scored the highest percentage of correct guesses at 71%, while India had the lowest score at 63.5%. The United States came in toward the middle of the pack with 67%, while the U.K. scored 67.5%, Italy 67%, and Russia 66%.
To challenge its users, Human or Not used an AI bot based on top large language models (LLMs) such as GPT-4 and AI21 Labs’ own Jurassic-2. Such LLMs rely on deep learning to help chatbots and other AI tools generate more human-like text. Beyond employing these models, AI21 developed a framework that would create a different bot character in each game.
Participants turned to a few tricks to try to distinguish human from bot. But with AI well trained and well informed, those tricks didn’t always work.
If the chat partner made spelling mistakes or grammatical errors or used slang, then many people assumed it was likely a human being. However, the models in the game were specifically trained to make certain mistakes and use slang.
In some cases, participants tried to steer the conversation toward current events under the belief that many AIs have a cutoff date after which they’re unaware of newer events. These people asked such questions as: “What is the exact date and time where you are?” and “What did you think of Biden’s last speech?” However, most of the models used in the game were connected to the internet and so they were aware of recent news events.
Knowing that bots obviously don’t have a personal life, some participants asked personal questions such as “What’s your name?” and “Where are you from?” and then gauged the reaction. But most of the bots managed to answer these questions successfully by making up personalities based on the personal stories in their databases.
Using one trick that may have worked better than others, some participants asked their chat partner for advice on illegal activities or told them to use offensive language. The idea here is that an AI's "ethical sub-routines" would prevent it from responding to such requests.
In another interesting strategy, participants assumed that a chat partner who was too polite or kind was probably a bot. The perception here is that human beings are often rude, especially online.
AI21 Labs said it will study the findings in more detail and work with other leading AI researchers and labs on the project. The aim is to help the public, researchers, and policymakers alike better understand AI not just as a productivity tool but as a future member of the online world.
“We started this experiment with some basic questions about people’s ability to distinguish between humans and machines, given the crazy development of AI in the past year or so, and we found some answers for that,” said Amos Meron, creative product lead at AI21 Labs and the designer of the game.
“More importantly, though, is that we now have new and more important questions we need to think about,” Meron added. “Given that at least in some cases people can’t tell the difference, what interactions do people want and should experience online with bots? Should they be informed about the fact they’re talking to a machine? What policies should we have in place? Of course we don’t have the answers to these questions, but we hope this experiment helped start the conversation earlier rather than later, because we assume the technology is only going to get ever better soon.”
You can still try your hand at the game here.