Verbal nonsense reveals limitations of AI chatbots

The era of artificial intelligence (AI) chatbots that appear to understand and use language in a human-like manner has dawned. These chatbots rely on large language models, a type of neural network. However, a recent study has revealed a vulnerability in these models: they sometimes mistake nonsense for natural language. Researchers at Columbia University see this flaw as an opportunity to improve chatbot performance and to gain insights into how humans process language.

In their paper published in Nature Machine Intelligence, the scientists describe how they conducted experiments using nine different language models. They presented hundreds of pairs of sentences to human participants and asked them to select the sentence they believed sounded more natural, i.e., the one more likely to be encountered in everyday communication. The researchers then evaluated whether the AI models would provide the same judgments as the human participants.

In head-to-head comparisons, the more advanced AI models based on transformer neural networks generally outperformed simpler models, such as recurrent neural networks and statistical models that rely on word pair frequencies from the internet or online databases. However, all models exhibited errors, occasionally selecting sentences that sounded like gibberish to humans.

Dr. Nikolaus Kriegeskorte, a principal investigator at Columbia’s Zuckerman Institute and a coauthor of the paper, noted, “That some of the large language models perform as well as they do suggests that they capture something important that the simpler models are missing. That even the best models we studied still can be fooled by nonsense sentences shows that their computations are missing something about the way humans process language.”

For example, consider the following sentence pair:

  1. That is the narrative we have been sold.
  2. This is the week you have been dying.

Human participants in the study judged the first sentence as more natural. BERT, one of the advanced models, rated the second sentence as more natural. GPT-2, another widely known model, sided with the human participants and picked the first.
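To make the pairwise comparison concrete, here is a minimal sketch of the simplest kind of model mentioned earlier, one that tallies word-pair frequencies. It is not the researchers' code. The toy corpus, the add-one smoothing, and all function names are assumptions made for illustration, and the outcome reflects only that invented corpus, not how any model in the study behaved.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for this sketch (not data from the study).
TOY_CORPUS = [
    "that is the story we have been told",
    "this is the narrative we have been sold",
    "that is the week we have been waiting for",
    "you have been told the story",
]


def tokenize(sentence):
    """Lowercase, drop periods, and add start/end markers."""
    return ["<s>"] + sentence.lower().replace(".", "").split() + ["</s>"]


def train_bigram_counts(sentences):
    """Tally how often each word and each adjacent word pair occurs."""
    context_counts, pair_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        context_counts.update(tokens[:-1])           # words that precede another word
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    # Reserve one extra slot so unseen words still get a small probability.
    return context_counts, pair_counts, len(vocab) + 1


def log_probability(sentence, model):
    """Sum of log bigram probabilities with add-one smoothing (an assumption)."""
    context_counts, pair_counts, vocab_size = model
    tokens = tokenize(sentence)
    return sum(
        math.log((pair_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def more_natural(sentence_a, sentence_b, model):
    """Return whichever sentence the model scores as more probable."""
    score_a = log_probability(sentence_a, model)
    score_b = log_probability(sentence_b, model)
    return sentence_a if score_a >= score_b else sentence_b


MODEL = train_bigram_counts(TOY_CORPUS)
SENTENCE_1 = "That is the narrative we have been sold."
SENTENCE_2 = "This is the week you have been dying."
print(more_natural(SENTENCE_1, SENTENCE_2, MODEL))
```

Both sentences have the same number of words, so the comparison is not skewed by length. A model's choice on each pair can then be checked against the choice human participants made for the same pair.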

Christopher Baldassano, an assistant professor of psychology at Columbia and the senior author of the study, emphasized that all models had blind spots and labeled some sentences as meaningful when human participants considered them gibberish. He cautioned against relying too heavily on AI systems for important decisions, at least in their current state.

One intriguing finding of the study is that many models perform well but imperfectly. Dr. Kriegeskorte emphasized the importance of understanding why these gaps exist and why some models outperform others, since that knowledge can drive progress in language models.

The researchers are also curious about whether the computations in AI chatbots can inspire new scientific questions and hypotheses, potentially guiding neuroscientists toward a better understanding of human brain function. Analyzing the strengths and weaknesses of various chatbots and their underlying algorithms may contribute to answering this question.

Tal Golan, the paper’s corresponding author, who recently established his own lab at Ben-Gurion University of the Negev in Israel, said the team's underlying interest is in understanding how people think. Because AI tools process language differently from humans, comparing the two offers a fresh perspective on human cognition.


Aihub Team
