Multi-AI collaboration helps reasoning and factual accuracy in large language models [MIT News]

September 18, 2023

Researchers use multiple AI models to collaborate, debate, and improve their reasoning abilities to advance the performance of LLMs while increasing accountability and factual accuracy.

Rachel Gordon | MIT CSAIL

An age-old adage, often introduced to us during our formative years, is designed to nudge us beyond our self-centered, nascent minds: "Two heads are better than one." This proverb encourages collaborative thinking and highlights the potency of shared intellect.

Fast forward to 2023, and we find that this wisdom holds true even in the realm of artificial intelligence: Multiple language models, working in harmony, are better than one. 

Recently, a team from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) embodied this ancient wisdom within the frontier of modern technology. They introduced a strategy that leverages multiple AI systems to discuss and argue with each other to converge on a best-possible answer to a given question. This method empowers these expansive language models to heighten their adherence to factual data and refine their decision-making. 

The crux of the problem with large language models (LLMs) lies in the inconsistency of their generated responses, leading to potential inaccuracies and flawed reasoning. This new approach lets each agent actively assess every other agent’s responses, and uses this collective feedback to refine its own answer. In technical terms, the process consists of multiple rounds of response generation and critique. Each language model generates an answer to the given question, and then incorporates the feedback from all other agents to update its own response. This iterative cycle culminates in a final output from a majority vote across the models' solutions. It somewhat mirrors the dynamics of a group discussion — where individuals contribute to reach a unified and well-reasoned conclusion.
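The rounds of generation, critique, and voting described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation. It assumes each agent is simply a function that maps a text prompt to a text answer. The names `Agent`, `build_critique_prompt`, `majority_vote`, and `debate`, along with the prompt wording, are hypothetical and chosen for this example.

```python
from collections import Counter
from typing import Callable, List

# Assumption: an "agent" is any function mapping a prompt string to an answer
# string. In practice this would wrap a call to a black-box language model.
Agent = Callable[[str], str]


def build_critique_prompt(question: str, own: str, others: List[str]) -> str:
    """Show an agent its own previous answer alongside its peers' answers."""
    peer_text = "\n".join(f"- {answer}" for answer in others)
    return (
        f"Question: {question}\n"
        f"Your previous answer: {own}\n"
        f"Other agents answered:\n{peer_text}\n"
        "Considering these answers, give your updated answer."
    )


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among the agents."""
    return Counter(answers).most_common(1)[0][0]


def debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    """Run several rounds of answer-and-critique, then take a majority vote."""
    # Round zero: every agent answers the question independently.
    answers = [agent(question) for agent in agents]
    for _ in range(rounds):
        # Every agent revises its answer after seeing all the other agents'
        # answers from the previous round (updates happen simultaneously).
        answers = [
            agent(build_critique_prompt(
                question, answers[i], answers[:i] + answers[i + 1:]))
            for i, agent in enumerate(agents)
        ]
    return majority_vote(answers)


# Toy stand-ins for real models, used only to exercise the loop.
def stubborn(prompt: str) -> str:
    return "4"  # always gives the same answer


def convincible(prompt: str) -> str:
    # Starts out wrong, but switches once any peer has answered "4".
    return "4" if "- 4" in prompt else "5"


if __name__ == "__main__":
    print(debate("What is 2 + 2?", [stubborn, convincible, convincible], rounds=1))
```

With the toy agents, a vote taken before any debate (`rounds=0`) would pick the initially more common wrong answer, while a single round of critique lets the two convincible agents adopt the correct one. Because the loop only passes text in and out, the same structure would apply to any model reachable through a text interface.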

One real strength of the approach lies in its seamless application to existing black-box models. As the methodology revolves around generating text, it can also be implemented across various LLMs without needing access to their internal workings. This simplicity, the team says, could help researchers and developers use the tool to improve the consistency and factual accuracy of language model outputs across the board.

"Employing a novel approach, we don't simply rely on a single AI model for answers. Instead, our process enlists a multitude of AI models, each bringing unique insights to tackle a question. Although their initial responses may seem truncated or may contain errors, these models can sharpen and improve their own answers by scrutinizing the responses offered by their counterparts," says Yilun Du, an MIT PhD student in electrical engineering and computer science, affiliate of MIT CSAIL, and lead author on a new paper about the work. "As these AI models engage in discourse and deliberation, they're better equipped to recognize and rectify issues, enhance their problem-solving abilities, and better verify the precision of their responses. Essentially, we're cultivating an environment that compels them to delve deeper into the crux of a problem. This stands in contrast to a single, solitary AI model, which often parrots content found on the internet. Our method, however, actively stimulates the AI models to craft more accurate and comprehensive solutions."

The research examined mathematical problem-solving, including grade school and middle/high school math problems, and saw a significant boost in performance through the multi-agent debate process. The language models also showed an enhanced ability to generate accurate arithmetic evaluations, illustrating the method's potential across different domains.

The method can also help address the issue of "hallucinations" that often plague language models. In an environment where agents critique each other's responses, the agents were more incentivized to avoid producing random information and to prioritize factual accuracy.

Beyond its application to language models, the approach could also be used for integrating diverse models with specialized capabilities. By establishing a decentralized system where multiple agents interact and debate, researchers could potentially apply these comprehensive and efficient problem-solving abilities across various modalities like speech, video, or text...

Read the full article on the MIT News website using the link below.
