Google’s Chain of Thought Prompting Can Boost Today’s Best Algorithms

Google’s latest research aims to improve how well its language models reason. The technique adds worked, step-by-step examples to the prompt for a question, which leads to noticeably more accurate answers on multi-step problems.

A related concept, self-consistency, improves chain of thought reasoning in language models even further: rather than relying on a single chain of reasoning, the model samples several chains and the most consistent final answer is chosen.

Google revealed Chain of Thought Prompting, a breakthrough in Natural Language Processing that advances the state of the art of sophisticated models like PaLM and LaMDA to what the researchers describe as an “amazing” level.

The fact that Chain of Thought Prompting can significantly enhance PaLM and LaMDA is remarkable.

PaLM and LaMDA

The study used two language models: LaMDA (Language Model for Dialogue Applications) and PaLM (Pathways Language Model).

LaMDA is a conversational model, similar to a chatbot, but it can also be used for a variety of other applications that require dialogue and interaction.

PaLM is a model built on Google’s Pathways AI architecture, which trains a single language model to learn how to solve many kinds of problems.

Previously, machine learning models were trained to solve one kind of problem and then released to do that one thing very well. To do something different, Google would have to train a new model.

The Pathways AI architecture is a method for developing a model that can address challenges it hasn’t encountered before.

According to the Google PaLM explanation:

“…we’d like to train one model that can not only handle many separate tasks, but also draw upon and combine its existing skills to learn new tasks faster and more effectively.”

What Does It Do?

The research paper lists three major advancements of Chain of Thought Reasoning:

  1. It enables language models to break multi-step problems down into a sequence of intermediate steps.
  2. Engineers can inspect the model’s reasoning process through the chain of thought, and when things go wrong, trace where the reasoning failed and correct it.
  3. According to the paper, it can solve arithmetic word problems, perform commonsense reasoning, and (in principle) handle any word-based problem a person can solve.
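Point 2 above — inspecting the chain to see where reasoning went wrong — can even be done mechanically for arithmetic chains, since every stated equation can be re-checked. This is a hypothetical sketch, not from the paper; the chain format and checker function are illustrative assumptions:

```python
import re

def check_arithmetic_steps(chain: str) -> list[tuple[str, bool]]:
    """Re-verify each 'a + b = c' or 'a - b = c' step stated in a reasoning chain."""
    steps = re.findall(r"(\d+)\s*([+\-])\s*(\d+)\s*=\s*(\d+)", chain)
    results = []
    for a, op, b, c in steps:
        # Recompute the step and compare against the model's claimed result.
        expected = int(a) + int(b) if op == "+" else int(a) - int(b)
        results.append((f"{a} {op} {b} = {c}", expected == int(c)))
    return results

# A chain with one faulty step: 3 + 6 is 9, not 8.
chain = "They had 23 - 20 = 3 apples, then bought 6 more: 3 + 6 = 8."
for step, ok in check_arithmetic_steps(chain):
    print(step, "OK" if ok else "WRONG")  # flags the faulty second step
```

Because the chain of thought is plain text, simple tooling like this can localize an error to a specific step instead of only observing that the final answer is wrong.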

Multi-step reasoning tasks

The study uses the following example of a multi-step reasoning task to evaluate language models:

“Q: There were 23 apples in the cafeteria. How many apples do they have if they used 20 for lunch and purchased 6 more?

A: Originally, the cafeteria had 23 apples. They used 20 to make lunch, so they had 23 – 20 = 3. They bought 6 more apples, giving them a total of 3 + 6 = 9. The answer is 9.”
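In practice, a chain of thought prompt is just a few-shot prompt whose exemplar answers spell out each intermediate step, followed by the new question. A minimal sketch — the helper function and the follow-up question are illustrative assumptions, not from the paper:

```python
# A worked exemplar whose answer shows the reasoning chain step by step.
EXEMPLAR = (
    "Q: There were 23 apples in the cafeteria. They used 20 for lunch "
    "and bought 6 more. How many apples do they have?\n"
    "A: Originally, the cafeteria had 23 apples. They used 20 for lunch, "
    "so they had 23 - 20 = 3. They bought 6 more apples, giving them "
    "3 + 6 = 9. The answer is 9."
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar to a new question, ending at 'A:'
    so the model continues with its own reasoning chain."""
    return f"{EXEMPLAR}\n\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many tennis balls does he have now?"
)
print(prompt)
```

Because the exemplar demonstrates intermediate steps rather than just a final answer, a sufficiently large model tends to imitate that structure and reason through the new question step by step.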

PaLM is a cutting-edge language model included in the Pathways AI architecture. It has gotten to the point where it can explain why a joke is hilarious.

Despite how sophisticated PaLM is, the researchers argue that Chain of Thought Prompting considerably improves these models, which is why this new study is so important. This is how Google describes it:

“With chain of thought reasoning, models can decompose complex problems into intermediate steps that are solved individually.

Moreover, since chain of thought is language-based, it can be applied to any task that a person could solve using language.”

The research paper goes on to say that standard prompting doesn’t improve much as the model’s size is increased.

With this new technique, however, scaling the model up has a considerable, measurable positive effect on performance.


Chain of Thought Prompting was tested on both PaLM and LaMDA, using two mathematical word problem datasets.

Researchers use these datasets to compare outcomes for various language models on comparable challenges.

The graphs below demonstrate the outcomes of employing Chain of Thought Prompting on LaMDA.

Chain of Thought Prompting and LaMDA

Scaling LaMDA on the MultiArith dataset yielded a moderate improvement, according to the findings. When combined with Chain of Thought Prompting, however, LaMDA scores much higher.

The GSM8K dataset findings show only a slight improvement.

The PaLM language model is a different story.

Chain of Thought Prompting and PaLM

As the graph above shows, the improvements from scaling PaLM combined with Chain of Thought Prompting are enormous on both datasets (MultiArith and GSM8K).

The researchers describe the findings as “extraordinary” and “state-of-the-art”:

“When scaled to 540B parameters, PaLM performs well on the GSM8K dataset of arithmetic word problems.

…combining chain of thought prompting with the 540B parameter PaLM model achieves a new state-of-the-art performance of 58 percent, surpassing the previous state of the art of 55 percent, which was reached by fine-tuning GPT-3 175B on a large training set and then ranking candidate solutions using a specially trained verifier.

Furthermore, follow-up work on self-consistency shows that taking the majority vote over a large set of generated reasoning paths improves the performance of chain of thought prompting even further, reaching 74 percent accuracy on GSM8K.”
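The self-consistency result mentioned in that quote rests on a simple mechanism: sample several reasoning chains for the same question, extract each chain’s final answer, and take the majority vote. A sketch under the assumption that the sampled chains are already available as strings, with a deliberately naive answer extractor (the real work extracts the stated final answer more carefully):

```python
import re
from collections import Counter
from typing import Optional

def final_answer(chain: str) -> Optional[str]:
    """Naively take the last number in a reasoning chain as its final answer."""
    numbers = re.findall(r"\d+", chain)
    return numbers[-1] if numbers else None

def self_consistency(chains: list[str]) -> str:
    """Majority vote over the final answers of sampled reasoning chains."""
    answers = [a for a in (final_answer(c) for c in chains) if a is not None]
    return Counter(answers).most_common(1)[0][0]

# Three sampled chains for the apples problem; the third reasons incorrectly.
samples = [
    "They used 20, so 23 - 20 = 3. Then 3 + 6 = 9. The answer is 9.",
    "23 - 20 = 3 apples left; buying 6 more gives 3 + 6 = 9. The answer is 9.",
    "They bought 6 first: 23 + 6 = 29, and 29 - 20 = 8. The answer is 8.",
]
print(self_consistency(samples))  # prints 9: the majority answer wins
```

The intuition is that many wrong reasoning paths scatter across different wrong answers, while correct paths converge on the same one, so the modal answer is more reliable than any single sample.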


The conclusion of a research article is one of the most significant elements for determining whether the study improves the state of the art, is a dead-end, or requires more investigation.

The final section of Google’s research paper is overwhelmingly favorable.

It notes:

“We explored chain of thought prompting as a simple and broadly applicable method for improving reasoning in language models.

Through experiments on arithmetic, symbolic, and commonsense reasoning, we found that chain of thought reasoning is an emergent property of model scale that allows sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves.

Expanding the variety of reasoning problems that language models can handle would hopefully spur further research into language-based reasoning systems.”

This suggests Chain of Thought Prompting could help Google dramatically enhance its various language models, which in turn could lead to significant improvements in what Google’s products can accomplish.


Read the article on Google AI.

Language Models Perform Reasoning via Chain of Thought

The Research Paper is available for download and reading.

Chain of Thought Prompting Elicits Reasoning in Large Language Models (PDF)


