October 28, 2024

ReAct: Merging Reasoning and Action to Elevate AI Task Solving

Shany Barhom | Data Scientist at Neradot

“ReAct: Synergizing Reasoning and Acting in Language Models” is a research paper that explores the use of large language models (LLMs) to generate both reasoning traces and task-specific actions, crucial elements for tackling complex tasks.

As most stand-alone LLMs depend exclusively on their internal knowledge to execute tasks, they are typically unable to access or retrieve information beyond what they have already been trained on. But just as humans combine their cognitive reasoning with external tools (such as search engines) to solve complex tasks, why not empower LLMs to operate similarly?

Background

Recent advancements have broadened the scope of LLMs in handling downstream tasks. A notable approach, Chain-of-Thought (CoT) prompting, enhances the performance of LLMs by guiding them to produce reasoning traces that decompose complex problems into a series of intermediate steps, facilitating more effective problem-solving. However, CoT prompting operates solely within the model’s internal representations, which means it may not be grounded in real-world data. As a result, the model lacks the ability to actively explore, reason about, or update its knowledge based on external information.

figure caption: Chain-of-Thought (CoT) prompting in which the LLM only generates self-conditioned reasoning traces
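To make this concrete, the snippet below sketches what a minimal CoT-style prompt might look like in Python. The prompt wording and the `call_llm` helper are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch of Chain-of-Thought prompting.
# `call_llm` is a hypothetical callable standing in for any LLM client.

COT_PROMPT = """Answer the question by reasoning step by step.

Question: A farmer has 3 fields, each yielding 120 kg of wheat. He sells 200 kg. How much wheat is left?
Thought: 3 fields x 120 kg = 360 kg in total. 360 kg - 200 kg = 160 kg remain.
Answer: 160 kg

Question: {question}
Thought:"""

def chain_of_thought(question: str, call_llm) -> str:
    # The model produces its reasoning trace and final answer in a single pass,
    # relying only on its internal knowledge; no external tools are involved.
    return call_llm(COT_PROMPT.format(question=question))
```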

On the other hand, in Action-only prompting, LLMs are guided to plan and execute actions in interactive environments, such as games, web navigation, and robotics. These models focus on mapping textual inputs from the task’s environment into text-based actions using the internal knowledge of the language model. However, they usually fall short in reasoning about high-level goals, limiting their ability to sustain coherent actions over extended or complex tasks.

figure caption: Action-only prompting, in which the model solely generates task-specific actions based on observations it receives from the task’s environment.
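For contrast, an Action-only loop might look like the sketch below, where the model emits actions and the environment returns observations, with no explicit reasoning in between. `call_llm` and `search` are hypothetical stand-ins for an LLM client and a search tool.

```python
# Sketch of Action-only prompting: observations map directly to actions,
# without an intermediate "Thought" step.

ACT_PROMPT = """Answer the question using only Search[query] and Finish[answer] actions.

Question: {question}
{history}Action:"""

def act_only(question: str, call_llm, search, max_steps: int = 5) -> str:
    history = ""
    for _ in range(max_steps):
        action = call_llm(ACT_PROMPT.format(question=question, history=history)).strip()
        if action.startswith("Finish["):
            return action[len("Finish["):-1]                   # final answer
        if action.startswith("Search["):
            observation = search(action[len("Search["):-1])    # environment feedback
            history += f"Action: {action}\nObservation: {observation}\n"
    return "No answer found"
```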
ReAct

The Reason+Act (ReAct) paradigm allows language models to generate both reasoning steps and actions in an interleaved, back-and-forth manner. Actions interact with the outside world and return feedback from the environment; this feedback enriches the model’s internal reasoning with useful information, helping the model make better decisions and take more effective actions.

This is exemplified in the figure below, where a question from the HotPotQA dataset is given (1) and each of the compared methods is required to answer it. In the Chain-of-Thought (CoT) method (1b), the model is instructed to generate thoughts before answering the question directly. This generally improves the model’s decision-making capabilities, but may lead to issues such as fact hallucination, as in the example below. The Action-only model (1c) is guided to use given tools (in this example, a search engine) and to generate its final answer based on the tool’s results, the observations. In this case, the action-only model falls short due to its inability to reason: even though it has the same actions and observations available as the ReAct model, it cannot effectively combine them into a coherent final answer. ReAct (1d), on the other hand, answers the question correctly, gathering information to support its reasoning and using that reasoning to decide what to look for next, showing a strong synergy between reasoning and acting.


In practice, to use ReAct prompting effectively, the following key components need to be defined:

  • LLM (Large Language Model): The core of the ReAct agent; it generates the reasoning steps and actions for a given task. While the ReAct method can work with any LLM, it performs best with models whose context window is large enough (in tokens) to hold both reasoning steps and observations, such as GPT-4 or PaLM-540B (the model used in the ReAct paper). These larger models are better suited to maintaining complex reasoning and action plans.
  • Tools/Actions: The ReAct framework requires defining the set of actions to interact with the external world and gather relevant information. This interaction relies on specific tools, which can vary depending on the task. For instance, a search engine API might be used to retrieve external data, while math tools could handle complex mathematical computations. The choice of tools is tailored to the requirements of the specific task.
  • ReAct-based prompting: The prompting strategy that guides the LLM to generate both free-form language reasoning traces and actions. In the ReAct paper, the authors randomly selected 6 and 3 examples from the HotpotQA and Fever training sets, respectively, and manually crafted them into ReAct-style trajectories to serve as few-shot exemplars in the prompts, as illustrated in the figure below.
figure caption: Example of a ReAct trajectory answering a question from HotpotQA.
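Putting these pieces together, the sketch below shows one way a ReAct loop could be wired up. The prompt wording, the `call_llm` helper, and the `wiki_search` tool are illustrative assumptions rather than the paper’s actual implementation; in practice the manually crafted few-shot trajectories would be prepended to the prompt.

```python
import re

# Illustrative ReAct loop: the LLM alternates Thought/Action steps and the
# environment returns Observations. `call_llm` and `wiki_search` are
# hypothetical stand-ins for an LLM client and a Wikipedia-search tool.

REACT_PROMPT = """Answer the question by interleaving Thought, Action and Observation steps.
Available actions: Search[query], Finish[answer].

Question: {question}
{trajectory}"""

def react(question: str, call_llm, wiki_search, max_steps: int = 7):
    trajectory = ""
    for _ in range(max_steps):
        # 1. Generate the next Thought and Action given the trajectory so far.
        step = call_llm(REACT_PROMPT.format(question=question, trajectory=trajectory))
        trajectory += step
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if not match:
            continue
        action, arg = match.groups()
        if action == "Finish":
            return arg                        # the model decided it has the answer
        # 2. Execute the tool and feed the observation back into the trajectory,
        #    grounding the next reasoning step in external information.
        observation = wiki_search(arg)
        trajectory += f"\nObservation: {observation}\nThought:"
    return None                               # no answer within the step budget
```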
Results
  • The researchers carried out empirical evaluations of ReAct alongside state-of-the-art baselines across four distinct benchmarks: question answering (HotPotQA), fact verification (Fever), text-based gaming (ALFWorld), and web page navigation (WebShop). For both HotPotQA and Fever, where the model could interact with a Wikipedia API, ReAct demonstrated superior performance over standard action generation models while remaining competitive with the chain-of-thought (CoT) reasoning approach.
Figure caption: PaLM-540B prompting results on HotpotQA and Fever
  • The researchers found that the most effective approach combines ReAct and Chain-of-Thought (CoT) prompting. They therefore propose to incorporate both and let the model decide when to switch between them, based on simple heuristics: when ReAct fails to return an answer within a given number of steps, it backs off to CoT, and when the model’s internal knowledge does not seem to support the task confidently, CoT backs off to ReAct (a minimal sketch of this back-off logic follows the figure below).
  • In the ALFWorld and WebShop benchmarks, ReAct, prompted with only one or two in-context examples, significantly outperforms imitation and reinforcement learning methods trained on approximately 10⁵ task instances.
Figure caption: PaLM-540B prompting results on ALFWorld and WebShop.
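The sketch below illustrates the two back-off directions described above, assuming a `react` function that returns `None` when it runs out of steps (as in the earlier sketch) and hypothetical `cot_answer` and `sample_cot_answers` helpers for single and repeated CoT sampling.

```python
def react_then_cot(question, react, cot_answer):
    # ReAct -> CoT: if ReAct fails to return an answer within its step budget,
    # fall back to a plain Chain-of-Thought answer.
    answer = react(question)
    return answer if answer is not None else cot_answer(question)

def cot_then_react(question, react, sample_cot_answers):
    # CoT -> ReAct: sample several CoT answers; if no answer wins a clear
    # majority, internal knowledge looks unreliable, so back off to ReAct.
    answers = sample_cot_answers(question)          # list of sampled answers
    majority = max(set(answers), key=answers.count)
    if answers.count(majority) < len(answers) / 2:
        return react(question)
    return majority
```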
Key Benefits of ReAct
  • Interpretability: The tight integration of reasoning and action in LLMs generates human-aligned task-solving trajectories, improving the interpretability of the model's output. This alignment is vital for building AI systems that promote greater user trust and enable a more intuitive understanding of their decisions.
  • Able to obtain the most up-to-date knowledge: As shown in the figure below, a HotpotQA example with an outdated answer is provided to all baseline models. Only ReAct was able to retrieve the correct, up-to-date information by interacting with a real-world search engine and using its reasoning skills to generate the correct answer. The Standard and CoT models gave wrong answers containing hallucinated information, and while the Action-only method had access to a search engine just like ReAct, it still failed because it could not effectively reason about how to use the retrieved information. ReAct was the only method able to combine web interaction with reasoning to retrieve the latest data and give an accurate answer.
Limitations of ReAct

Although ReAct provides notable benefits by integrating reasoning and actions within LLMs, implementing this paradigm can pose several challenges and limitations:

  • Latency: A key challenge lies in the latency of trajectory generation. As tasks grow in complexity and involve multiple steps, ReAct’s performance may degrade. The interleaving of reasoning and action can result in lengthy decision-making trajectories, which not only impacts model performance but also increases computational costs in terms of time and token usage.
  • Sequential decision-making: ReAct performs well in tasks that follow a "plan as you go" sequential strategy, but it is less computationally efficient in cases where the solution could be achieved through parallel processing. Alternative approaches such as "plan and execute" first use an LLM to create a plan that breaks the task into clear sub-tasks, then execute those sub-tasks accordingly (an illustrative sketch follows this list).
  • Constraints of In-Context Learning: ReAct depends largely on few-shot in-context learning, where a limited set of examples guides the model's behavior. This method has limitations, especially for tasks that demand deeper reasoning or domain-specific knowledge beyond the scope of the provided prompt examples.
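For contrast with ReAct’s interleaved loop, the following is a rough sketch of a "plan and execute" flow; `call_llm` and `run_subtask` are hypothetical helpers.

```python
# Rough sketch of a "plan and execute" agent: plan once up front, then
# execute the sub-tasks (potentially in parallel when they are independent).

def plan_and_execute(task: str, call_llm, run_subtask) -> str:
    # 1. Planning: ask the LLM for a short list of sub-tasks, one per line.
    plan = call_llm(f"List the sub-tasks needed to solve the task, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # 2. Execution: run each sub-task independently.
    results = [run_subtask(subtask) for subtask in subtasks]
    # 3. Synthesis: combine the sub-task results into a final answer.
    return call_llm(f"Task: {task}\nSub-task results: {results}\nFinal answer:")
```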
Conclusion

ReAct has played a crucial role in advancing research and development of AI systems. The strength of this paradigm lies in its ability to integrate reasoning with decision-making capabilities within LLMs. This approach proves particularly valuable in real-world scenarios where models need to leverage external APIs and tools.

To support ReAct-like capabilities, many LLM providers, such as Anthropic, Google, and OpenAI, offer tool-calling features. These features let users describe available tools in their requests and allow the model to invoke those tools when generating responses.
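As an illustration, the snippet below shows how a tool could be declared with OpenAI’s Python SDK; the tool schema, model name, and question are illustrative, and other providers expose similar interfaces.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe a tool the model is allowed to call (illustrative schema).
tools = [{
    "type": "function",
    "function": {
        "name": "search_wikipedia",
        "description": "Search Wikipedia and return a short summary.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # any tool-calling-capable model
    messages=[{"role": "user", "content": "Which country hosted Expo 2020?"}],
    tools=tools,
)

# If the model chose to call the tool, the call (name + JSON arguments) is
# returned here for the application to execute and feed back as a tool message.
print(response.choices[0].message.tool_calls)
```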

Alongside advancements in function-calling capabilities, recent progress has also focused on enhancing the reasoning abilities of LLMs. A prominent example is OpenAI's o1 model, which employs reinforcement learning to iteratively refine its chain of thought, enabling more sophisticated reasoning and optimizing its decision-making processes.

In the next post, we will move from theory to practice with a detailed walkthrough of ReAct’s code implementation. Stay tuned!