“ReAct: Synergizing Reasoning and Acting in Language Models” is a research paper that explores the use of large language models (LLMs) to generate both reasoning traces and task-specific actions — crucial elements for tackling complex tasks.
Most stand-alone LLMs depend exclusively on their internal knowledge to execute tasks, so they are unable to access or retrieve information beyond what they were trained on. But just as humans combine their cognitive reasoning with external tools (such as search engines) to solve complex tasks, why not empower LLMs to operate similarly?
Recent advancements have broadened the scope of LLMs in handling downstream tasks. A notable approach, Chain-of-Thought (CoT) prompting, improves LLM performance by guiding the model to produce reasoning traces and decompose complex problems into a series of intermediate steps, facilitating more effective problem-solving. However, CoT prompting operates solely within the model’s internal representations, which means it may not be grounded in real-world data. As a result, the model cannot actively explore, reason about, or update its knowledge based on external information.
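To make this concrete, here is an illustrative few-shot CoT prompt (adapted from the original CoT paper, not from ReAct): the demonstration shows the model spelling out intermediate steps before committing to a final answer.

```
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
   Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
   5 + 6 = 11. The answer is 11.

Q: <your question here>
A:
```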
Action-only prompting, on the other hand, guides LLMs to plan and execute actions in interactive environments such as games, web navigation, and robotics. These models focus on mapping textual inputs from the task’s environment into text-based actions using the internal knowledge of the language model. However, they usually fall short in reasoning about high-level goals, limiting their ability to sustain coherent actions over extended or complex tasks.
The Reason+Act (ReAct) paradigm lets language models generate both reasoning steps and actions in an interleaved, back-and-forth manner. Actions interact with the outside world and return feedback from the environment as observations, which enrich the model’s internal reasoning with useful information and help it make better decisions and take more effective actions.
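For illustration, a ReAct trace interleaves the three step types as follows (a made-up example in the spirit of the paper’s format, not taken from it):

```
Question: Which country hosted the 2016 Summer Olympics, and what is its capital?
Thought: I need to find the host country first.
Action: search[2016 Summer Olympics host country]
Observation: The 2016 Summer Olympics were held in Rio de Janeiro, Brazil.
Thought: The host country is Brazil, and its capital is Brasília.
Action: finish[Brazil; its capital is Brasília]
```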
This is exemplified by the figure below, where a question from the HotpotQA dataset is given (1) and each of the compared methods is required to answer it. In the Chain-of-Thought (CoT) method (1b), the model is instructed to generate thoughts before answering the question directly, which in general improves the decision-making capabilities of the model but may lead to issues like fact hallucination, as in the example below. The Action-only model (1c) is guided to use the given tools (in this example, a search engine) and to generate its final answer based on the tools’ results — the observations. In this case, the action-only model falls short due to its inability to reason: even though it has the same actions and observations as the ReAct model, it cannot effectively combine them into a coherent final answer. ReAct (1d), in contrast, answers the question correctly, gathering information to support its reasoning and using that reasoning to decide what to look for next, showing a strong synergy between reasoning and acting.
In practice, to use ReAct prompting effectively, the following key components need to be defined:
- The set of tools (actions) the model is allowed to invoke, each with a clear name and description.
- A prompt, often with few-shot examples, that demonstrates the Thought / Action / Observation format.
- An execution loop that parses the model’s actions, runs the corresponding tools, and feeds the observations back into the context (see the sketch after this list).
- A stopping condition, such as a finish action or a maximum number of steps.
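Below is a minimal sketch of such a loop in Python. It is illustrative only: the `llm` callable, the `search` tool, and the prompt wording are all assumptions for this post, not part of the paper or any specific library.

```python
import re

# Hypothetical tool: swap in a real search API in practice.
def search(query: str) -> str:
    return f"<top search result for {query!r}>"

TOOLS = {"search": search}

PROMPT = """Answer the question by interleaving Thought, Action, and Observation steps.
Available actions: search[query], finish[answer].

Question: {question}
"""

def react(question: str, llm, max_steps: int = 5) -> str:
    """Run a ReAct loop: ask the model for a Thought + Action, execute the
    action, append the Observation, and repeat until a finish action."""
    transcript = PROMPT.format(question=question)
    for _ in range(max_steps):
        step = llm(transcript)          # model emits "Thought: ...\nAction: name[arg]"
        transcript += step
        # Parse an action of the form `name[argument]`.
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match is None:
            break                       # no action emitted; give up
        name, arg = match.groups()
        if name == "finish":
            return arg                  # the model's final answer
        observation = TOOLS[name](arg)  # run the tool
        transcript += f"\nObservation: {observation}\n"
    return "No answer found."
```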
Although ReAct provides notable benefits by integrating reasoning and actions within LLMs, implementing this paradigm can pose several challenges and limitations:
- Interleaving reasoning with multiple tool calls increases token usage, latency, and cost.
- The approach is sensitive to prompt design; poorly described tools or weak few-shot examples can derail the loop.
- Errors propagate: a non-informative or misleading observation (for example, an empty search result) can throw off all subsequent reasoning.
- The model can get stuck in repetitive action loops without making progress toward the goal.
ReAct has played a crucial role in advancing the research and development of AI systems. The strength of this paradigm lies in its ability to integrate reasoning with decision-making capabilities within LLMs. This approach proves particularly valuable in real-world scenarios where models need to leverage external APIs and tools.
To support ReAct-like capabilities, many LLM providers, such as Anthropic, Google, and OpenAI, offer tool-calling features. These features let users include information about available tools in their requests and allow the model to use those tools when generating responses.
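As a concrete example, here is a sketch of declaring a tool with OpenAI’s Python SDK; the `get_weather` tool and its schema are made up for illustration, and other providers expose similar fields under different names.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tool definition: name, description, and a JSON Schema
# describing the arguments the model may fill in.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tel Aviv?"}],
    tools=tools,
)

# If the model chose to call a tool, the call (name + JSON arguments)
# appears here instead of a plain text answer.
print(response.choices[0].message.tool_calls)
```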
Alongside advancements in function-calling capabilities, recent progress has also focused on enhancing the reasoning abilities of LLMs. A prominent example is OpenAI's o1 model, which employs reinforcement learning to iteratively refine its chain of thought, enabling more sophisticated reasoning and optimizing its decision-making processes.
In the next post, we will transition from theory to practice with a detailed walkthrough of ReAct’s code implementation. Stay tuned!