A RAG Solution for Multi-hop Question Answering

Aug 03, 2024

In open-domain question answering, multi-hop question answering is complex and challenging. It requires the system to integrate information from multiple documents for multi-step reasoning to provide accurate answers.

"Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach" proposes a new iterative RAG method to improve the accuracy and efficiency of multi-hop question answering.

Motivation

Multi-hop question answering faces two main challenges:

Context Overload: Due to the significant increase in information during multi-turn retrieval, traditional methods often overlook key details or introduce noise when generating answers, leading to decreased accuracy.
Over-Planning and Repetitive Planning: Current iterative RAG methods often lack mechanisms to record the retrieval path, resulting in unnecessary retrieval even when sufficient information is already available (over-planning). They may also repeatedly generate sub-questions that have already been addressed, wasting computational resources and reducing efficiency (repetitive planning).

Solution

A novel method called ReSP (Retrieve, Summarize, Plan), featuring a dual-function summarizer, is introduced. This summarizer compresses information for both the overarching question and the current sub-question in each retrieval round, effectively addressing context overload and optimizing the retrieval path to avoid over-planning and repetitive planning.

Figure 1: The ReSP framework consists of four modules: Reasoner, Retriever, Summarizer, and Generator. Source: https://arxiv.org/pdf/2407.13101.

The ReSP framework consists of four components:

Reasoner: Decides whether to exit the iteration and generate a response based on the current memory queues or generate a new sub-question for further iteration.
Retriever: Retrieves relevant documents from the corpus based on the sub-question provided by the reasoner.
Summarizer: Performs dual summarization on the retrieved documents, updating the global evidence memory and local pathway memory.
Generator: Generates the final answer based on the information in the memory queues.

In this way, ReSP compresses and integrates information in each iteration, so that the model neither misses critical content nor introduces unnecessary noise when generating answers.

Dual-Function Summarizer

The Dual-Function Summarizer is the key innovation within the ReSP (Retrieve, Summarize, Plan) framework, designed to tackle the challenges of context overload and repetitive planning in multi-hop question answering. In each retrieval round, it performs two functions on the same set of retrieved documents: summarizing global evidence and summarizing the local pathway.

Global Evidence Summarization: The summarizer creates a summary of corroborative information from the retrieved documents for the overarching question. This summary is stored in the global evidence memory, which helps the model determine whether sufficient information has been gathered to answer the main question, thereby avoiding over-planning.
Local Pathway Summarization: Simultaneously, the summarizer generates a response to the current sub-question based on the retrieved documents and stores it in the local pathway memory. This ensures that each sub-question is addressed in detail and prevents repetitive planning, because the memory keeps track of the sub-questions that have already been answered.
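To make the two functions concrete, here is a minimal sketch of how a dual-function summarizer could be wired up. It is an illustration under assumptions, not the paper's implementation: the prompt templates, the `Memory` class, the `dual_summarize` function, and the `llm` callable (any function that maps a prompt string to a completion string) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical prompt templates; the paper's exact prompts are not reproduced here.
GLOBAL_PROMPT = (
    "Summarize the evidence in the documents that helps answer the question.\n"
    "Question: {question}\nDocuments:\n{docs}"
)
LOCAL_PROMPT = (
    "Answer the sub-question using only the documents.\n"
    "Sub-question: {question}\nDocuments:\n{docs}"
)


@dataclass
class Memory:
    """The two memory queues maintained across retrieval rounds."""
    global_evidence: List[str] = field(default_factory=list)
    local_pathway: List[str] = field(default_factory=list)


def dual_summarize(
    llm: Callable[[str], str],
    main_question: str,
    sub_question: str,
    docs: List[str],
    memory: Memory,
) -> Memory:
    """Run both summaries over the same documents and append them to memory."""
    joined = "\n".join(docs)
    # Function 1: evidence relevant to the overarching question.
    evidence = llm(GLOBAL_PROMPT.format(question=main_question, docs=joined))
    # Function 2: a direct response to the current sub-question.
    answer = llm(LOCAL_PROMPT.format(question=sub_question, docs=joined))
    memory.global_evidence.append(evidence)
    memory.local_pathway.append(f"Q: {sub_question} A: {answer}")
    return memory
```

Storing the sub-question together with its answer in the local pathway memory is one simple way to let a later planning step see which sub-questions have already been handled.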

Summary-Enhanced Iterative RAG Process

The Summary-Enhanced Iterative RAG Process involves several key steps, which are illustrated in Figure 1.

Initial Document Retrieval: Given a query Q and a document corpus D, the retriever identifies the top K documents relevant to Q.
Dual Summarization: These documents are processed by the dual-function summarizer, which updates two memories:
- Global Evidence Memory: Receives a summary of the information relevant to the overarching question.
- Local Pathway Memory: Receives the response generated for the current sub-question.
Memory Queue Update: The contents of the global evidence memory and local pathway memory are concatenated and used as contextual inputs for the reasoner.
Reasoner Decision: The reasoner evaluates whether the current information is sufficient to answer the overarching question. If sufficient, the iterative process stops, and the generator produces the final answer using the memory queues. If not, a new sub-question Q* is generated, ensuring it is distinct from previously retrieved sub-questions, and the process iterates.
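The steps above can be sketched as a short control loop. This is a hedged illustration rather than the authors' code: `resp_loop` and the four callables (`retrieve`, `summarize`, `reason`, `generate`) are hypothetical stand-ins for the Retriever, Summarizer, Reasoner, and Generator modules. The `max_iters` cap and the explicit check against already-asked sub-questions are assumptions added here as safety guards for the sketch.

```python
from typing import Callable, List, Optional, Tuple

Pathway = List[Tuple[str, str]]  # (sub-question, response) pairs


def resp_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[str, str, List[str]], Tuple[str, str]],
    reason: Callable[[str, List[str], Pathway], Tuple[bool, Optional[str]]],
    generate: Callable[[str, List[str], Pathway], str],
    max_iters: int = 5,  # assumed cap, not a value taken from the article
) -> str:
    global_evidence: List[str] = []  # summaries for the overarching question
    local_pathway: Pathway = []      # answered sub-questions, in order
    asked = set()
    sub_question = question          # the first round retrieves for Q itself

    for _ in range(max_iters):
        asked.add(sub_question)
        docs = retrieve(sub_question)
        # Dual summarization: one summary per memory queue.
        evidence, response = summarize(question, sub_question, docs)
        global_evidence.append(evidence)
        local_pathway.append((sub_question, response))
        # The reasoner sees both memories and either stops or plans a new hop.
        done, next_question = reason(question, global_evidence, local_pathway)
        if done or next_question is None or next_question in asked:
            break  # enough information, or the plan would repeat itself
        sub_question = next_question

    return generate(question, global_evidence, local_pathway)
```

In a real system each callable would wrap a retriever or an LLM prompt. Keeping them as plain functions makes it easy to swap models per module, which is relevant to the model-size experiments discussed in the evaluation.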

This enhanced process ensures that critical information is retained and utilized effectively, mitigating the risks of over-planning and repetitive planning.

Evaluation

Experiments were conducted on two multi-hop question-answering benchmark datasets: HotpotQA and 2WikiMultihopQA.

Figure 2: Performance comparison on HotpotQA and 2WikiMultihopQA. We report the token-level F1 score of answer strings. All methods utilize fine-tuning-free Llama3-8B-instruct for generation. Source: https://arxiv.org/pdf/2407.13101.

As shown in Figure 2, the results demonstrate that ReSP significantly outperforms existing methods on both datasets:

HotpotQA: ReSP improved the F1 score by 4.1 points compared to the state-of-the-art methods.
2WikiMultihopQA: ReSP improved the F1 score by 5.9 points compared to the state-of-the-art methods.
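For readers unfamiliar with the metric, token-level F1 compares the predicted answer string with the reference answer token by token. The sketch below follows the normalization commonly used in SQuAD-style evaluation (lowercasing, stripping punctuation and English articles). That the paper's evaluation script uses exactly this normalization is an assumption, and the names `normalize` and `token_f1` are chosen for this example.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answer strings."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "Paris, France" against the reference "Paris" has precision 0.5 and recall 1.0, giving an F1 of about 0.67. Benchmark results like those above are typically this score averaged over all questions and reported on a 0 to 100 scale.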

Figure 3: Impact of base model size on different modules. Source: https://arxiv.org/pdf/2407.13101.

Additionally, the authors conducted several comparative experiments to examine the impact of model size on the performance of different modules and to verify the robustness of ReSP with respect to context length. The findings indicate that using a larger base model can significantly enhance the performance of the generator module, while the reasoner and summarizer modules do not necessarily benefit from a larger model.

Conclusion

This article introduces a new iterative RAG method, ReSP, which effectively addresses context overload and repetitive planning issues in multi-hop question answering by incorporating a dual-function summarizer.

ReSP offers new ideas and methods for researchers in the field of multi-hop question answering.

We hope that after reading, you will have a deeper understanding of the ReSP method and gain inspiration for practical applications.

AI Exploration Journey
