spRAG: A Method for Managing Complex Queries
Traditional Retrieval-Augmented Generation (RAG) often struggles with complex queries because the information needed to answer them is typically dispersed across multiple chunks. As a result, retrieving any single chunk rarely yields a complete answer, and traditional RAG fails to extract the relevant information efficiently.
Recently, I came across an open-source method called spRAG, a RAG framework designed for unstructured data. On complex open-book question-answering tasks such as FinanceBench, spRAG reportedly answers 83% of questions correctly, whereas the regular RAG baseline answers only 19% correctly.
spRAG improves on a standard RAG system through two key techniques:
AutoContext
Relevant Segment Extraction (RSE)
This article focuses on how spRAG handles complex queries that span multiple chunks. Note that there is currently no paper on spRAG; the following analysis is based on its open-source code.
AutoContext: Automatic Injection of Document-Level Context
In traditional RAG, documents are typically divided into fixed-length chunks for embedding. This simple approach discards document-level context, so the resulting embeddings are less accurate and less comprehensive.
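As a minimal sketch (for illustration only, not spRAG's code), naive fixed-length chunking looks something like this; each chunk is embedded in isolation, with no awareness of which document it came from:

# Illustration only: naive fixed-length chunking by character count.
# Each chunk is embedded on its own, so document-level context is lost.
def fixed_length_chunks(text: str, chunk_size: int = 800) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]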
To address this issue, spRAG introduces AutoContext. Its key idea is to automatically inject document-level context into each chunk before the chunk is embedded.
Specifically, AutoContext generates a 1-2 sentence summary of the document and prepends it, together with the file name, to each chunk. As a result, no chunk is isolated; every chunk carries the context of the entire document. The code for generating the document summary is shown below.
def get_document_context(auto_context_model: LLM, text: str, document_title: str, auto_context_guidance: str = ""):
    # truncate the content if it's too long
    max_content_tokens = 6000 # if this number changes, also update the truncation message above
    text, num_tokens = truncate_content(text, max_content_tokens)
    if num_tokens < max_content_tokens:
        truncation_message = ""
    else:
        truncation_message = TRUNCATION_MESSAGE

    # get document context
    prompt = PROMPT.format(auto_context_guidance=auto_context_guidance, document=text, document_title=document_title, truncation_message=truncation_message)
    chat_messages = [{"role": "user", "content": prompt}]
    document_context = auto_context_model.make_llm_call(chat_messages)
    return document_context
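To make the effect concrete, here is a minimal sketch of how the generated document context might be prepended to each chunk before embedding. This is not spRAG's exact implementation; the helper name and header format are assumptions for illustration.

# Hypothetical helper (not spRAG's exact code): prepend the file name and the
# 1-2 sentence summary to every chunk, so each chunk's embedding reflects the
# whole document rather than an isolated passage.
def add_document_context_to_chunks(chunks: list[str], document_title: str, document_context: str) -> list[str]:
    header = f"Document: {document_title}\nSummary: {document_context}\n\n"
    return [header + chunk for chunk in chunks]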
Relevant Segment Extraction (RSE): Intelligent Combination of Related Chunks
RSE is a post-processing step. Its objective is to intelligently identify and combine the chunks that provide the most relevant information, forming longer segments.
Specifically, RSE first groups retrieved chunks that are contiguous or semantically related. Then, based on the query, it selects and combines these chunks into the best segments. The corresponding code is shown below.
def get_best_segments(all_relevance_values: list[list], document_splits: list[int], max_length: int, overall_max_length: int, minimum_value: float) -> list[tuple]:
    """
    This function takes the chunk relevance values and then runs an optimization algorithm to find the best segments.
    - all_relevance_values: a list of lists of relevance values for each chunk of a meta-document, with each outer list representing a query
    - document_splits: a list of indices that represent the start of each document - best segments will not overlap with these
    Returns
    - best_segments: a list of tuples (start, end) that represent the indices of the best segments (the end index is non-inclusive) in the meta-document
    """
    best_segments = []
    total_length = 0
    rv_index = 0
    bad_rv_indices = []
    while total_length < overall_max_length:
        # cycle through the queries
        if rv_index >= len(all_relevance_values):
            rv_index = 0
        # if none of the queries have any more valid segments, we're done
        if len(bad_rv_indices) >= len(all_relevance_values):
            break
        # check if we've already determined that there are no more valid segments for this query - if so, skip it
        if rv_index in bad_rv_indices:
            rv_index += 1
            continue

        # find the best remaining segment for this query
        relevance_values = all_relevance_values[rv_index] # get the relevance values for this query
        best_segment = None
        best_value = -1000
        for start in range(len(relevance_values)):
            # skip over negative value starting points
            if relevance_values[start] < 0:
                continue
            for end in range(start+1, min(start+max_length+1, len(relevance_values)+1)):
                # skip over negative value ending points
                if relevance_values[end-1] < 0:
                    continue
                # check if this segment overlaps with any of the best segments
                if any(start < seg_end and end > seg_start for seg_start, seg_end in best_segments):
                    continue
                # check if this segment overlaps with any of the document splits
                if any(start < split and end > split for split in document_splits):
                    continue
                # check if this segment would push us over the overall max length
                if total_length + end - start > overall_max_length:
                    continue

                segment_value = sum(relevance_values[start:end]) # define segment value as the sum of the relevance values of its chunks
                if segment_value > best_value:
                    best_value = segment_value
                    best_segment = (start, end)

        # if we didn't find a valid segment, mark this query as done
        if best_segment is None or best_value < minimum_value:
            bad_rv_indices.append(rv_index)
            rv_index += 1
            continue

        # otherwise, add the segment to the list of best segments
        best_segments.append(best_segment)
        total_length += best_segment[1] - best_segment[0]
        rv_index += 1

    return best_segments
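As a quick illustration, the toy call below (with invented relevance values) shows how the function stitches adjacent high-relevance chunks into longer segments while respecting document boundaries and the length budget:

# Toy example with invented relevance values, for illustration only.
all_relevance_values = [
    # one query; one relevance score per chunk of the meta-document
    [-0.3, 0.8, 0.9, 0.7, -0.2, 0.05, 0.6],
]
document_splits = [5]  # chunk 5 starts a new document, so segments must not cross index 5

best_segments = get_best_segments(
    all_relevance_values,
    document_splits,
    max_length=4,          # a segment may span at most 4 chunks
    overall_max_length=6,  # all segments together may span at most 6 chunks
    minimum_value=0.5,     # a segment's summed relevance must reach this value
)
print(best_segments)  # [(1, 4), (5, 7)]

Note that chunk 5, though barely relevant on its own (0.05), is pulled into the second segment because the combined value of chunks 5-6 (0.65) exceeds that of chunk 6 alone (0.6). This is exactly the intended behavior: RSE merges neighboring chunks into longer segments when doing so yields more relevant context.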
Conclusion
In summary, spRAG handles complex queries that span multiple chunks primarily through the following two methods:
AutoContext, which injects document-level summary information into each chunk before embedding.
RSE, which intelligently assembles related chunks into longer, more relevant segments.
Together, these offer a useful improvement over traditional RAG.