spRAG: A Method for Managing Complex Queries
Traditional Retrieval-Augmented Generation (RAG) often struggles with complex queries because the information needed to answer them is typically dispersed across multiple chunks. As a result, retrieving any single chunk rarely yields a complete answer, and traditional RAG fails to extract the relevant information efficiently.
Recently, I came across an open-source method called spRAG, a RAG framework designed for unstructured data. On complex open-book question-answering tasks such as FinanceBench, spRAG reportedly answers 83% of questions correctly, whereas the regular RAG baseline answers only 19% correctly.
spRAG improves on a standard RAG system through two key techniques:
AutoContext
Relevant Segment Extraction (RSE)
This article focuses on how spRAG handles complex queries that span multiple chunks. Note that there is currently no paper on spRAG; the following analysis is based on its open-source code.
AutoContext: Automatic Injection of Document-Level Context
In traditional RAG, documents are typically divided into fixed-length chunks for embedding. This simple approach discards document-level context, so the resulting embeddings are less accurate and less comprehensive.
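As a minimal sketch (for illustration only, not spRAG's code), naive fixed-length chunking looks something like this; each chunk is embedded in isolation, with no awareness of which document it came from:

# Illustration only: naive fixed-length chunking by character count.
# Each chunk is embedded on its own, so document-level context is lost.
def fixed_length_chunks(text: str, chunk_size: int = 800) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]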
To address this issue, spRAG introduces AutoContext. Its key idea is to automatically inject document-level context into each chunk before the chunk is embedded.
Specifically, AutoContext generates a 1-2 sentence summary of the document and prepends it, together with the file name, to each chunk. As a result, no chunk is isolated; every chunk carries the context of the entire document. The code for generating the document summary is shown below.
def get_document_context(auto_context_model: LLM, text: str, document_title: str, auto_context_guidance: str = ""):
    # truncate the content if it's too long
    max_content_tokens = 6000 # if this number changes, also update the truncation message above
    text, num_tokens = truncate_content(text, max_content_tokens)
    if num_tokens < max_content_tokens:
        truncation_message = ""
    else:
        truncation_message = TRUNCATION_MESSAGE

    # get document context
    prompt = PROMPT.format(auto_context_guidance=auto_context_guidance, document=text, document_title=document_title, truncation_message=truncation_message)
    chat_messages = [{"role": "user", "content": prompt}]
    document_context = auto_context_model.make_llm_call(chat_messages)
    return document_context
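To make the effect concrete, here is a minimal sketch of how the generated document context might be prepended to each chunk before embedding. This is not spRAG's exact implementation; the helper name and header format are assumptions for illustration.

# Hypothetical helper (not spRAG's exact code): prepend the file name and the
# 1-2 sentence summary to every chunk, so each chunk's embedding reflects the
# whole document rather than an isolated passage.
def add_document_context_to_chunks(chunks: list[str], document_title: str, document_context: str) -> list[str]:
    header = f"Document: {document_title}\nSummary: {document_context}\n\n"
    return [header + chunk for chunk in chunks]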
Relevant Segment Extraction (RSE): Intelligent Combination of Related Chunks
RSE is a post-processing step. Its objective is to intelligently identify and combine the chunks that provide the most relevant information, forming longer segments.
Specifically, RSE first groups retrieved chunks that are contiguous or semantically related. Then, based on the query, it selects and combines these chunks into the best segments. The corresponding code is shown below.
def get_best_segments(all_relevance_values: list[list], document_splits: list[int], max_length: int, overall_max_length: int, minimum_value: float) -> list[tuple]:
    """
    This function takes the chunk relevance values and then runs an optimization algorithm to find the best segments.
    - all_relevance_values: a list of lists of relevance values for each chunk of a meta-document, with each outer list representing a query
    - document_splits: a list of indices that represent the start of each document - best segments will not overlap with these
    Returns
    - best_segments: a list of tuples (start, end) that represent the indices of the best segments (the end index is non-inclusive) in the meta-document
    """
    best_segments = []
    total_length = 0
    rv_index = 0
    bad_rv_indices = []
    while total_length < overall_max_length:
        # cycle through the queries
        if rv_index >= len(all_relevance_values):
            rv_index = 0
        # if none of the queries have any more valid segments, we're done
        if len(bad_rv_indices) >= len(all_relevance_values):
            break
        # check if we've already determined that there are no more valid segments for this query - if so, skip it
        if rv_index in bad_rv_indices:
            rv_index += 1
            continue

        # find the best remaining segment for this query
        relevance_values = all_relevance_values[rv_index] # get the relevance values for this query
        best_segment = None
        best_value = -1000
        for start in range(len(relevance_values)):
            # skip over negative value starting points
            if relevance_values[start] < 0:
                continue
            for end in range(start+1, min(start+max_length+1, len(relevance_values)+1)):
                # skip over negative value ending points
                if relevance_values[end-1] < 0:
                    continue
                # check if this segment overlaps with any of the best segments
                if any(start < seg_end and end > seg_start for seg_start, seg_end in best_segments):
                    continue
                # check if this segment overlaps with any of the document splits
                if any(start < split and end > split for split in document_splits):
                    continue
                # check if this segment would push us over the overall max length
                if total_length + end - start > overall_max_length:
                    continue

                segment_value = sum(relevance_values[start:end]) # define segment value as the sum of the relevance values of its chunks
                if segment_value > best_value:
                    best_value = segment_value
                    best_segment = (start, end)

        # if we didn't find a valid segment, mark this query as done
        if best_segment is None or best_value < minimum_value:
            bad_rv_indices.append(rv_index)
            rv_index += 1
            continue

        # otherwise, add the segment to the list of best segments
        best_segments.append(best_segment)
        total_length += best_segment[1] - best_segment[0]
        rv_index += 1

    return best_segments
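As a quick illustration, the toy call below (with invented relevance values) shows how the function stitches adjacent high-relevance chunks into longer segments while respecting document boundaries and the length budget:

# Toy example with invented relevance values, for illustration only.
all_relevance_values = [
    # one query; one relevance score per chunk of the meta-document
    [-0.3, 0.8, 0.9, 0.7, -0.2, 0.05, 0.6],
]
document_splits = [5]  # chunk 5 starts a new document, so segments must not cross index 5

best_segments = get_best_segments(
    all_relevance_values,
    document_splits,
    max_length=4,          # a segment may span at most 4 chunks
    overall_max_length=6,  # all segments together may span at most 6 chunks
    minimum_value=0.5,     # a segment's summed relevance must reach this value
)
print(best_segments)  # [(1, 4), (5, 7)]

Note that chunk 5, though barely relevant on its own (0.05), is pulled into the second segment because the combined value of chunks 5-6 (0.65) exceeds that of chunk 6 alone (0.6). This is exactly the intended behavior: RSE merges neighboring chunks into longer segments when doing so yields more relevant context.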
Conclusion
In summary, spRAG handles complex queries that span multiple chunks primarily through the following two methods:
AutoContext, which injects document-level summary information into each chunk before embedding.
RSE, which intelligently assembles related chunks into longer, more relevant segments.
Together, these offer a useful improvement over traditional RAG.