This article is the 20th in this thought-provoking series.
Today, we will explore four fascinating topics in AI:
HtmlRAG: From Text Fragments to a Global View
AFLOW: A Master Prospector in the Desert of Workflow Search Space
ChunkRAG: Eagle-Eyed Reader for Precise Knowledge Extraction
MarkItDown: A Tool for Converting Files to Markdown
HtmlRAG: From Text Fragments to a Global View
Open-source code: https://github.com/plageon/HtmlRAG
Vivid Description
HtmlRAG is like reading a book where you can see both the text and its chapter structure and layout, rather than just disconnected words (as in traditional RAG).
Overview
Current RAG systems convert HTML to plain text before processing, losing valuable structural information.
Thus, an intuitive idea arises: Could using HTML format directly in RAG systems better preserve document information?
HtmlRAG leverages HTML format instead of plain text in RAG systems to preserve semantic and structural information.
Because HTML documents are much longer than their plain-text counterparts, HtmlRAG shortens them with progressive pruning. The four steps shown in Figure 2 are: HTML Cleaning, Block Tree Construction, Text-Embedding-Based Block Pruning, and Generative Fine-Grained Block Pruning.
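To make the embedding-based pruning step concrete, here is a minimal sketch. It is not the authors' implementation: the block granularity, the embedding model name, and the keep_top_k parameter are illustrative choices, and the real system builds a proper block tree and adds a generative fine-grained pruning stage on top.

```python
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer
import numpy as np

def prune_html_blocks(html: str, query: str, keep_top_k: int = 5) -> str:
    """Keep only the HTML blocks most relevant to the query, preserving structure."""
    soup = BeautifulSoup(html, "html.parser")

    # Treat non-nested <p>/<li>/<table> tags as "blocks"; HtmlRAG builds a richer block tree.
    candidates = soup.find_all(["p", "li", "table"])
    blocks = [t for t in candidates
              if t.get_text(strip=True) and not t.find_parent(["p", "li", "table"])]
    if not blocks:
        return str(soup)

    # Score each block by embedding similarity to the query (model choice is illustrative).
    model = SentenceTransformer("all-MiniLM-L6-v2")
    query_vec = model.encode([query])[0]
    block_vecs = model.encode([b.get_text(" ", strip=True) for b in blocks])
    scores = block_vecs @ query_vec / (
        np.linalg.norm(block_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)

    # Remove the least relevant blocks but keep the surrounding HTML skeleton.
    keep = set(np.argsort(scores)[::-1][:keep_top_k])
    for i, block in enumerate(blocks):
        if i not in keep:
            block.decompose()
    return str(soup)
```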
Commentary
HtmlRAG uses HTML as a knowledge carrier in RAG systems, leveraging its structure and using pruning algorithms to optimize context length.
However, most documents are not stored in HTML format, requiring conversion tools that can slow down processing.
AFLOW: A Master Prospector in the Desert of Workflow Search Space
Open-source code: https://github.com/geekan/MetaGPT
Vivid Description
AFLOW is a master prospector who, in a vast desert (the workflow search space), uses advanced tools (Monte Carlo Tree Search (MCTS) and operators) to keep digging until it finds the most deeply buried gold (efficient solutions).
Overview
Building agentic workflows for LLMs currently requires significant human effort, which limits their scalability and generalizability.
AFLOW reformulates workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. It is an automated framework that efficiently explores this space using Monte Carlo Tree Search (MCTS), iteratively optimizing workflows through code modification, tree-structured experience, and execution feedback.
AFLOW's core concept is to model workflows as a sequence of interconnected LLM-invoking nodes, where nodes represent LLM operations, and edges define the logic, dependencies, and flow between these operations. Operators are combinations of node operations that define logical relationships and common task patterns between nodes.
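To illustrate what a code-represented workflow can look like, here is a hypothetical sketch. The Node and Operator classes and the call_llm helper are my own illustrative stand-ins, not AFLOW's actual API; edges simply become ordinary control flow in code.

```python
from dataclasses import dataclass
from typing import Callable, List

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real client here."""
    raise NotImplementedError

@dataclass
class Node:
    name: str
    prompt_template: str  # the flexible prompt parameter that the search can tune

    def run(self, **inputs) -> str:
        return call_llm(self.prompt_template.format(**inputs))

class Operator:
    """An operator bundles several node invocations into a common task pattern."""
    def __init__(self, nodes: List[Node], combine: Callable[[List[str]], str]):
        self.nodes = nodes
        self.combine = combine

    def run(self, **inputs) -> str:
        return self.combine([node.run(**inputs) for node in self.nodes])

# Two nodes: generate a draft, then review it.
generate = Node("generate", "Solve the problem step by step: {question}")
review = Node("review", "Check this solution and fix any errors: {draft}")

# An ensemble-style operator: sample several drafts and merge them with another LLM call.
ensemble = Operator(
    nodes=[generate, generate, generate],
    combine=lambda drafts: call_llm("Merge these drafts into one answer:\n" + "\n---\n".join(drafts)),
)

# Edges are expressed as plain Python control flow: one node's output feeds the next.
def workflow(question: str) -> str:
    draft = ensemble.run(question=question)
    return review.run(draft=draft)
```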
As shown in Figure 4, AFLOW performs an MCTS-based search within a space defined by nodes with flexible prompt parameters, a given operator set, and code-represented edges.
Using a specialized MCTS variant for workflow optimization, AFLOW iteratively cycles through four steps: Soft Mixed Probability Selection, LLM-Based Expansion, Execution Evaluation, and Experience Backpropagation. This process continues until it reaches the maximum iterations or meets convergence criteria.
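The loop below sketches those four steps under simplifying assumptions: the softmax-style selection, the llm_modify_workflow and evaluate placeholders, and the patience-based stopping rule are stand-ins for the paper's components rather than its exact algorithm.

```python
import math
import random

def soft_mixed_selection(candidates):
    """Pick a parent workflow with probability increasing in its score (selection step)."""
    weights = [math.exp(c["score"]) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def llm_modify_workflow(workflow_code: str, feedback: str) -> str:
    """Placeholder: ask an LLM to edit the workflow code given execution feedback (expansion)."""
    raise NotImplementedError

def evaluate(workflow_code: str):
    """Placeholder: run the workflow on a validation set and return (score, feedback)."""
    raise NotImplementedError

def optimize(seed_workflow: str, max_iters: int = 20, patience: int = 5) -> str:
    tree = [{"code": seed_workflow, "score": 0.0, "feedback": ""}]
    best, stall = tree[0], 0
    for _ in range(max_iters):
        parent = soft_mixed_selection(tree)                                   # 1. selection
        child_code = llm_modify_workflow(parent["code"], parent["feedback"])  # 2. expansion
        score, feedback = evaluate(child_code)                                # 3. evaluation
        child = {"code": child_code, "score": score, "feedback": feedback}
        tree.append(child)                                                    # 4. store experience
        if score > best["score"]:
            best, stall = child, 0
        else:
            stall += 1
        if stall >= patience:  # stop after `patience` rounds without improvement
            break
    return best["code"]
```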
Commentary
AFLOW's reformulation of workflow optimization as a search problem over code-represented workflows is a genuinely innovative idea.
But I have the following concerns:
Although operators improve search efficiency, they must be designed in advance, and their applicability may be limited in complex or novel tasks.
AFLOW's search process terminates when certain conditions are met (such as no improvement for n rounds), but this may result in high-potential paths being missed.
ChunkRAG: Eagle-Eyed Reader for Precise Knowledge Extraction
Vivid Description
ChunkRAG is like a sharp-eyed reader who first breaks down long articles into small paragraphs, then applies expert judgment to pick out the most relevant passages, capturing all key points while avoiding irrelevant content.
Overview
Traditional RAG systems can produce inaccurate content by retrieving irrelevant information. Current document-level filtering fails to remove less relevant content within documents.
Consider a query asking "What is the capital of France?" Without proper filtering, the system might include unnecessary facts about other French cities, leading to incorrect or verbose responses (Figure 5, Left).
ChunkRAG introduces a novel LLM-based chunk filtering framework that enhances the precision and factuality of generated content through semantic chunking and relevance scoring.
The ChunkRAG framework operates in three main stages: Semantic Chunking, Hybrid Retrieval and Filtering, and Controlled Response Generation.
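Here is a minimal sketch of the first two stages: semantic chunking followed by LLM-based chunk filtering. The similarity threshold, the embedding model, and the score_chunk_with_llm helper are illustrative assumptions, not the paper's exact settings, and the hybrid retrieval and controlled generation stages are omitted.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunking(sentences, threshold: float = 0.6):
    """Start a new chunk whenever consecutive sentences drift apart semantically."""
    if not sentences:
        return []
    vecs = model.encode(sentences)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = vecs[i] @ vecs[i - 1] / (
            np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[i - 1]) + 1e-9)
        if sim < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

def score_chunk_with_llm(query: str, chunk: str) -> float:
    """Placeholder: ask an LLM to rate the chunk's relevance to the query on a 0-1 scale."""
    raise NotImplementedError

def filter_chunks(query: str, chunks, min_score: float = 0.7):
    """Keep only chunks the LLM judges relevant; the survivors feed response generation."""
    return [c for c in chunks if score_chunk_with_llm(query, c) >= min_score]
```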
Commentary
In summary, ChunkRAG integrates several advanced RAG technologies—semantic chunking, query rewriting, self-reflection, and hybrid retrieval strategies—to enhance performance.
In my view:
ChunkRAG’s semantic chunking method, which we discussed earlier, has a limitation: it performs poorly on sentences that have weak semantic similarity but strong logical connections, particularly in complex structures.
In the future, the self-reflection mechanism is expected to become a crucial element in quality control for complex task content generation.
MarkItDown: A Tool for Converting Files to Markdown
Overview
MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc.). It supports PDF, Images, PowerPoint, Word, and more.
It has recently become very popular.
As usual, let's take a look at how it converts PDFs and images to Markdown:
For PDFs, the call chain eventually reaches the PdfConverter class, which calls pdfminer.high_level.extract_text(...). For images, it eventually reaches the ImageConverter class, which extracts metadata and can call a multimodal LLM to obtain a caption/description.
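A quick usage sketch: the calls below follow the project's README at the time of writing and may change as the project evolves; the file names and the model name are placeholders.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")   # PDFs are routed to PdfConverter, which uses pdfminer
print(result.text_content)

# For image captions, an LLM client can be supplied (client and model name are illustrative):
# from openai import OpenAI
# md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
# print(md.convert("figure.png").text_content)
```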
Commentary
The approach is quite straightforward, though the project still seems to be under active development. I'm looking forward to seeing how it evolves.
Finally, if you’re interested in the series, feel free to check out my other articles.