AI Exploration Journey
PDF Parsing and Document Intelligence
NVIDIA Nemotron-Parse 1.1: A Lightweight PDF Parser That Actually Understands Layout — AI Innovations and Insights 93
If you’re the person in your company responsible for “feeding all the PDFs into a search or Q&A system,” you’ve probably been through the pain of…
Dec 3
HunyuanOCR: Unifying Multi-Stage OCR Pipelines into an End-to-End 1B VLM — AI Innovations and Insights 91
Real-world OCR isn’t just about reading crisp PDFs.
Nov 27
•
Florian
MonkeyOCR v1.5: Making Complex PDFs Parseable — AI Innovations and Insights 90
If you’ve ever worked with real scanned documents or PDFs, you’ve likely run into this mess: a table nested inside another table, split awkwardly across…
Nov 22
•
Florian
From Résumés (PDFs) to Clean Data: Layout-Aware Parsing with Tiny LLMs — AI Innovations and Insights 88
Have you ever wondered how to parse a résumé, or had to work with résumés on the job?
Nov 11
•
Florian
Making Documents Talk with Doc-Researcher: Document Parsing + Hybrid Retrieval for Multi-Agent Research — AI Innovations and Insights 86
Did you know that document parsing can be integrated with Deep Research?
Nov 4
•
Florian
Hybrid OCR-LLM: Not a Bigger Model, but a Smarter Pipeline — AI Innovations and Insights 84
Have you ever encountered documents like forms, certificates, or reports during document parsing?
Oct 30
•
Florian
DeepSeek-OCR: See Less, Remember More — AI Innovations and Insights 83
Can document parsing cut token usage without sacrificing OCR accuracy?
Oct 26
•
Florian
From Big Picture to Details: MinerU 2.5 Redefines Document Parsing — AI Innovations and Insights 77
In a previous post, I introduced MinerU, a popular open-source framework for document parsing built around a pipeline-based design (AI Innovations and…
Oct 5
•
Florian
Taming Chaotic Layouts: SFT + Layout-Centric RL for Document Understanding — AI Innovations and Insights 75
Complex layouts and reading order have always been among the trickiest parts of document understanding.
Sep 28
•
Florian
DianJin-OCR-R1: a Smarter OCR Pipeline for LVLMs That Think Before Response — AI Innovations and Insights 74
How do we turn a "story-telling" LVLM into an engineer that sees clearly and speaks accurately, and bring OCR hallucinations down to earth?
Sep 21
•
Florian
Rethinking Scanned Document Parsing with Layout-Aware RL — AI Innovations and Insights 67
Welcome back, let's dive into Chapter 67 of this insightful series!
Aug 22
•
Florian
PMA: Adaptive and Parallel Documents Understanding in Multi-Agent Systems — AI Innovations and Insights 65
Welcome back, let's dive into Chapter 65 of this insightful series!
Aug 12
•
Florian