Accurately extracting information from documents has been a dream for decades. Important workflows, from automated back-office processing to enterprise RAG, depend on it. LLMs promise to fulfill this dream but currently fall short: they hallucinate information, struggle with long documents, and break down on complex layouts. The solution: LLMs specialized in information extraction. In this talk, I will present NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). These low-hallucination open-source models outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. I will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.
talk-data.com
Speaker
Etienne Bernard
3 talks
AI/ML expert; co-founder & CEO of NuMind; speaker at 100+ events.
Bio from: Outclassing Frontier LLMs at Extracting Information
Talks & appearances
3 activities · Newest first
In this talk, the speaker presents NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). These low-hallucination open-source models outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. The talk demonstrates the abilities of these LLMs, shows how to use them at scale, and discusses what’s coming next in information extraction.
Building LLMs for information and document extraction