talk-data.com talk-data.com

Topic

numarkdown

1

tagged

Activity Trend

2 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: Outclassing Frontier LLMs at Extracting Information ×

Accurately extracting information from documents has been a decades-old dream. Important workflows — from automated back-office processing to enterprise RAG — depend on it. LLMs promise to fulfill this dream but currently fall short: they hallucinate information, struggle with long documents, and break down on complex layouts. The solution: LLMs specialized in information extraction. In this talk, I will present: NuExtract — the first LLM specialized in extracting structured information (JSON output); NuMarkdown — the first reasoning OCR LLM (RAG-ready Markdown output). These low-hallucination open-source models outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. I will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.