In this talk, the speaker presents NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). Both are presented as low-hallucination open-source models that outclass frontier LLMs like GPT-5 and Gemini 2.5 at information extraction while being orders of magnitude smaller, which makes private usage practical. The talk demonstrates the abilities of these LLMs, shows how to use them at scale, and discusses what’s coming next in information extraction.
talk-data.com
Outclassing Frontier LLMs at Extracting Information