In this talk, the speaker presents NuExtract, described as the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, described as the first reasoning OCR LLM (RAG-ready Markdown output). Both are low-hallucination open-source models that the talk presents as outclassing frontier LLMs such as GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. The talk demonstrates the abilities of these models, shows how to use them at scale, and discusses what's coming next in information extraction.
talk-data.com
Topic: information extraction
Speaker: Etienne Bernard