talk-data.com talk-data.com

Topic

ocr

7

tagged

Activity Trend

2 peak/qtr
2020-Q1 2026-Q1

Activities

7 activities · Newest first

In this talk, the speaker presents NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). The talk demonstrates low-hallucination open-source models that outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. It will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.