talk-data.com

Event

Outclassing Frontier LLMs at Extracting Information

2025-12-22 Meetup

Please register using the zoom link to get a reminder:

https://us02web.zoom.us/webinar/register/3317557643700/WN_t-UvP6PUQrugTmkVDzIcvA

Accurately extracting information from documents has been a dream for decades. Important workflows, from automated back-office processing to enterprise RAG, depend on it. LLMs promise to fulfill this dream but currently fall short: they hallucinate information, struggle with long documents, and break down on complex layouts. The solution: LLMs specialized in information extraction.

In this talk, I will present:

- **NuExtract**: the first LLM specialized in extracting structured information (JSON output)
- **NuMarkdown**: the first reasoning OCR LLM (RAG-ready Markdown output)

**These low-hallucination [open-source] models outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller**, enabling private usage. I will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.

Agenda (PST):

- 11:50 am - 11:55 am: Arrival, socializing, and opening
- 11:55 am - 1:00 pm: "Outclassing Frontier LLMs at Extracting Information"
- 1:00 pm - 1:10 pm: Q&A

About Etienne Bernard

Co-founder & CEO, NuMind

Etienne is an AI/ML expert, co-founder & CEO of [NuMind](https://www.numind.ai), a startup developing LLMs specialized in information extraction. Etienne holds a physics PhD (ENS+MIT), led the ML group of Wolfram Research, and wrote [Introduction to Machine Learning](https://www.amazon.com/Introduction-Machine-Learning-Etienne-Bernard/dp/1579550487?ref=d6k_applink_bb_dls&dplnkId=d2b94865-0ad9-46fb-94ae-43d55b9c3f64&dplnkId=561af1be-731e-4c4a-ba3e-ec80d95ff29d).

Additional key points:

- Spoke at 100+ events, such as:
  - ML Prague (keynote): https://www.mlprague.com/prague2018/
  - SXSW: https://schedule.sxsw.com/2016/events/event_PP54827
- Invited guest on France 24: https://www.youtube.com/watch?v=jnVFExf1nbk
- Authored

Sessions & talks

Outclassing Frontier LLMs at Extracting Information

2025-12-22
talk
Etienne Bernard (NuMind)

In this talk, the speaker presents NuExtract, the first LLM specialized in extracting structured information (JSON output), and NuMarkdown, the first reasoning OCR LLM (RAG-ready Markdown output). These low-hallucination open-source models outclass frontier LLMs like GPT-5 and Gemini 2.5 while being orders of magnitude smaller, enabling private usage. The speaker will demonstrate the abilities of these LLMs, show how to use them at scale, and discuss what’s coming next in information extraction.