talk-data.com

Nour El Mawass

Speaker

2 talks

Talks & appearances

2 activities

Documents Meet LLMs: Tales from the Trenches

Processing documents with LLMs comes with unexpected challenges: handling long inputs, enforcing structured outputs, catching hallucinations, and recovering from partial failures. In this talk, we'll cover why large context windows are not a silver bullet, why chunking is deceptively hard, and how to design inputs and outputs that allow for intelligent retries. We'll also share practical prompting strategies, discuss OCR and parsing tools, compare different LLMs (and their cloud APIs), and highlight real-world insights from our experience developing production GenAI applications across multiple document-processing scenarios.
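To make the retry design concrete, here is a minimal Python sketch of one common pattern (an illustration, not code from the talk): request JSON output, validate it against a contract, and on failure retry with the parse error fed back to the model. The call_llm function and the field names are hypothetical placeholders.

```python
import json

# Hypothetical stand-in for a real LLM client call (e.g. a chat-completion
# request via your provider's SDK); not part of the talk's material.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")

# Illustrative output contract; the field names are made up for this sketch.
SCHEMA_HINT = (
    'Return ONLY a JSON object with keys "title" (string), '
    '"date" (string) and "total" (number).'
)

def extract_fields(document_text: str, max_attempts: int = 3) -> dict:
    """Request structured output and validate it; on a malformed reply,
    retry with the parse error fed back so the model can self-correct."""
    prompt = f"{SCHEMA_HINT}\n\nDocument:\n{document_text}"
    last_error = None
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            parsed = json.loads(reply)
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            missing = {"title", "date", "total"} - parsed.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return parsed  # output satisfied the contract
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            # An informed retry: show the model what was wrong with its reply.
            prompt = (
                f"{SCHEMA_HINT}\n\nDocument:\n{document_text}\n\n"
                f"Your previous reply was invalid ({exc}). "
                "Answer again with valid JSON only."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Designing the output as a small, checkable contract is what makes the retry "intelligent": the validator produces a specific error message that the next prompt can carry back to the model, instead of blindly resending the same request.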

Retrieval-augmented generation (RAG) has become a key application of large language models (LLMs), enriching their responses with information retrieved from external databases. However, RAG systems are prone to errors, and their complexity has made evaluation a critical and challenging area. Several libraries (such as RAGAS and TruLens) provide evaluation tools and metrics for RAG, but these evaluations use one LLM to assess another, raising questions about their reliability. Our study examines the stability and usefulness of these evaluation methods across different datasets and domains, focusing on how the choice of evaluation LLM, query reformulation, and dataset characteristics affect measured RAG performance. It also assesses how stable the metrics are across multiple runs of the evaluation and how the metrics correlate with one another. The talk aims to guide users in selecting and interpreting LLM-based evaluations effectively.
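As an illustration of the run-to-run stability question raised above, here is a minimal Python sketch (hypothetical, not from the study): score the same RAG sample several times with an LLM judge and report the mean and spread of the scores. The judge_faithfulness function stands in for a RAGAS- or TruLens-style metric call.

```python
import statistics

# Hypothetical LLM-as-judge call returning a faithfulness score in [0, 1];
# in practice this would be a RAGAS or TruLens metric, or your own judge prompt.
def judge_faithfulness(question: str, answer: str, contexts: list[str]) -> float:
    raise NotImplementedError("wire up an LLM-as-judge call here")

def metric_stability(sample: dict, runs: int = 5) -> tuple[float, float]:
    """Score the same (question, answer, contexts) triple several times.
    A large standard deviation means the judge is an unstable measuring
    instrument and single-run scores should be read with caution."""
    scores = [
        judge_faithfulness(sample["question"], sample["answer"], sample["contexts"])
        for _ in range(runs)
    ]
    return statistics.mean(scores), statistics.stdev(scores)
```

Repeating the evaluation like this is the cheapest way to put error bars on an LLM-based metric before comparing it across evaluator models, datasets, or query reformulations.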