talk-data.com talk-data.com

G

Speaker

Giampaolo Casolla

1

talks

Filter by Event / Source

Talks & appearances

1 activities · Newest first

Search activities →
From Manual to LLMs: Scaling Product Categorization

How to use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust multi-step process which uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners learnings and best practices for putting such a complex LLM-based system into production. This includes prompt development, balancing cost vs. accuracy via model selection, testing mult-case vs. single-case prompts, and saving costs by using the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.