Giampaolo Casolla


Talks & appearances


From Manual to LLMs: Scaling Product Categorization

2025-09-02 · PyData Berlin 2025

talk

with Ansgar Grüne, Giampaolo Casolla

AI/ML API LLM PySpark

How do you use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual and rule-based methods, via fine-tuned semantic models, to a robust multi-step process that uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners lessons learned and best practices for putting such a complex LLM-based system into production. Topics include prompt development, balancing cost against accuracy through model selection, testing multi-case versus single-case prompts, and saving costs with the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.
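The abstract does not spell out the multi-step process, so the following is only a minimal sketch of one common way to combine embeddings with an LLM: shortlist a few candidate categories by similarity, then ask the model to pick among them. Everything here is an assumption for illustration, not the speakers' implementation: the bag-of-words `embed` function stands in for a real embedding model, `ask_llm` is a hypothetical callable that would wrap an actual API call, and the prompt wording and fallback rule are invented.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(product: str, categories: list[str], k: int = 3) -> list[str]:
    """Step 1: keep the k categories most similar to the product text."""
    p = embed(product)
    ranked = sorted(categories, key=lambda c: cosine(p, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(product: str, candidates: list[str]) -> str:
    """Step 2 input: a single-case prompt listing only the shortlisted options."""
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        f"Product: {product}\n"
        f"Choose exactly one category from this list:\n{options}\n"
        "Answer with the category name only."
    )


def categorize(
    product: str,
    categories: list[str],
    ask_llm: Callable[[str], str],  # hypothetical wrapper around an LLM API call
    k: int = 3,
) -> str:
    """Shortlist by similarity, then let the LLM choose among the candidates."""
    candidates = shortlist(product, categories, k)
    answer = ask_llm(build_prompt(product, candidates)).strip()
    # Assumed fallback: if the model answers off-list, use the top-ranked candidate.
    return answer if answer in candidates else candidates[0]
```

Restricting the prompt to a short candidate list keeps it small even with 1,000 categories, which is one way the cost and accuracy trade-off mentioned in the abstract could be approached. In a real pipeline, `ask_llm` would be replaced by a function that sends the prompt to the chosen model and returns its text response.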