PyData Boston 2025

LLMOps in Practice: Building Secure, Governed Pipelines for Large Language Models

2025-12-10

talk

Siddharth Shankar

LLM Cyber Security TensorFlow

As organizations move from prototyping LLMs to deploying them in production, the biggest challenges are no longer about model accuracy - they’re about trust, security, and control. How do we monitor model behavior, prevent prompt injection, track drift, and enforce governance across environments?

This talk presents a real-world view of how to design secure and governed LLM pipelines, grounded in open-source tooling and reproducible architectures. We’ll discuss how multi-environment setups (sandbox, runner, production) can isolate experimentation from deployment, how to detect drift and hallucination using observability metrics, and how to safeguard against prompt injection, data leakage, and bias propagation.

Attendees will gain insight into how tools like MLflow, Ray, and TensorFlow Data Validation can be combined for version tracking, monitoring, and auditability, without turning your workflow into a black box. By the end of the session, you’ll walk away with a practical roadmap on what makes an LLMOps stack resilient: reproducibility by design, continuous evaluation, and responsible governance across the LLM lifecycle.

Surviving the Agentic Hype with Small Language Models

2025-12-10

talk

Serhii Sokolenko (Tower Dev)

AI/ML LLM Python

The AI landscape is abuzz with talk of "agentic intelligence" and "autonomous reasoning." But beneath the hype, a quieter revolution is underway: Small Language Models (SLMs) are starting to perform the core reasoning and orchestration tasks once thought to require massive LLMs. In this talk, we’ll demystify the current state of “AI agents,” show how compact models like Phi-2, xLAM 8B, and Nemotron-H 9B can plan, reason, and call tools effectively, and demonstrate how you can deploy them on consumer-grade hardware. Using Python and lightweight frameworks such as LangChain, we’ll show how anyone can quickly build and experiment with their own local agentic systems. Attendees will leave with a grounded understanding of agent architectures, SLM capabilities, and a roadmap for running useful agents without the GPU farm.

Evaluating AI Agents in production with Python

2025-12-10

talk

Susan Shu Chang

AI/ML LLM Python

This talk covers methods for evaluating AI Agents, using as an example the Python-based evaluation framework the speakers built for a user-facing AI Agent system that has been in production for over a year. We share the tools and Python frameworks used, along with their tradeoffs and alternatives, and discuss methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, as well as how to choose among them.

Is Your LLM Evaluation Missing the Point?

2025-12-10

talk

Daina Bouquin

AI/ML LLM

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.

One agent, one job, better AI

2025-12-10

talk

David Jones-Gilardi

AI/ML LLM

Building accurate AI workflows can get complicated fast. By explicitly defining and modularizing agent tasks, my AI flows have become more precise, consistent, and efficient. But can we prove it? In this talk, I'll walk you through an agentic app built with Langflow, and show how giving agents narrower, well-defined tasks leads directly to more accurate, consistent results. We'll put that theory to the test using evals with Pytest and LangSmith, iterating across different agent setups, analyzing data, and tightening up the app. By the end, we'll have a clear, repeatable workflow that lets us have confidence in how future agent or LLM changes will affect outcomes, before we ever hit deploy.

Uncertainty-Guided AI Red Teaming: Efficient Vulnerability Discovery in LLMs

2025-12-10

talk

Zvi Topol

AI/ML LLM Python Cyber Security

AI red teaming is crucial for identifying security and safety vulnerabilities (e.g., jailbreaks, prompt injection, harmful content generation) in Large Language Models. However, manual and brute-force adversarial testing is resource-intensive and often wastes time and compute exploring low-risk regions of the input space. This talk introduces a practical, Python-based methodology for accelerating red teaming using model uncertainty quantification (UQ).
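The abstract does not spell out the methodology, so the following is only an illustrative sketch of one way uncertainty could guide a red-teaming budget, not the speaker's method. It assumes each candidate prompt has already been sent to the model several times and each response labeled by some classifier as `"refuse"` or `"comply"`. The prompt names, labels, and both function names are hypothetical. Prompts whose responses disagree most (highest entropy) are tested first.

```python
import math
from collections import Counter


def predictive_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of response labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_uncertainty(samples):
    """Return prompts ordered from most to least uncertain model behavior."""
    scores = {prompt: predictive_entropy(labels) for prompt, labels in samples.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical labeled samples: four responses per candidate prompt.
samples = {
    "prompt_a": ["refuse", "refuse", "refuse", "refuse"],  # consistent behavior
    "prompt_b": ["refuse", "comply", "refuse", "comply"],  # maximally inconsistent
    "prompt_c": ["refuse", "refuse", "refuse", "comply"],  # occasionally inconsistent
}

ranking = rank_by_uncertainty(samples)
print(ranking)
```

The design choice here is to treat disagreement across repeated samples as a cheap proxy for uncertainty; other estimators (token-level log-probabilities, ensembles) would slot into `predictive_entropy` without changing the ranking step.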

Keynote by Lisa Amini: What’s Next in AI for Data and Data Management?

2025-12-09

talk

AI/ML Data Management Dataflow LLM RAG

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare

2025-12-09

talk

Aman Bhandari, Lily Xu

AI/ML GenAI LLM

Informed Consent Forms (ICFs) are critical documents in clinical trials. They are the first, and often most crucial, touchpoint between a patient and a clinical trial. Yet developing them is laborious, high-stakes, and heavily regulated. Each form must be tailored to jurisdictional requirements and local ethics boards, reviewed by cross-functional teams, and written in plain language that patients can understand. Producing them at scale across countries and disease areas demands manual effort and creates major operational bottlenecks. We used a combination of traditional AI and large language models to auto-draft ICFs at scale across clinical trial types, countries, and disease areas. The build, test, iteration, and deployment process offers both technical and non-technical lessons for applying generative AI to complex documents at scale and with meaningful impact.

Generative Programming with Mellea: from Agentic Soup to Robust Software

2025-12-08

talk

Jake Lorocco, Nathan Fulton

AI/ML LLM Python

Agentic frameworks make it easy to build and deploy compelling demos. But building robust systems that use LLMs is difficult because of inherent environmental non-determinism. Each user is different, each request is different; the very flexibility that makes LLMs feel magical in-the-small also makes agents difficult to wrangle in-the-large.

Developers who have built large agentic-like systems know the pain. Exceptional cases multiply, prompt libraries grow, instructions are co-mingled with user input. After a few iterations, an elegant agent evolves into a big ball of mud.

This hands-on tutorial introduces participants to Mellea, an open-source Python library for writing structured generative programs. Mellea puts the developer back in control by providing the building blocks needed to circumscribe, control, and mediate essential non-determinism.

Going multi-modal: How to leverage the latest multi-modal LLMs and deep learning models on real-world applications

2025-12-08

talk

Isaac Godfried

LLM

Multimodal deep learning models continue improving rapidly, but creating real-world applications that effectively leverage multiple data types remains challenging. This hands-on tutorial covers model selection, embedding storage, fine-tuning, and production deployment through two practical examples: a historical manuscript search system and flood forecasting with satellite imagery and time series data.

"Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop

2025-12-08

talk

Ian Stokes-Rees

Analytics API GenAI LLM Python

In this 90-minute tutorial we'll get anyone with basic Python and command-line skills up and running with their own set of LLMs running entirely on a laptop, and explain some successful patterns for leveraging LLMs in a data analysis environment. We'll also highlight pitfalls waiting to catch you out and, by demonstrating the limits of LLMs for data analysis tasks, show that your pre-GenAI analytics skills are still relevant today and likely will be for the foreseeable future.

Building LLM Agents Made Simple

2025-12-08

talk

Eric Ma

API GitHub LLM Python

Learn to build practical LLM agents using LlamaBot and Marimo notebooks. This hands-on tutorial teaches the most important lesson in agent development: start with workflows, not technology.

We'll build a complete back-office automation system through three agents: a receipt processor that extracts data from PDFs, an invoice writer that generates documents, and a coordinator that orchestrates both. This demonstrates the fundamental pattern for agent systems—map your boring workflows first, build focused agents for specific tasks, then compose them so agents can use other agents as tools.

By the end, you'll understand how to identify workflows worth automating, build agents with decision-making loops, compose agents into larger systems, and integrate them into your own work. You'll leave with working code and confidence to automate repetitive tasks.

Prerequisites: Intermediate Python, familiarity with APIs, basic LLM understanding. Participants should have Ollama and models installed beforehand (setup instructions provided).

Materials: GitHub repository with Marimo notebooks. Setup uses Pixi for dependency management.

Create your Health Research Agent

2025-12-08

talk

Leonardo Ferreira

AI/ML Docker Linux LLM Python

PubMed is a free search interface for biomedical literature, including citations and abstracts from many life science journals. It is maintained by the National Library of Medicine at the NIH. Yet most users interact with it only through simple keyword searches. In this hands-on tutorial, we will introduce PubMed as a data source for intelligent biomedical research assistants and build a Health Research AI Agent using modern agentic AI frameworks such as LangChain, LangGraph, and Model Context Protocol (MCP), with minimal hardware requirements and no API keys or tokens. To ensure compatibility, the agent will run in a Docker container that hosts all necessary components.

Participants will learn how to connect language models to structured biomedical knowledge, design context-aware queries, and containerize the entire system using Docker for maximum portability. By the end, attendees will have a working prototype that can read and reason over PubMed abstracts, summarize findings according to a semantic similarity engine, and assist with literature exploration — all running locally on modest hardware.

Expected Audience: Enthusiasts, researchers, and data scientists interested in AI agents, biomedical text mining, or practical LLM integration.

Prior Knowledge: Python and Docker familiarity; no biomedical background required.

Minimum Hardware Requirements: 8GB RAM (16GB or more recommended), 30GB disk space, and Docker pre-installed, on macOS, Windows, or Linux.

Key Takeaway: How to build a lightweight, reproducible research agent that combines open biomedical data with modern agentic AI frameworks.

Hands-On with LLM-Powered Recommenders: Hybrid Architectures for Next-Gen Personalization

2025-12-08

talk

Sheetal Borar, Astha Puri

LLM Data Streaming

Recommender systems power everything from e-commerce to media streaming, but most pipelines still rely on collaborative filtering or neural models that focus narrowly on user–item interactions. Large language models (LLMs), by contrast, excel at reasoning across unstructured text, contextual information, and explanations. This tutorial bridges the two worlds. Participants will build a hybrid recommender system that uses structured embeddings for retrieval and integrates an LLM layer for personalization and natural-language explanations. We’ll also discuss practical engineering constraints: scaling, latency, caching, distillation/quantization, and fairness. By the end, attendees will leave with a working hybrid recommender they can extend for their own data, along with a playbook for when and how to bring LLMs into recommender workflows responsibly.
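As a rough illustration of the two-stage idea in this abstract (embedding retrieval first, an LLM layer second), here is a minimal sketch. It is not the tutorial's code. The item names, the three-dimensional vectors, the user profile text, and the `retrieve` and `build_prompt` helpers are all hypothetical, and the LLM call itself is left out: the sketch stops at the prompt that a personalization layer could receive.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(user_vec, item_vecs, k=2):
    """Stage 1: return the k items whose embeddings are closest to the user vector."""
    ranked = sorted(item_vecs, key=lambda name: cosine(user_vec, item_vecs[name]), reverse=True)
    return ranked[:k]


def build_prompt(user_profile, candidates):
    """Stage 2 input: a prompt asking an LLM to re-rank and explain the candidates."""
    listing = "\n".join(f"- {name}" for name in candidates)
    return (
        f"User profile: {user_profile}\n"
        f"Candidate items:\n{listing}\n"
        "Rank these items for the user and explain each choice in one sentence."
    )


# Hypothetical toy embeddings; a real system would load learned vectors.
item_vecs = {
    "hiking boots": [0.9, 0.1, 0.0],
    "tent": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.1, 0.9],
}
user_vec = [1.0, 0.0, 0.0]

candidates = retrieve(user_vec, item_vecs, k=2)
prompt = build_prompt("enjoys weekend camping trips", candidates)
print(prompt)
```

Keeping retrieval cheap and passing only a short candidate list to the LLM is one way to address the latency and cost constraints the abstract mentions.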

talk-data.com