talk-data.com

Event

PyData Boston 2025

2025-12-08 – 2025-12-10 PyData

Activities tracked

65

Sessions & talks

Showing 1–25 of 65 · Newest first

How AI Is Transforming Data Careers — A Panel Discussion


2025-12-10 Watch
talk

AI is transforming data careers. Roles once centered on modeling and feature engineering are evolving into positions that involve building AI products, crafting prompts, and managing workflows shaped by automation and augmentation. In this panel discussion, ambassadors from Women in Data Science (WiDS) share how they have adapted through this shift—turning personal experiments into company practices, navigating uncertainty, and redefining their professional identities. They’ll also discuss how to future-proof your career by integrating AI into your daily work and career growth strategy. Attendees will leave with a clearer view of how AI is reshaping data careers and practical ideas for how to evolve their own skills, direction, and confidence in an era where AI is not replacing, but redefining, human expertise.

LLMOps in Practice: Building Secure, Governed Pipelines for Large Language Models


2025-12-10 Watch
talk

As organizations move from prototyping LLMs to deploying them in production, the biggest challenges are no longer about model accuracy - they’re about trust, security, and control. How do we monitor model behavior, prevent prompt injection, track drift, and enforce governance across environments?

This talk presents a real-world view of how to design secure and governed LLM pipelines, grounded in open-source tooling and reproducible architectures. We’ll discuss how multi-environment setups (sandbox, runner, production) can isolate experimentation from deployment, how to detect drift and hallucination using observability metrics, and how to safeguard against prompt injection, data leakage, and bias propagation.

Attendees will gain insight into how tools like MLflow, Ray, and TensorFlow Data Validation can be combined for **version tracking, monitoring, and auditability**, without turning your workflow into a black box. By the end of the session, you’ll walk away with a practical roadmap for what makes an LLMOps stack resilient: reproducibility by design, continuous evaluation, and responsible governance across the LLM lifecycle.
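As one concrete piece of such a pipeline, a governance layer often includes a cheap rules-based screen for prompt injection that runs ahead of any model-based defense. A minimal sketch, with invented patterns (real deployments layer many more signals):

```python
import re

# Illustrative, not exhaustive: a rules-based prompt-injection screen.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"system prompt", re.I),
]

def screen_prompt(text: str) -> bool:
    """Return True if the prompt looks safe, False if any rule matches."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)
```

In practice a screen like this sits before the model call and its hit rate is itself a monitored observability metric.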

Measuring Media Impact: Practical Geo-Lift Incrementality Testing


2025-12-10 Watch
talk

Measuring the true incremental impact of media spend remains one of the toughest problems in marketing, especially in an era where privacy limits user-level tracking. This talk examines how geo-lift incrementality testing can be used to measure the true causal impact of marketing and media channels. Attendees will learn which design decisions matter, how to analyze results, and common pitfalls to avoid when running marketing incrementality tests. The goal is to bring causal inference theory into real-world measurement, enabling practitioners to make informed, data-driven decisions with confidence.
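At its core, a geo-lift readout compares the test geos' post-campaign KPI to a counterfactual built from control geos. A deliberately minimal difference-in-differences sketch (production geo-lift designs use synthetic controls and report uncertainty intervals):

```python
import numpy as np

def geo_lift(test_pre, test_post, ctrl_pre, ctrl_post):
    """Difference-in-differences estimate of incremental lift.

    Each argument is an array of a KPI (e.g. daily conversions) in
    test/control geos before/after the campaign. The counterfactual
    assumes the test geos would have moved like the controls.
    """
    counterfactual = np.mean(test_pre) + (np.mean(ctrl_post) - np.mean(ctrl_pre))
    return np.mean(test_post) - counterfactual
```

A positive result estimates the incremental KPI attributable to the media spend in the test geos.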

The JupyterLab Extension Ecosystem: Trends & Signals from PyPI and GitHub


2025-12-10 Watch
talk

What does the JupyterLab extension ecosystem actually look like in 2025? While extensions drive much of JupyterLab's practical value, their overall landscape remains largely unexplored. This talk analyzes public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health: monthly downloads by category, release recency, star-download relationships, and the rise of AI-focused extensions. I will present my approach for building this analysis pipeline and offer lessons learned. Finally, I will demonstrate an open, read-only web catalog built on this dataset.

MMM Open-Source Showdown: A Practitioner's Benchmark of PyMC-Marketing vs. Google Meridian


2025-12-10 Watch
talk

Your Marketing Mix Model is only as good as the library you build it on. But how do you choose between PyMC-Marketing and Google Meridian when the feature lists look so similar? You need hard evidence, not marketing claims. Which library is actually faster on multi-geo data? Do their different statistical approaches (splines vs. Fourier series) lead to different budget decisions?

This talk delivers that evidence. We present a rigorous, open-source benchmark that stress-tests both libraries on the metrics that matter in production. Using a synthetic dataset that replicates real-world ad spend patterns, we measure:

  • Speed: Effective sample size per second (ESS/s) across different data scales.
  • Accuracy: How well each model recovers both sales figures and true channel contributions.
  • Reliability: A deep dive into convergence diagnostics and residual analysis.
  • Resources: The real memory cost of fitting these models.

You'll walk away from this session with a clear, data-driven verdict, ready to choose the right tool and defend that choice to your team.
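The speed metric above, effective sample size per second, divides the ESS of the posterior draws by wall-clock sampling time. A simplified version of the ESS estimator (ArviZ and both libraries use more sophisticated multi-chain variants) can be sketched as:

```python
import numpy as np

def effective_sample_size(chain):
    """ESS of a 1-D MCMC chain via a truncated-autocorrelation
    estimator: ESS = n / (1 + 2 * sum of positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # lags 0..n-1
    rho = acov / acov[0]
    tau = 1.0
    for t in range(1, n):
        if rho[t] < 0:  # truncate at the first negative autocorrelation
            break
        tau += 2.0 * rho[t]
    return n / tau
```

Dividing this by the sampler's wall-clock seconds gives the ESS/s figure used in the benchmark; a highly autocorrelated chain yields a much lower ESS than the same number of near-independent draws.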

No Cloud? No Problem. Local RAG with Embedding Gemma


2025-12-10 Watch
talk

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters—but it doesn’t have to be. This session explores how Embedding Gemma, Google’s lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. Using the Sentence Transformers framework with Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud—or the budget.
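The retrieval step at the heart of such a pipeline is small. Below is a cosine-similarity top-k sketch over mock vectors; in the talk's setting the vectors would come from a local EmbeddingGemma checkpoint via Sentence Transformers' `encode(...)`, but mock data keeps the sketch self-contained:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Cosine-similarity retrieval: indices of the k best documents.
    In a real pipeline, doc_vecs and query_vec would be embeddings
    produced locally by the embedding model."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(scores)[::-1][:k]
```

The same scored matrix-vector product also powers nearest-centroid text classification, which is how embedding models handle tasks like agent remark classification without a cloud API.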

Surviving the Agentic Hype with Small Language Models


2025-12-10 Watch
talk
Serhii Sokolenko (Tower Dev)

The AI landscape is abuzz with talk of "agentic intelligence" and "autonomous reasoning." But beneath the hype, a quieter revolution is underway: Small Language Models (SLMs) are starting to perform the core reasoning and orchestration tasks once thought to require massive LLMs. In this talk, we’ll demystify the current state of “AI agents,” show how compact models like Phi-2, xLAM 8B, and Nemotron-H 9B can plan, reason, and call tools effectively, and demonstrate how you can deploy them on consumer-grade hardware. Using Python and lightweight frameworks such as LangChain, we’ll show how anyone can quickly build and experiment with their own local agentic systems. Attendees will leave with a grounded understanding of agent architectures, SLM capabilities, and a roadmap for running useful agents without the GPU farm.
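The tool-calling core of such an agent is simpler than the hype suggests: the model emits a structured function call and a thin Python layer dispatches it. A sketch with an invented tool registry and call format (frameworks like LangChain wrap this same pattern; function-calling SLMs such as xLAM emit JSON of roughly this shape):

```python
import json

# Hypothetical tool registry for illustration.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def dispatch(model_output: str):
    """Execute one JSON tool call such as
    {"tool": "add", "args": [2, 3]} emitted by the model."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](*call["args"])
```

A full agent loop feeds the returned observation back into the model's context and repeats until the model emits a final answer instead of a tool call.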

Break

2025-12-10
talk

Data engineering with Python the right way: introducing the composable, Python-native data stack

2025-12-10
talk

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges open-source libraries like Kedro, Pandera, and the Boring Semantic Layer and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Evaluating AI Agents in production with Python


2025-12-10 Watch
talk

This talk covers methods of evaluating AI Agents, with an example of how the speakers built a Python-based evaluation framework for a user-facing AI Agent system that has been in production for over a year. We share the tools and Python frameworks used (along with their alternatives), and discuss methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, as well as the tradeoffs behind each selection.
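For illustration, a rules-based evaluation layer is typically a set of cheap, deterministic checks run on every response, with LLM-as-Judge reserved for the cases rules cannot decide. The rule set below is invented, but the shape is representative:

```python
import re

def rules_eval(response: str) -> dict:
    """Deterministic checks on an agent response; each key is a
    named rule whose pass/fail can be tracked over time."""
    return {
        "non_empty": bool(response.strip()),
        "no_apology_loop": response.lower().count("i apologize") <= 1,
        "no_raw_traceback": "Traceback (most recent call last)" not in response,
        "has_citation": bool(re.search(r"\[\d+\]", response)),
    }
```

Because the checks are deterministic, they run on every production response at negligible cost, and their pass rates become dashboard metrics alongside the slower judge-based scores.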

Processing large JSON files without running out of memory


2025-12-10 Watch
talk

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. In this talk you'll learn:

  • How to measure memory usage.
  • Why loading JSON takes a lot of memory.
  • Four different ways to reduce memory usage when loading large JSON files.
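As a taste of one such approach, streaming, here is a stdlib-only sketch that parses a top-level JSON array item by item instead of loading the whole file; dedicated libraries such as ijson handle the edge cases more robustly:

```python
import json

def iter_json_array(fp, chunk_size=65536):
    """Yield the items of a top-level JSON array one at a time,
    holding only a small buffer in memory."""
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    assert buf.startswith("["), "expected a top-level JSON array"
    buf = buf[1:]
    while True:
        buf = buf.lstrip().lstrip(",").lstrip()
        if buf.startswith("]"):
            return
        try:
            item, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            more = fp.read(chunk_size)
            if not more:
                raise
            buf += more
            continue
        if end == len(buf):
            # the value might be truncated at the chunk edge; read ahead
            more = fp.read(chunk_size)
            if more:
                buf += more
                continue
        yield item
        buf = buf[end:]
```

Peak memory is bounded by the buffer plus one decoded item, rather than the full parsed document.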

Unlocking Smarter Typeahead Search: A Hybrid Framework for Large-Scale Query Suggestions

2025-12-10
talk

We present a hybrid framework for typeahead search that combines prefix matching with semantic retrieval using open-source tools. Applied at Quizlet, it indexed 200 million terms and improved coverage, boosted relevance, and lifted suggestion engagement by up to 37 percent—offering a reusable approach for building scalable, robust query suggestions.
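The blend of lexical and semantic signals can be sketched as follows; the scoring weights and mock embeddings are illustrative, not Quizlet's production formula:

```python
import numpy as np

def suggest(prefix, terms, term_vecs, query_vec, k=3, alpha=0.5):
    """Hybrid typeahead ranking: terms matching the typed prefix get
    a lexical score of 1, every term gets a cosine score against the
    query embedding, and the final rank mixes the two."""
    lexical = np.array([1.0 if t.startswith(prefix) else 0.0 for t in terms])
    d = term_vecs / np.linalg.norm(term_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    semantic = d @ q
    score = alpha * lexical + (1 - alpha) * semantic
    order = np.argsort(score)[::-1][:k]
    return [terms[i] for i in order]
```

At 200 million terms the prefix filter would be served by a trie or sorted index and the semantic scores by an approximate nearest-neighbor index, but the blending logic is the same.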

Is Your LLM Evaluation Missing the Point?


2025-12-10 Watch
talk

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.

Tracking Policy Evolution Through Clustering: A New Approach to Temporal Pattern Analysis in Multi-Dimensional Data

2025-12-10
talk

Analyzing how patterns evolve over time in multi-dimensional datasets is challenging—traditional time-series methods often struggle with interpretability when comparing multiple entities across different scales. This talk introduces a clustering-based framework that transforms continuous data into categorical trajectories, enabling intuitive visualization and comparison of temporal patterns.

What & why: The method combines quartile-based categorization with a modified Hamming distance to create interpretable "trajectory fingerprints" for entities over time. This approach is particularly valuable for policy analysis, economic comparisons, and any domain requiring longitudinal pattern recognition.

Who: Data scientists and analysts working with temporal datasets, policy researchers, and anyone interested in comparative analysis across entities with different scales or distributions.

Type: Technical presentation with practical implementation examples using Python (pandas, scikit-learn, matplotlib). Moderate mathematical content balanced with intuitive visualizations.

Takeaway: Attendees will learn a novel approach to temporal pattern analysis that bridges the gap between complex statistical methods and accessible, policy-relevant insights. You'll see practical implementations analyzing 60+ years of fiscal policy data across 8 countries, with code available for adaptation to your own datasets.
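One possible reading of the fingerprints-plus-distance idea (the talk's actual implementation may differ) is: bin each year's values into quartiles across entities, then compare entities by the fraction of years in which they occupy different quartiles:

```python
import numpy as np
import pandas as pd

def quartile_trajectories(df):
    """df: rows = entities, columns = years. Each year's values are
    binned into quartiles across entities, so every entity gets a
    categorical trajectory of quartile labels (0..3)."""
    return df.apply(lambda year: pd.qcut(year, 4, labels=False), axis=0)

def trajectory_distance(traj, a, b):
    """Normalized Hamming distance between two entities' trajectories:
    the fraction of years in which they sit in different quartiles."""
    return float(np.mean(traj.loc[a] != traj.loc[b]))
```

Because the comparison happens on quartile ranks rather than raw values, entities with very different scales (e.g. countries with different GDP levels) become directly comparable.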

When Rivers Speak: Analyzing Massive Water Quality Datasets using USGS API and Remote SSH in Positron


2025-12-10 Watch
talk

Rivers have long been storytellers of human history. From the Nile to the Yangtze, they have shaped trade, migration, settlement, and the rise of civilizations. They reveal the traces of human ambition... and the costs of it. Today, from the Charles to the Golden Gate, US rivers continue to tell stories, especially through data.

Over the past decades, extensive water quality monitoring efforts have generated vast public datasets: millions of measurements of pH, dissolved oxygen, temperature, and conductivity collected across the country. These records are more than environmental snapshots; they are archives of political priorities, regulatory choices, and ecological disruptions. Ultimately, they are evidence of how societies interact with their environments, often unevenly.

In this talk, I’ll explore how Python and modern data workflows can help us "listen" to these stories at scale. Using the United States Geological Survey (USGS) Water Data APIs and Remote SSH in Positron, I’ll process terabytes of sensor data spanning several years and regions. I’ll demonstrate that, while Parquet and DuckDB enable scalable exploration of historical records, Remote SSH is essential for truly large-scale analysis. Along the way, I hope to answer analytical questions that can surface patterns linked to industrial growth, regulatory shifts, and climate change.

By treating rivers as both ecological systems and social mirrors, we can begin to see how environmental data encodes histories of inequality, resilience, and transformation.

Whether your interest lies in data engineering, environmental analytics, or the human dimensions of climate and infrastructure, this talk explores the intersection of environmental science and data, offering both technical methods and sociological lenses for understanding the stories rivers continue to tell.

Lunch

2025-12-10
talk

Fun With Python and Emoji: What Might Adding Pictures to Text Programming Languages Look Like?

2025-12-10
talk

We all mix pictures, emojis and text freely in our communications. So, why not in our code? This session takes a whimsical look at what mixing emoji with Python and SQL might look like (spoiler alert: a lot like those "rebus" stories in Highlights Magazine for Kids!). We'll discuss the benefits of doing so, challenges that emoji present, and demo a rudimentary Python preprocessor that intercepts Python and SQL code containing emojis submitted from Jupyter notebooks and translates it back into text-only code using an emoji-to-text dictionary before passing it on to Python for execution. This session is intended for all levels of programmers.
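A toy version of such a preprocessor: substitute text tokens for emoji before handing the source to Python. The emoji-to-text dictionary below is invented for illustration; the demoed preprocessor additionally hooks into Jupyter's input pipeline:

```python
# Hypothetical emoji-to-text dictionary; a real one would be much larger.
EMOJI_TO_TEXT = {
    "➕": "+",
    "🖨️": "print",
}

def preprocess(source: str) -> str:
    """Translate emoji tokens back into plain-text Python before execution."""
    for emoji, text in EMOJI_TO_TEXT.items():
        source = source.replace(emoji, text)
    return source
```

The translated source is then handed to the normal interpreter, so emoji become pure surface syntax with no runtime cost.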

Modeling Aesthetic Identity: Building a Digital Twin from Instagram Likes & Visual Preferences


2025-12-10 Watch
talk

People's visual and brand preferences encode a rich signal of identity that goes beyond clicks or text. In this talk, I present a pipeline for modeling a user’s “aesthetic identity” using Instagram likes, liked visuals, and followed brands. I show how to convert images and brand interactions into embedding spaces, condition a language model (via adapter / LoRA fine-tuning) to emulate that user’s responses, and evaluate the fidelity of that “digital twin.” You’ll leave with a reproducible architecture for persona modeling from multimodal data, along with insights into pitfalls of overfitting, privacy, and drift.

One agent, one job, better AI

2025-12-10
talk

Building accurate AI workflows can get complicated fast. By explicitly defining and modularizing agent tasks, my AI flows have become more precise, consistent, and efficient—delivering improved outcomes consistently. But can we prove it? In this talk, I'll walk you through an agentic app built with Langflow, and show how giving agents narrower, well-defined tasks leads directly to more accurate, consistent results. We'll put that theory to the test using evals with Pytest and LangSmith, iterating across different agent setups, analyzing data, and tightening up the app. By the end, we'll have a clear, repeatable workflow that lets us have confidence in how future agent or LLM changes will affect outcomes, before we ever hit deploy.

Accelerating Geospatial Analysis with GPUs


2025-12-10 Watch
talk

Geospatial analysis often relies on raster data, n‑dimensional arrays where each cell holds a spatial measurement. Many raster operations, such as computing indices, statistical analysis, and classification, are naturally parallelizable and ideal for GPU acceleration.
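As a tiny example of such a per-cell raster operation, here is a standard vegetation index in plain NumPy; under RAPIDS, substituting a CuPy (or Dask-backed) array runs the identical element-wise math on the GPU, which is exactly why these workloads parallelize so well:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red),
    computed independently for every raster cell."""
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    return (nir - red) / (nir + red + 1e-9)  # epsilon avoids divide-by-zero
```

The same pattern, array in, array out, with no per-cell Python loop, applies to the statistical and classification features built later in the pipeline.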

This talk demonstrates an end‑to‑end GPU‑accelerated semantic segmentation pipeline for classifying satellite imagery into multiple land cover types. Starting with cloud-hosted imagery, we will process data in chunks, compute features, train a machine learning model, and run large-scale predictions. This process is accelerated with the open-source RAPIDS ecosystem, including Xarray, cuML, and Dask, often requiring only minor changes to familiar data science workflows.

Attendees who work with raster data or other parallelizable, computationally intensive workflows will benefit most from this talk, which focuses on GPU acceleration techniques. While the talk draws from geospatial analysis, key geospatial concepts will be introduced for beginners. The methods demonstrated can be applied broadly across domains to accelerate large-scale data processing.

Applying Foundational Models for Time Series Anomaly Detection


2025-12-10 Watch
talk

The time series machine learning community has begun adopting foundational models for forecasting and anomaly detection. These models, such as TimeGPT, MOMENT, Moirai, and Chronos, offer zero-shot learning and promise to accelerate the development of AI use cases.

In this talk, we'll explore two popular foundational models, TimeGPT and MOMENT, for Time Series Anomaly Detection (TSAD). We'll specifically focus on the Novelty Detection flavor of TSAD, where we only have access to nominal (normal) data and the goal is to detect deviations from this norm.

TimeGPT and MOMENT take fundamentally different approaches to novelty detection.

• TimeGPT uses a forecasting-based method, tracking observed data against its forecasted confidence intervals. An anomaly is flagged when an observation falls sufficiently outside these intervals.

• MOMENT, an open-source model, uses a reconstruction-based approach. The model first encodes nominal data, then characterizes the reconstruction errors. During inference, it compares the test data's reconstruction error to these characterized values to identify anomalies.
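A stripped-down version of the forecasting-based rule (my simplification, not TimeGPT's actual implementation): flag any observation falling outside the forecast's z-sigma confidence band.

```python
import numpy as np

def flag_anomalies(observed, forecast, sigma, z=3.0):
    """Forecasting-based novelty detection: a point is anomalous when
    it lies more than z standard deviations from the forecast."""
    observed, forecast, sigma = map(np.asarray, (observed, forecast, sigma))
    return np.abs(observed - forecast) > z * sigma
```

The reconstruction-based approach swaps the forecast for the model's reconstruction of the window and calibrates the threshold on the reconstruction errors of nominal data, but the final decision rule has the same shape.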

We'll detail these approaches using the UCR anomaly detection dataset. The talk will highlight potential pitfalls when using these models and compare them with traditional TSAD algorithms.

This talk is geared toward data scientists interested in the nuances of applying foundational models for TSAD. No prior knowledge of time series anomaly detection or foundational models is required.

Building Production RAG Systems for Health Care Domains: Clinical Decision


2025-12-10 Watch
talk
RAG

Building on but moving far beyond the single-specialty focus of HandRAG, this session examines how Retrieval-Augmented Generation can be engineered to support clinical reasoning across multiple high-stakes surgical areas, including orthopedic, cardiovascular, neurosurgical, and plastic surgery domains. Using a corpus of more than 7,800 clinical publications and cross-specialty validation studies, the talk highlights practical methods for structuring heterogeneous medical data, optimizing vector retrieval with up to 35% latency gains, and designing prompts that preserve terminology accuracy across diverse subspecialties. Attendees will also learn a three-tier evaluation framework that improved critical-error detection by 2.4×, as well as deployment strategies, such as automated literature-refresh pipelines and cost-efficient architectures that reduced inference spending by 60%, that enable RAG systems to operate reliably in real production healthcare settings.

fastplotlib: driving scientific discovery through data visualization


2025-12-10 Watch
talk

Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx, which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general-purpose scientific plotting library that is useful for fast and live visualization and analysis of complex datasets.

Break

2025-12-10
talk

Rethinking Feature Importance: Evaluating SHAP and TreeSHAP for Tree-Based Machine Learning Models


2025-12-10 Watch
talk

Tree-based machine learning models such as XGBoost, LightGBM, and CatBoost are widely used, but understanding their predictions remains challenging. SHAP (SHapley Additive exPlanations) provides feature attributions based on Shapley values, yet its assumptions — feature independence, additivity, and consistency — are often violated in practice, potentially producing misleading explanations. This talk critically examines SHAP’s limitations in tree-based models and introduces TreeSHAP, its specialized implementation for decision trees. Rather than presenting it as perfect, we evaluate its effectiveness, highlighting where it succeeds and where explanations remain limited. Attendees will gain a practical, critical understanding of SHAP and TreeSHAP, and strategies for interpreting tree-based models responsibly.
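To ground the discussion: for tiny feature sets, exact Shapley values can be computed by brute-force coalition enumeration, which is the quantity TreeSHAP computes efficiently for trees. In the sketch below, the coalition-value function `f` is supplied by the caller, so the conditioning assumptions the talk critiques (how "absent" features are marginalized) live entirely inside `f`:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values of a set function f over n features.
    f maps a frozenset of present-feature indices to a model output.
    Cost is O(2^n), so this is only for sanity-checking small cases."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (f(S | {i}) - f(S))
    return phi
```

The efficiency (additivity) property, attributions summing to f(all features) minus f(no features), holds by construction here, which makes this a useful reference point when a library's explanations look suspicious.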

Target audience: Data scientists, ML engineers, and analysts familiar with tree-based models. Background: Basic understanding of feature importance and model interpretability.