Search – talk-data.com

Event PyData Seattle 2025 2025-11-09

The Problem of Address Matching: a Journey through NLP and AI 2025-11-09 · 23:30

Ivan Perez Avellaneda

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic or semantic. In the first type, the meaning is the same but the spelling differs: for example, "Street" vs. "St". In the second type, the meaning is not exactly the same: for example, "Road" instead of "Street". There are a couple of approaches to matching addresses. The first and simplest uses similarity metrics. The second uses natural language processing and transformers. This is a hands-on talk intended for data process analysts. We will go through these solutions, implemented in Python in a Jupyter notebook.
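As a rough illustration of the similarity-metric approach, the sketch below normalizes common abbreviations and then compares strings with the standard library's `difflib.SequenceMatcher`. The abbreviation table, the helper names, and the sample addresses are assumptions made up for this example; they are not taken from the talk's notebook.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be much larger and
# context-aware ("St" can also mean "Saint").
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Syntactic difference: same meaning, different spelling.
syntactic = similarity("12 Main Street", "12 Main St.")
# Semantic difference: "Road" is not an abbreviation of "Street".
semantic = similarity("12 Main Street", "12 Main Road")

print(syntactic, semantic)
```

Normalization removes the syntactic difference entirely, while the semantic pair stays below a perfect score. Deciding whether such a pair refers to the same place is where a learned model could help.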

AI/ML NLP Python

Subgraph Isomorphism at Scale with data science tools 2025-11-09 · 23:30

Esteban Ginez

Traditional subgraph isomorphism algorithms like VF2 rely on sequential tree search that can't leverage parallel computing. This talk introduces Δ-Motif, a data-centric approach that transforms graph matching into data operations using Python's data science stack. Δ-Motif decomposes graphs into small "motifs" to reconstruct matches. By representing graphs as tabular data with RAPIDS cuDF and Pandas, we achieve speedups of 10x to 595x over VF2 without custom GPU kernels. I'll demonstrate practical applications from social networks to quantum computing, and show when GPU acceleration provides the biggest benefits for graph analysis problems. Perfect for data scientists working with network analysis, recommendation systems, or pattern matching at scale.
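To give a feel for what "graph matching as data operations" can mean, here is a minimal sketch that finds triangles in a toy graph using only Pandas joins. It illustrates the general join-based idea and is not the Δ-Motif algorithm itself; the edge list and column names are invented for the example.

```python
import pandas as pd

# Toy undirected graph (made-up data); each edge is stored once with src < dst.
edges = pd.DataFrame({"src": [0, 0, 1, 1, 2], "dst": [1, 2, 2, 3, 3]})

# Step 1: build two-edge paths a-b-c by joining the edge table with itself
# on the shared middle node b. The src < dst ordering ensures a < b < c.
paths = edges.rename(columns={"src": "a", "dst": "b"}).merge(
    edges.rename(columns={"src": "b", "dst": "c"}), on="b"
)

# Step 2: keep only the paths whose end points are connected by a closing edge a-c.
triangles = paths.merge(
    edges.rename(columns={"src": "a", "dst": "c"}), on=["a", "c"]
)

print(triangles[["a", "b", "c"]])
```

Because each step is a table join rather than a recursive search, the work can be handed to whatever dataframe engine is available. cuDF offers a largely Pandas-compatible API, so a similar sketch could plausibly run on a GPU, though that is not tested here.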

Data Science Pandas Python

Break 2025-11-09 · 23:00

Newcomer Sprint! 2025-11-09 · 21:30

Fangchen Li , Eloisa Elias T , Rachel Wagner-Kaiser , Joseph Holsten , C.A.M. Gerlach , Jake Stevens-Haas

Looking to contribute to open source, but not sure where to start? Want to level up your skills in debugging, programming, collaboration, and more? Curious about how to fix a bug or add a feature you’re missing in your favorite software project? Come to our special newcomer sprint to learn how and try it for yourself! Newcomers to Python or open source are welcome and encouraged, as are attendees with open source experience who can help guide them!

Python

GPU Accelerated Python 2025-11-09 · 21:30

Andy Terrel

Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.

Python

LLMs, Chatbots, and Dashboards: Visualize and Analyze Your Data with Natural Language 2025-11-09 · 21:30

Daniel Chen

LLMs have a lot of hype around them these days. Let’s demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs pose a new challenge: a query may not always return the same results. This means working out the areas where LLMs excel and using those behaviors in our data science artifacts. This talk will introduce you to LLMs, the Chatlas packages, and how they can be integrated into a Shiny app to create an AI-powered dashboard (using querychat). We’ll see how we can leverage the tasks LLMs are good at to improve our data science products.

AI/ML Dashboard Data Science LLM

Lunch 2025-11-09 · 20:30

Building Bazel Packages for AI/ML: SciPy, PyTorch, and Beyond 2025-11-09 · 19:00

Ramesh Oswal , Jiten Oswal

AI/ML workloads depend heavily on complex software stacks, including numerical computing libraries (SciPy, NumPy), deep learning frameworks (PyTorch, TensorFlow), and specialized toolchains (CUDA, cuDNN). However, integrating these dependencies into Bazel-based workflows remains challenging due to compatibility issues, dependency resolution, and performance optimization. This session explores the process of creating and maintaining Bazel packages for key AI/ML libraries, ensuring reproducibility, performance, and ease of use for researchers and engineers.

AI/ML NumPy PyTorch SciPy TensorFlow

Going From Notebooks to Production Code 2025-11-09 · 19:00

Robert Masson , Catherine Nelson – author

Do you need to move your code from notebooks into production? Or do you want to level up your software engineering skills? In this tutorial, we will show you how to turn a Jupyter notebook into a robust, reproducible Python script. You will learn how to use tools for converting notebooks into scripts, how to make your code modular, and how to write unit tests.
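As a small illustration of the modular-code and unit-test steps, the sketch below takes logic that might live in a notebook cell, wraps it in a function, and adds a pytest-style test. The function name and the price-cleaning task are hypothetical and are not taken from the tutorial materials.

```python
def clean_prices(raw):
    """Parse price strings such as "$1,200.50" into floats, skipping blanks.

    Hypothetical helper standing in for logic extracted from a notebook cell.
    """
    return [
        float(value.replace("$", "").replace(",", ""))
        for value in raw
        if value.strip()
    ]


def test_clean_prices():
    # A test runner such as pytest would discover this function by its name.
    assert clean_prices(["$1,200.50", " ", "3"]) == [1200.5, 3.0]
    assert clean_prices([]) == []
```

Once the logic is a plain function with no dependence on notebook state, it can be imported by a script and exercised by tests on every change.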

Python

Building a Deep Research Agentic Workflow 2025-11-09 · 19:00

Ravi Kumar Yadav , nidhin pattaniyil

OpenAI and Gemini's Deep Research offerings are a great way to get a detailed research report on a topic.

In this beginner-friendly tutorial, we’ll walk through building a simple, lightweight agent workflow to perform deep research.

LLM

How to make datamap web-apps of embedding vectors via open source tooling 2025-11-09 · 19:00

John Tigue

Datamaps are ML-powered visualizations of high-dimensional data, and in this talk the data is collections of embedding vectors. Interactive datamaps run in-browser as web-apps, potentially without any code running on the web server. Datamap tech can be used to visualize, say, the entire collection of chunks in a RAG vector database.

The best-of-breed tools of this new datamap technique are liberally licensed open source. This presentation is an introduction to building with those repos. The maths will be mentioned only in passing; the topic here is simply how-to with specific tools. Talk attendees will be learning about Python tools, which produce high-quality web UIs.

DataMapPlot is the premier tool for rendering a datamap as a web-app. Here is a live demo thereof: https://connoiter.com/datamap/cff30bc1-0576-44f0-a07c-60456e131b7b

00-25: Intro to datamaps
25-45: Pipeline architecture
45-55: Demos touring such tools as UMAP, HDBSCAN, DataMapPlot, Toponymy, etc.
55-90: Group coding

A Google account is required to log in to Google Colab, where participants can run the workshop notebooks. A Hugging Face API key (token) is needed to download Gemma models.

AI/ML API Python RAG Vector DB

Break 2025-11-09 · 18:30

There's no place like home: using AI agents in Jupyter notebooks 2025-11-09 · 17:00

Sarah Kaiser

This talk explores how AI agents integrated directly into Jupyter notebooks can help with every part of your data science work. We'll cover the latest notebook-focused agentic features in VS Code, demonstrating how they automate tedious tasks like environment management or graph styling, turn your "scratch notebook" into shareable code, and more generally streamline data science workflows directly in notebooks.

AI/ML Data Science

Building Intelligent DIY Robots: From Hardware to Vision Systems 2025-11-09 · 17:00

FTC 18225 High Definition

In this talk, Ethan Lee, lead programmer of an FTC (FIRST Tech Challenge) high school robotics team, and Jake Poznanski, startup founder and software engineer, will show how software, hardware, and data converge to build intelligent robots. Ethan will discuss how FTC robots apply computer vision, including OpenCV and neural networks, to convert raw camera data into autonomous robot action. He will also examine the challenges of operating under strict computation constraints, such as latency, calibration, and synchronization. Jake will explore the process of creating a DIY robot, including CAD design, electronics, and message passing.

Scaling Large-Scale Interactive Data Visualization with Accelerated Computing 2025-11-09 · 17:00

Allison Ding

As datasets continue to grow in both size and complexity, CPU-based visualization pipelines often become bottlenecks, slowing down exploratory data analysis and interactive dashboards. In this session, we’ll demonstrate how GPU acceleration can transform Python-based interactive visualization workflows, delivering speedups of up to 50x with minimal code changes. Using libraries such as hvPlot, Datashader, cuxfilter, and Plotly Dash, we’ll walk through real-world examples of visualizing both tabular and unstructured data and demonstrate how RAPIDS, a suite of open-source GPU-accelerated data science libraries from NVIDIA, accelerates these workflows. Attendees will learn best practices for accelerating preprocessing, building scalable dashboards, and profiling pipelines to identify and resolve bottlenecks. Whether you are an experienced data scientist or developer, you’ll leave with practical techniques to instantly scale your interactive visualization workflows on GPUs.

Data Science DataViz Plotly Python

Registration & Breakfast 2025-11-09 · 16:00

Conference Social 2025-11-09 · 01:30

Join your fellow conference attendees and local meetup members at Bellevue Brewing Company - Spring District Brewpub, 12190 NE District Wy, Bellevue, WA 98005.

https://maps.app.goo.gl/3HSM4WvPXSfVWS3f7

Beyond Just Prediction: Causal Thinking in Machine Learning 2025-11-09 · 00:05

Avik Basu

Most ML models excel at prediction, answering questions like "Who will buy our product?" or "Which customers are likely to churn?". But when it comes to making actionable decisions, prediction alone can be misleading. Correlation does not imply causation, and business decisions require understanding causal relationships to drive the right outcomes.

In this talk, we will explore how causal machine learning, specifically uplift modeling, can bridge the gap between prediction and decision making. Using a real-world use case, we will showcase how uplift modeling helps identify who will respond positively to interventions while avoiding those whom the interventions might deter.
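For readers new to the term, uplift is the change in outcome caused by an intervention, not the outcome itself. The sketch below computes the simplest possible estimate: the difference in conversion rate between treated and untreated customers within each segment. It assumes treatment was randomly assigned. The records and segment names are made up, and this is not the modeling approach used in the talk.

```python
from collections import defaultdict

# Made-up records: (segment, treated, converted), with 1 = yes and 0 = no.
records = [
    ("new", 1, 1), ("new", 1, 1), ("new", 0, 0), ("new", 0, 1),
    ("loyal", 1, 0), ("loyal", 1, 1), ("loyal", 0, 1), ("loyal", 0, 1),
]


def uplift_by_segment(rows):
    """Return treated conversion rate minus control conversion rate per segment."""
    # For each segment: treated flag -> [conversions, total customers].
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for segment, treated, converted in rows:
        counts[segment][treated][0] += converted
        counts[segment][treated][1] += 1
    return {
        segment: c[1][0] / c[1][1] - c[0][0] / c[0][1]
        for segment, c in counts.items()
    }


uplift = uplift_by_segment(records)
print(uplift)
```

In this toy data the "loyal" segment converts well with or without the intervention and does worse with it. A pure prediction model would rank those customers highly even though targeting them is counterproductive.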

AI/ML

Unlocking Parallel PyTorch Inference (and More!) with Python Free-Threading 2025-11-09 · 00:05

Trent Nelson

From the speaker who got kicked off the stage after 54 minutes of his 45-minute PyParallel talk at PyData NYC 2013, comes a new talk foaming about the virtues of Python's new free-threaded support!

Python PyTorch

Diversity Panel: Data for All: Empowering Underrepresented Voices in Data Science and Analytics 2025-11-09 · 00:05

Oli Dinov , Anquida Adams , Micheleen Harris , Heejoon Ahn , Eloisa Elias T

Data science has the power to shape industries and societies. This panel will focus on empowering underrepresented groups in data science through education, access to tools, and career opportunities. Panelists will share their journeys, discuss the importance of democratizing data skills, and explore how to make the field more accessible to diverse talent.

Analytics Data Science
