SciPy 2025

Scaling-up deep learning inference to large-scale bioimage data

2025-07-08

talk

Peter Sobolewski, Fernando Cervantes Sanchez

AI/ML Python

Artificial intelligence has been successfully applied to bioimage understanding and has achieved significant results over the last decade. Advances in imaging technologies have also allowed the acquisition of higher-resolution images. This has increased not only the magnification at which images are captured, but also the size of the acquired images. This poses a challenge for deep learning inference on large-scale images, since these methods are commonly applied to relatively small regions rather than whole images. This workshop presents techniques to scale up inference of deep learning models to large-scale image data with the help of Dask for parallelization in Python.
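
The tile-by-tile idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the workshop's code: it uses NumPy and a standard-library thread pool as a stand-in for Dask, and `fake_model`, `tiled_inference`, and the tile size are hypothetical names and values chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fake_model(tile):
    """Hypothetical stand-in for a deep learning model: a simple threshold."""
    return (tile > 0.5).astype(np.uint8)


def tiled_inference(image, tile=64, workers=4):
    """Apply fake_model tile by tile and stitch the results back together."""
    out = np.empty(image.shape, dtype=np.uint8)
    # Top-left corner of every tile that covers the image.
    corners = [
        (r, c)
        for r in range(0, image.shape[0], tile)
        for c in range(0, image.shape[1], tile)
    ]

    def run(corner):
        r, c = corner
        # Slicing clips at the image border, so edge tiles may be smaller.
        out[r:r + tile, c:c + tile] = fake_model(image[r:r + tile, c:c + tile])

    # Tiles write to disjoint regions of `out`, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, corners))
    return out


rng = np.random.default_rng(0)
image = rng.random((200, 150))  # synthetic image, for illustration only
mask = tiled_inference(image)
```

Because the stand-in model is pointwise, the tiles need no overlap here. A real convolutional model would typically need overlapping margins around each tile, which this sketch leaves out.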

Shiny for Python: Building Production-Ready Dashboards in Python

2025-07-08

talk

Daniel Chen

Python

Shiny is a framework for building web applications and data dashboards in Python. In this workshop, you will see how the basic building blocks of Shiny can be extended to create your own scalable, production-ready Python applications.

In particular, this workshop covers:

Overview of the basic building blocks of a Shiny for Python application
How to refactor applications into Shiny modules
How to write tests for your Shiny application
Deploy and share your application

At the end of this course you will be able to:

Build a Shiny app in Python
Refactor your reactive logic into Shiny Modules
Identify when to write Shiny modules
Write unit tests and end-to-end tests for your Shiny application
Deploy and share your application (for free!)

Create custom image visualization and analysis tools with napari

2025-07-08

talk

Peter Sobolewski, Tim Monko, Draga Doncila Pop

Python

With cameras in everything from microscopes to telescopes to satellites, scientists produce image data in countless formats, shapes, sizes, and dimensions. Python provides a rich ecosystem of libraries to make sense of them. napari is a Python library for multidimensional image visualization, but it does double duty as a standalone application that can be easily extended with GUI tools for analysis, visualization, and annotation. In this tutorial, we'll start with the basics of image visualization and analysis in Python, then show how to extend the napari user interface to make analysis workflows as easy as pushing a button, and finally show how to share these extensions as plugins, which can be easily installed by users and collaborators. If you work with images (particularly multidimensional images), and especially if you work with scientists who may not be comfortable with Python, this tutorial might be for you!

Create Your First Python Package: Make Your Python Code Easier to Share and Use

2025-07-08

talk

Leah Wasser, Inessa Pawson, Carol Willing, Tetsuo Koyama, Jeremiah Paige

GitHub Python

Python packaging can be overwhelming. However, a trusted, community-vetted workflow can make it easier. In this hands-on workshop, you’ll learn a tested approach developed by the pyOpenSci community and vetted by Python packaging maintainers. You’ll create an installable, maintainable, and citable package using a quickstart template. You’ll also receive step-by-step guidance on publishing to TestPyPI (and resources for conda-forge, and adding a DOI with Zenodo). If you can’t install software on your laptop, you can use GitHub Codespaces to participate in the workshop. Join us to package your Python code confidently and to access ongoing support in our community beyond the workshop.

Geospatial data visualisation in Python

2025-07-08

talk

Adam Symington

Data Science DataViz Matplotlib Plotly Python

The rapid expansion of the geospatial industry, and the accompanying increase in the availability of geospatial data, presents unique opportunities and challenges in data science. As the need for skilled data scientists increases, the ability to manipulate and interpret this data becomes crucial. This workshop introduces the essentials of geospatial data manipulation and data visualisation, emphasizing hands-on techniques to transform, analyze and visualise diverse datasets effectively.

Throughout the workshop, attendees will explore the extensive ecosystem of geospatial Python libraries. Key tools include GeoPandas, Shapely and Cartopy for vector data, and GDAL, Rasterio and rioxarray for raster data. Participants will also learn to integrate these with popular plotting libraries such as Matplotlib, Bokeh, and Plotly for visualizations.

This tutorial will cover three primary topics: visualizing geospatial shapes, managing raster datasets, and synthesizing multiple data types into unified visual representations. Each section will incorporate data manipulation exercises to ensure attendees not only visualize but also deeply understand geospatial data.

Targeting both beginners and advanced practitioners, the workshop will employ real-world examples to guide participants through the necessary steps to produce striking and informative geospatial visualizations. By the end, attendees will be equipped with the knowledge to leverage advanced data science techniques in their geospatial projects, making them proficient in both the analysis and communication of spatial information.

Processing Cloud-optimized data in Python with Serverless Functions (Lithops, Dataplug)

2025-07-08

talk

Universitat Rovira i Virgili (Pedro Garcia Lopez), Enrique Molina Giménez

Cloud Computing Cloud Storage Data Management GitHub Python

Cloud-optimized (CO) data formats are designed to efficiently store and access data directly from cloud storage without needing to download the entire dataset. These formats enable faster data retrieval, scalability, and cost-effectiveness by allowing users to fetch only the necessary subsets of data. They also allow for efficient parallel data processing using on-the-fly partitioning, which can considerably accelerate data management operations. In this sense, cloud-optimized data is a good fit for data-parallel jobs using serverless functions. Function-as-a-Service (FaaS) provides a data-driven, scalable, and cost-efficient experience with practically no management burden. Each serverless function reads and processes a small portion of the cloud-optimized dataset in parallel, directly from object storage, which significantly speeds up processing.

In this talk, you will learn how to process cloud-optimized data formats in Python using the Lithops toolkit. Lithops is a serverless data processing toolkit that is specially designed to process data from Cloud Object Storage using serverless functions. We will also demonstrate the Dataplug library, which enables cloud-optimized data management in scientific settings such as genomics, metabolomics, or geospatial data. We will show different data processing pipelines in the Cloud that demonstrate the benefits of cloud-optimized data management.
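
The partition-and-process pattern described in this abstract can be sketched with the standard library alone. This is an illustration under loud assumptions and does not use the Lithops or Dataplug APIs: an in-memory `bytes` object stands in for an object in cloud storage, threads stand in for serverless functions, and `byte_ranges` and `count_newlines` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, parts):
    """Split [0, size) into at most `parts` contiguous half-open (start, end) ranges."""
    step = -(-size // parts)  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def count_newlines(blob, byte_range):
    """Worker: touch only its own slice, as a ranged read would, and process it."""
    start, end = byte_range
    return blob[start:end].count(b"\n")


# Hypothetical in-memory "object"; real data would live in object storage.
blob = b"".join(b"record %d\n" % i for i in range(1000))
ranges = byte_ranges(len(blob), parts=8)

# Each worker handles one partition; partial results are combined at the end.
with ThreadPoolExecutor(max_workers=8) as pool:
    partial = list(pool.map(lambda r: count_newlines(blob, r), ranges))

total = sum(partial)
```

Counting newlines works with arbitrary byte boundaries because each newline falls in exactly one range. Record-oriented processing would need partitions aligned to record boundaries, which is the kind of format-aware partitioning the abstract attributes to cloud-optimized data.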

Show your work: Tutorial on building and hosting web applications

2025-07-08

talk

Archit Datar, Kedar Dabhadkar

DataViz Python

TL;DR: Learn how to turn your Python functions into interactive web applications using open-source tools. By the end, each of us will have deployed a portfolio (or store) with multiple web applications and learned how to reproduce it easily later on.

Tell me more: Work not shown is work lost. Many excellent scientists and engineers are not always adept at showcasing their work. As a result, many interesting scientific ideas are never brought to light.

However, using today's tools, one no longer has to leave the Python ecosystem to create classy, complete prototypes using modern data visualization and web development tools. With over five years of experience building and presenting data solutions at huge science companies, we show it doesn't have to be challenging. We provide a walkthrough of the primary web application frameworks and showcase Fast Dash, an open-source Python library that we built to address specific prototyping needs.

This tutorial is designed for all data professionals who value the ability to quickly convert their scientific code into web applications. Participants will learn about the leading frameworks, their strengths and limitations, and a decision flowchart for picking the best one for a given task. We will go through some day-to-day applications and hands-on Python coding throughout the session. Whether you bring your own use cases and datasets or pick from our suggestions, you'll have a reproducible portfolio (app store) of deployed web applications by the end!

Building machine learning pipelines that scale: a case study using Ibis and IbisML

2025-07-07

talk

Anjali Datta, Deepyaman Datta

AI/ML Analytics Data Engineering Pandas Python Scikit-learn

Pandas and scikit-learn have become staples in the machine learning toolkit for processing and modeling tabular data in Python. However, when data size scales up, these tools become slow or run out of memory. Ibis provides a unified, Pythonic, dataframe-like interface to 20+ execution backends, including dataframe libraries, databases, and analytics engines. Ibis enables users to leverage these powerful tools without rewriting their data engineering code (or learning SQL). IbisML extends the benefits of using Ibis to the ML workflow by letting users preprocess their data at scale on any Ibis-supported backend.

In this tutorial, you'll build an end-to-end machine learning project to predict the live win probability after each move during chess games.

Develop Pythonic spreadsheets running Python in and out of the grid

2025-07-07

talk

Sarah Kaiser, Jim Kitchen

Python

Spreadsheets are one of the most common ways to share and work with data, and, helpfully, they also work great with Python! In this tutorial, we will cover some of the basics and best practices of consuming and producing spreadsheets in Python, and take a deep dive into how to run Python directly in your spreadsheets. We will introduce and explore in depth the new Python in Excel features as well as the Anaconda Toolbox for Excel add-in.

Introduction to Data Analysis Using Pandas

2025-07-07

talk

Stefanie Molin

Matplotlib Pandas Python Seaborn

Working with data can be challenging: it often doesn’t come in the best format for analysis, and understanding it well enough to extract insights requires both time and the skills to filter, aggregate, reshape, and visualize it. This session will equip you with the knowledge you need to effectively use pandas – a powerful library for data analysis in Python – to make this process easier.

Pandas makes it possible to work with tabular data and perform all parts of the analysis from collection and manipulation through aggregation and visualization. While most of this session focuses on pandas, during our discussion of visualization, we will also introduce at a high level Matplotlib (the library that pandas uses for its visualization features, which when used directly makes it possible to create custom layouts, add annotations, etc.) and Seaborn (another plotting library, which features additional plot types and the ability to visualize long-format data).

Reproducible Machine Learning Workflows for Scientists with pixi

2025-07-07

talk

John Kirkham, Ruben Arts, Matthew Feickert

AI/ML Linux Python PyTorch

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools, like pixi, provide automatic reproducibility solutions for all dependencies while providing a high level interface well suited for researchers.

This tutorial will provide a practical introduction to using pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.

The-Silmaril: Practice #ontology engineering with Python (and other languages).

2025-07-07

talk

Shaurya Agarwal

Pandas PySpark Python SciPy

Ontologies provide a powerful way to structure knowledge, enable reasoning, and support more meaningful queries compared to traditional data models. Recently, interest in ontologies has resurged, driven by advancements in language models, reasoning capabilities, and the growing adoption of platforms like Palantir Foundry.

In this hands-on tutorial, participants will explore ontology development across multiple domains using a variety of Python-based tools such as rdflib, Owlready2, PySpark, Pandas, and SciPy. They will learn how ontologies facilitate semantic reasoning, improve data interoperability, and enhance query capabilities.
Additionally, attendees will build a rudimentary reasoning engine to better understand inference mechanisms.
The tutorial emphasizes practical applications and comparisons with conventional data representations, making it ideal for researchers, data engineers, and developers interested in knowledge representation and reasoning.
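
As a rough idea of what a rudimentary reasoning engine can look like, here is a minimal forward-chaining sketch in plain Python. It is an assumption-laden illustration, not the tutorial's implementation: it applies two RDFS-style rules (transitivity of `subClassOf`, and type inheritance through `subClassOf`) to string triples, and the function name `infer` and the example facts are invented for this sketch.

```python
def infer(triples):
    """Forward-chain two RDFS-style rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            for (c, q, d) in facts:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf" and q == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p == "type" and q == "subClassOf":
                    new.add((a, "type", d))
        if new <= facts:
            return facts  # fixed point reached: nothing new was derived
        facts |= new


# Invented example facts as (subject, predicate, object) triples.
triples = {
    ("Silmaril", "type", "Jewel"),
    ("Jewel", "subClassOf", "Artifact"),
    ("Artifact", "subClassOf", "Thing"),
}
closure = infer(triples)
```

The loop stops at a fixed point, so `closure` holds the three stated triples plus the three inferred ones, including `("Silmaril", "type", "Thing")`. The nested scan is quadratic per pass, which is fine for a toy example but not for large graphs.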

All the SQL a Pythonista needs to know: an introduction to SQL and DataFrames with DuckDB

2025-07-07

talk

Jacob Matson, Alex Monahan, Guen Prawiroatmodjo

Cloud Computing DuckDB HTML Pandas Polars Python

Structured Query Language (or SQL for short) is a programming language for managing data in a database system and an essential part of any data engineer’s toolkit. In this tutorial, you will learn how to use SQL to create databases and tables, insert data into them, and extract, filter, and join data or make calculations using queries. We will use DuckDB, a new open source embedded in-process database system that combines cutting-edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies), and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the Cloud.
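
The basic SQL operations listed in this abstract (creating tables, inserting rows, then joining, filtering, and aggregating) can be sketched as below. To stay dependency-free, this sketch uses Python's built-in `sqlite3` module as a stand-in rather than DuckDB; the table names and rows are invented for illustration.

```python
import sqlite3

# In-memory database; sqlite3 is a stand-in here, not DuckDB.
con = sqlite3.connect(":memory:")

# Create two tables.
con.execute("CREATE TABLE speakers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE talks (speaker_id INTEGER, title TEXT, minutes INTEGER)")

# Insert invented example rows using parameterized statements.
con.executemany("INSERT INTO speakers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
con.executemany(
    "INSERT INTO talks VALUES (?, ?, ?)",
    [(1, "Intro to SQL", 30), (1, "Joins in depth", 45), (2, "Window functions", 25)],
)

# Join the tables, filter rows, and aggregate per speaker in one query.
query = """
    SELECT s.name, COUNT(*) AS n_talks, SUM(t.minutes) AS total_minutes
    FROM talks AS t
    JOIN speakers AS s ON s.id = t.speaker_id
    WHERE t.minutes >= 30
    GROUP BY s.name
    ORDER BY s.name
"""
rows = con.execute(query).fetchall()
```

Only talks of at least 30 minutes survive the `WHERE` clause, so `rows` contains a single entry for the one speaker with qualifying talks.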

Building with LLMs Made Simple

2025-07-07

talk

Eric Ma

LLM Python

In this tutorial, you will learn how to integrate Large Language Models (LLMs) directly into Python programs as thoughtfully-designed core components of the program rather than bolt-on additions. This hands-on session teaches design principles and practical techniques for incorporating LLM outputs into program control flow. We will use LlamaBot, an open-source Python interface to LLMs, focusing on local execution with local and efficient models.

The Accelerated Python Developer's Toolbox

2025-07-07

talk

Katrina Riehl

Python

As general purpose GPU programming has risen in popularity, many Python programmers have expressed a need to use this technology in their libraries and applications. They soon realize that the GPU landscape is vast and sometimes difficult to traverse for Python users.

In this talk, I will demystify the CUDA-enabled Accelerated Python landscape, focusing on the advantages and disadvantages of popular libraries, the common performance issues encountered, and the best practices for getting the most out of your GPU. Topics include CuPy, Numba, nvmath-python, cuDF, and cuML.

This talk is beginner-friendly, but even the most seasoned programmer will gain insight into the Python GPU computing landscape.

Thinking in arrays

2025-07-07

talk

Peter Fackeldey, Jim Pivarski

NumPy Python

Despite its reputation for being slow, Python is the leading language of scientific computing, which generally needs large-scale (fast) computations. This is because most scientific problems can be split into "metadata bookkeeping" and "number crunching," where the latter is performed by array-oriented (vectorized) calls into precompiled routines.

This tutorial is an introduction to array-oriented programming. We'll focus on techniques that are equally useful in any array library, with a particular focus on NumPy and JAX. You'll work in groups on four class projects: Conway's Game of Life using arrays, iterative computations on arrays, just-in-time (JIT) compilation for the Mandelbrot set, and exploring data in ragged arrays.
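
To give a flavor of the array-oriented style, here is one possible NumPy formulation of a Game of Life step. It is a sketch, not the tutorial's solution: it assumes wrap-around (toroidal) edges, and the function name `life_step` is chosen for this example.

```python
import numpy as np


def life_step(grid):
    """Advance a Game of Life grid by one generation (wrap-around edges)."""
    # Count the eight neighbours of every cell at once by summing shifted copies.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has 3 neighbours, or is alive with 2.
    alive_next = (neighbours == 3) | ((grid == 1) & (neighbours == 2))
    return alive_next.astype(grid.dtype)


# A "blinker": three cells in a row that oscillate with period 2.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1

after_one = life_step(blinker)    # the row of three becomes a column of three
after_two = life_step(after_one)  # and flips back to the starting pattern
```

There is no explicit loop over cells: the whole-grid shifts and comparisons are the "number crunching" that runs in precompiled routines, while Python only handles the bookkeeping.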

Vega-Altair: A Structured Way to Build Interactive Charts

2025-07-07

talk

Jon Mease, Dylan Wootton

API DataViz Python

This tutorial is an introduction to data visualization using the popular Vega-Altair Python library. Vega-Altair provides a simple and expressive API, enabling authors to rapidly create a wide range of interactive charts.

Participants will explore the fundamentals of effective chart design and gain hands-on experience building a variety of visualizations using Vega-Altair's declarative API. Furthermore, this tutorial will introduce users to advanced topics such as data transformations and interaction design. We will finish off by covering practical workflows such as integrating Vega-Altair into dashboarding systems, publishing visualizations, and creating reusable, themed charting libraries. By the end of the session, attendees will have the skills to leverage Vega-Altair for both rapid prototyping and production-ready visualizations in diverse environments.
