talk-data.com

Event

SciPy 2025

2025-07-07 – 2025-07-13 PyData

Activities tracked

142

Sessions & talks

Break

2025-07-11
talk

SciPy Tools Plenary

2025-07-11
talk

Rubin Observatory: What will you discover when you’re always watching

2025-07-11
talk

After two decades of planning, Rubin Observatory is finally observing the sky. Built to image the entire southern hemisphere every few nights with a 3.2-gigapixel camera, Rubin will produce a time-lapse of the Universe, revealing moving asteroids, pulsing stars, supernovae, and rare transients that you only catch if you're always watching.

In this talk, I'll share the “first look” images from Rubin Observatory as well as what it took to get here: from scalable algorithms to infrastructure that moves data from a mountaintop in Chile to scientists around the world in seconds. I'll reflect on what we learned building the data management system in Python over the years, including stories of choices that impacted scalability, interfaces, and maintainability. Rubin Observatory is here. And it's for you.

Opening Notes

2025-07-11
talk

Registration and Breakfast

2025-07-11
talk

AI for Scientific Discovery

2025-07-11
talk

AI, particularly generative AI, is rapidly transforming the scientific landscape, offering unprecedented opportunities and novel challenges across all stages of research. This Birds of a Feather session aims to bring together researchers, developers, and practitioners to share experiences, discuss best practices, and explore the evolving role of AI in science.

Open-source science-specific Research Software Engineer Communities: benefits and lessons learned

2025-07-11
talk

Research software engineer (RSE) communities of practice specific to a given science are crucial social structures connecting developers, maintainers, and users. They prompt naturally occurring peer mentoring opportunities, software improvements through collaborative contributions, and the sharing of best practices and lessons learned from challenges specific to that science discipline. Members of such communities benefit from the vast resources and support available through other RSEs in their own scientific field, and the users of that software benefit from a more capable and user-friendly product.

While the US-RSE (us-rse.org) advocates for recognition of the overall RSE community, provides individual RSEs with a sense of belonging (e.g., inclusivity), and provides helpful resources, it lacks the science-specific support possible in more focused communities of practice. This session features short scene-setting presentations, followed by an open panel discussion with leaders of science-specific communities of practice for RSEs (e.g., Python in Heliophysics Community (PyHC), PlanetaryPy, earthaccess, and Pangeo) on the benefits of and lessons learned from leading those groups in comparison to more general RSE communities. Example discussion topics include the benefits of science-specific RSE communities, development of science-specific software standards, encouraging psychological safety, and community creation and sustainability.

Reliable executable tutorials -- CI/CD challenges

2025-07-11
talk

This BoF aims to host a discussion of best practices for maintaining executable tutorials that are reproducible and reliable. It is also intended as a platform for collecting tips and tricks on CI/CD practices. The moderators recently put together a repository, https://scientific-python.github.io/executable-tutorials/, that builds on their experience of maintaining numerous tutorial repositories. It covers some use cases, but we are well aware that other user scenarios and use cases are still not well covered.

The BoF complements both the Teaching & Learning and Maintainers tracks, since none of the talks in those tracks seem to focus on the technical challenges around tutorials.

Lightning Talks

2025-07-11
talk

Lightning talks are 5-minute talks on any topic of interest for the SciPy community. We encourage spontaneous and prepared talks from everyone, but we can’t guarantee spots. Sign-ups are at the NumFOCUS booth during the conference.

Break

2025-07-11
talk

Accelerated DataFrames for all: Bringing GPU acceleration to pandas and Polars

2025-07-10
talk

In Python, data analytics users often prioritize convenience, flexibility, and familiarity over pure performance. The cuDF DataFrame library provides a pandas-like experience with performance improvements of 10x to 50x, but subtle differences prevent it from being a true drop-in replacement for many users. This talk will showcase the evolution of this library to provide zero-code-change experiences, first for pandas users and now for Polars. We will provide examples of this usage and a high-level overview of how users can make use of these today. We will then delve into the details of how GPU acceleration is implemented differently in pandas and Polars, along with a deep dive into some of the different technical challenges encountered for each. This talk will have something for both data practitioners and library developers.

Enabling Innovative Analysis on Heterogeneous Clusters through HTCdaskgateway

2025-07-10
talk

High energy physics (HEP) research is going through fundamental changes as we move to collect larger amounts of data from the Large Hadron Collider (LHC). Analysis facilities and distributed computing, through HTCs, have come together to create the next, Pythonic generation of analysis by utilizing htcdaskgateway, a Dask gateway extension that allows users to spawn workers compatible with both their analysis and heterogeneous clusters, in line with authentication requirements. This is enabling physicists to engage with scientific Python in ways they previously had not, because of domain-specific C++ tools. An example of htcdaskgateway’s use is Fermilab’s Elastic Analysis Facility.

Getting all your snakes in a grid: collaborating and teaching with Python in Excel and the Anaconda Toolbox

2025-07-10
talk

Working with data in grids or spreadsheets is great for collaboration as there are many different tools to view and edit the files. Data science workflows often include packages like openpyxl to create, load, edit, and export spreadsheets that are then shared with others who can use other tools like Excel, Google Sheets, or IDEs to view them. The new Python in Excel feature and the Anaconda Toolbox add-in provide the tools to run Python directly in cells in a spreadsheet, making it easier for Pythonistas to access and collaborate on code. This talk will introduce how these features work, demo collaborating on Python code in a worksheet, and talk about some case studies where these tools have been used to teach and collaborate with Python.

The brave new world of slicing and dicing Xarray objects.

2025-07-10
talk

We illustrate the power and flexibility of a new extension point in Xarray's data model: "custom indexes" that allow Xarray users to neatly handle complex grids and enable at least one new data model (vector data cubes). We present a whirlwind tour of specific examples to illustrate the power of this feature, and aim to stimulate experimentation during the sprints.

AI as a Detector: Lessons in Real Time Pulsar Discovery

2025-07-10
talk

The Universe isn't always so quiet: neutron stars, fast radio bursts, and potentially alien civilizations emit bursts of electromagnetic energy - radio transients - into the unknown. In some cases, these emissions, like with pulsars, are constant and periodic; but in others, like with fast radio bursts, they're short in duration and infrequent. Classical detection surveys typically rely on dedispersion techniques and human-crafted signal processing filters to remove noise and highlight a signal of interest. But what if we're missing something?

In this talk we will introduce a workflow to avoid classical processing altogether. By feeding RF samples directly from the telescope's digitizers into GPU computing, we can train an AI model to serve as a detector -- not only enabling real time performance, but also making decisions directly on raw spectrogram data, eliminating the need for classical processing. We will demonstrate how each step of the pipeline works - from AI model training and data curation to real-time inferencing at scale. Our hope is that this new sensor processing architecture can simplify development, democratize science, and process increasingly large amounts of data in real time.
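
For readers unfamiliar with the classical baseline this abstract contrasts against, here is a minimal sketch of incoherent dedispersion in plain Python. It is not the speakers' pipeline: the function names, the toy four-channel band, and the circular shift are illustrative assumptions. The only physics used is the standard cold-plasma dispersion delay, about 4.149 ms × DM × (f⁻² − f_ref⁻²), with frequencies in GHz and the dispersion measure DM in pc cm⁻³.

```python
# Dispersion constant in ms * GHz^2 * cm^3 / pc (standard cold-plasma value).
K_DM_MS = 4.148808


def dispersion_delay_ms(dm, freq_ghz, ref_freq_ghz):
    """Delay (ms) of a pulse at freq_ghz relative to ref_freq_ghz."""
    return K_DM_MS * dm * (freq_ghz ** -2 - ref_freq_ghz ** -2)


def channel_shift(dm, freq_ghz, ref_freq_ghz, dt_ms):
    """Delay expressed as a whole number of samples of width dt_ms."""
    return round(dispersion_delay_ms(dm, freq_ghz, ref_freq_ghz) / dt_ms)


def dedisperse(spectrogram, freqs_ghz, dm, dt_ms):
    """Undo each channel's delay, then sum across channels.

    spectrogram is a list of channels, each a list of power samples.
    Shifts wrap around (circular), which keeps the sketch short.
    """
    ref = max(freqs_ghz)
    n = len(spectrogram[0])
    series = [0.0] * n
    for channel, freq in zip(spectrogram, freqs_ghz):
        shift = channel_shift(dm, freq, ref, dt_ms)
        for t in range(n):
            series[t] += channel[(t + shift) % n]
    return series


def make_dispersed_pulse(freqs_ghz, dm, dt_ms, n_samples, t0):
    """Hypothetical test signal: one unit-power sample per channel, delayed by dispersion."""
    ref = max(freqs_ghz)
    spectrogram = []
    for freq in freqs_ghz:
        channel = [0.0] * n_samples
        shift = channel_shift(dm, freq, ref, dt_ms)
        channel[(t0 + shift) % n_samples] = 1.0
        spectrogram.append(channel)
    return spectrogram


# Toy band and pulse (all values are made up for illustration).
FREQS_GHZ = [1.5, 1.4, 1.3, 1.2]
toy = make_dispersed_pulse(FREQS_GHZ, dm=100.0, dt_ms=1.0, n_samples=256, t0=50)

aligned = dedisperse(toy, FREQS_GHZ, dm=100.0, dt_ms=1.0)  # correct trial DM
smeared = dedisperse(toy, FREQS_GHZ, dm=0.0, dt_ms=1.0)    # no correction
```

With the correct trial DM, the four channels line up and their power adds at the emission sample; with no correction, the pulse stays spread across time. A classical search repeats this over many trial DMs, which is the hand-built stage the abstract proposes replacing with a model that reads the raw spectrogram.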

Keeping Python Fun: Using Robotics Competitions to Teach Data Analysis and Application Development

2025-07-10
talk

The Issaquah Robotics Society (IRS) has been teaching Python and data analysis to high school students since 2016. Our presentation will summarize what we’ve learned from nine years of combining Python, competitive robotics, and high school students with no prior programming experience. We’ll focus on the importance of keeping it fun, learning the tools, and how to provide useful feedback without making learning Python feel like just another class. We’ll also explain how Python helps us win robotics competitions.

VirtualiZarr and Icechunk: How to build a cloud-optimised datacube of archival files in 3 lines of xarray

2025-07-10
talk

The best way to distribute large scientific datasets is via the Cloud, in Cloud-Optimized formats. But often this data is stuck in archival pre-Cloud file formats such as netCDF.

VirtualiZarr makes it easy to create "Virtual" Zarr datacubes, allowing performant access to huge archival datasets as if they were in the Cloud-Optimized Zarr format, without duplicating any of the original data.

We will demonstrate using VirtualiZarr to generate references to archival files, combine them into one array datacube using xarray-like syntax, commit them to Icechunk, and read the data back with zarr-python v3.

Zamba: Computer vision for wildlife conservation

2025-07-10
talk

Camera traps are an essential tool for wildlife research. Zamba is an open source Python package that leverages machine learning and computer vision to automate time-intensive processing tasks for wildlife camera trap data. This talk will dive into Zamba's capabilities and key factors that influenced its design and development. Topics will include the importance of code-free custom model training, Zamba’s origins in an open machine learning competition, and the technical challenges of processing video data. Attendees will walk away with a better understanding of how machine learning and Python tools can support conservation efforts.

Noise-Resilient Quantum Computing with Python

2025-07-10
talk

Today’s quantum computers are far noisier than their classical counterparts. Unlike traditional computing errors, quantum noise is more complex, arising from decoherence, crosstalk, and gate imperfections that corrupt quantum states. Error mitigation has become a rapidly evolving field, offering ways to address these errors on existing devices. New techniques emerge regularly, requiring flexible tools for implementation and testing. This talk explores the challenges of mitigating noise and how researchers and engineers use Python to iterate quickly while maintaining reliable and reproducible workflows.

Reproducible Science Made Easy: Package Management with Pixi

2025-07-10
talk

Reproducibility is a major underpinning of the scientific method. In scientific computing, this also includes the ability to reproduce your dependencies. Yet, in 2025 this still remains a challenging topic.

Pixi is a modern package manager built on the Conda ecosystem. It integrates very well with all existing packages on conda-forge. Pixi makes package management reproducible, fast and painless – so that scientists can go back to coding instead of dealing with “dependency hell”. Pixi improves mixed Conda and PyPI package management by integrating with uv by astral.sh, and it streamlines automation with a cross-platform task runner. These features, combined with a powerful lockfile, make creating reproducible projects trivial.

This talk is for people who are interested in new, fast ways to set up their software (dev) environments on different systems – think your coworker's computer, CI, containers, and more.

Teaching Python with GPUs: Empowering educators to share knowledge that uses GPUs

2025-07-10
talk

In today’s world of ever-growing data and AI, learning about GPUs has become an essential part of software carpentry, professional development and the education curriculum. However, teaching with GPUs can be challenging, from resource accessibility to managing dependencies and varying knowledge levels.

During this talk we will address these issues by offering practical strategies to promote active learning with GPUs and share our experiences from running numerous Python conference tutorials that leveraged GPUs. Attendees will learn different options for providing GPU access, how to tailor content for different expertise levels, and how to simplify package management when possible.

If you are an educator, researcher, and/or developer who is interested in teaching or learning about GPU computing with Python, this talk will give you the confidence to teach topics that require GPU acceleration and quickly get your audience up and running.

Towards a more sustainable and reliable mybinder.org

2025-07-10
talk

mybinder.org has served millions of scientific Python users for 8 years now! It is an experiment in running open source infrastructure as a public good. Sustainability challenges faced by open source software production are magnified here: we need people's time to manage the infrastructure, and we need to pay for the computational infrastructure required to run the service, operate it reliably by responding to outages in a timely fashion, and fight off abuse from malicious actors. This talk covers the lessons learnt over the years, and new community-oriented experiments to improve sustainability, functionality & reliability that we are trying out now.

Jupyter Book 2.0 – A Next-Generation tool for sharing for Computational Content

2025-07-10
talk

Jupyter Book allows researchers and educators to create books and knowledge bases that are reusable, reproducible, and interactive. Jupyter Book 2.0 has been rebuilt on a new document engine that prioritizes extensibility, machine readability and flexible deployment, allowing us to create and share interactive computational content in new ways. In this talk, we will introduce Jupyter Book 2.0, demonstrate its game-changing features, and showcase real-world examples like The Turing Way, QuantEcon and Project Pythia. We'll conclude with a live demo, taking a folder of notebooks and markdown files and turning them into a deployable, feature-rich website.

Numba v2: Towards a SuperOptimizing Python Compiler

2025-07-10
talk

The rapidly evolving Python ecosystem presents increasing challenges for adapting code using traditional methods. Developers frequently need to rewrite applications to leverage new libraries, hardware architectures, and optimization techniques. To address this challenge, the Numba team is developing a superoptimizing compiler built on equality saturation-based term rewriting. This innovative approach enables domain experts to express and share optimizations without requiring extensive compiler expertise. This talk explores how Numba v2 enables sophisticated optimizations—from floating-point approximation and automatic GPU acceleration to energy-efficient multiplication for deep learning models—all through the familiar NumPy API. Join us to discover how Numba v2 is bringing superoptimization capabilities to the Python ecosystem.

Probing the Hidden World of Battery Chemistry With X-rays

2025-07-10
talk

This track highlights the fantastic scientific applications that the SciPy community creates with the tools we collectively make. Talk proposals to this track should be stories of how using the Scientific Python ecosystem the speakers were able to overcome challenges, create new collaborations, reduce the time to scientific insight, and share their results in ways not previously possible. Proposals should focus on novel applications and problems, and be of broad interest to the conference, but should not shy away from explaining the scientific nuances that make the story in the proposal exciting.