talk-data.com

Topic

Python

programming_language data_science web_development

Activities

1446

tagged

Activity Trend

185 peak/qtr

2020-Q1 2026-Q1

Top Events

O'Reilly Data Science Books 220 Data Engineering Podcast 183 O'Reilly Data Engineering Books 151 SciPy 2025 67 PyConDE & PyData Berlin 2023 49 Data + AI Summit 2025 30 Databricks DATA + AI Summit 2023 29 O'Reilly AI & ML Books 27 PyData Seattle 2025 23 PyData Paris 2025 20 O'Reilly Data Visualization Books 20 PyData London 2025 20

Top Speakers

Tobias Macey 183 Bryce Adelstein Lelbach (NVIDIA) 18 Conor Hoekstra 17 Harpreet Sahota (Voxel51) 15 Dan Gural (Voxel51) 14 Avery Smith 13 Kyle Polich 8 Al Martin (IBM) 7 Dr. Yasin Ceran (KAIST) 7 Luca Massaron 7 Julie Hoyer 6 Gleb Mezhanskiy (Datafold) 6

Activities

1446 activities · Newest first

All Video Podcast Book

Packaging a Scientific Python Project

2025-07-09 · SciPy 2025

talk

by Henry Fredrick Schreiner III

GitHub

One of the most important aspects of developing scientific software is distribution for others. The Scientific Python Development Guide was developed to provide up-to-date best practices for packaging, linting, and testing, along with a versatile template supporting multiple backends, and a WebAssembly-powered repo-review tool to check a repository directly in the guide. This talk, with the guide for reference, will cover key best practices for project setup, backend selection, packaging metadata, GitHub Actions for testing and deployment, tools for validating code quality. We will even cover tools for packaging compiled components that are simple enough for anyone to use.

Python is all you need: an overview of the composable, Python-native data stack

2025-07-09 · SciPy 2025

talk

by Deepyaman Datta

API Data Engineering dbt Modern Data Stack SQL

For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Cubed: Scalable array processing with bounded-memory in Python

2025-07-09 · SciPy 2025

talk

by Tom White , Tom Nicholas

Cloud Computing NumPy

Cubed is a framework for distributed processing of large arrays without a cluster. Designed to respect memory constraints at all times, Cubed can express any NumPy-like array operation as a series of embarrassingly-parallel, bounded-memory steps. By using Zarr as persistent storage between steps, Cubed can run in a serverless fashion on both a local machine and on a range of Cloud platforms. After explaining Cubed’s model, we will show how Cubed has been integrated with Xarray and demonstrate its performance on various large array geoscience workloads.

CuPy: My Journey toward GPU-Accelerated Computing in Python

2025-07-09 · SciPy 2025

talk

by Leo Fang

NumPy SciPy

This talk walks all Pythonistas through recent CuPy feature development. Join me and hear my story on how an open-source novice started contributing to and helping CuPy over the years grow into a full-fledged, reliable, GPU-accelerated array library that covers most of NumPy, SciPy, and Numba functionalities.

Unlocking AI Performance with NeMo Curator: Scalable Data Processing for LLMs

2025-07-09 · SciPy 2025

talk

by Allison Ding

AI/ML Data Quality LLM

Training Large Language Models (LLMs) requires processing massive-scale datasets efficiently. Traditional CPU-based data pipelines struggle to keep up with the exponential growth of data, leading to bottlenecks in model training. In this talk, we present NeMo Curator, an accelerated, scalable Python-based framework designed to curate high-quality datasets for LLMs efficiently. Leveraging GPU-accelerated processing with RAPIDS, NeMo Curator provides modular pipelines for synthetic data generation, deduplication, filtering, classification, and PII redaction—improving data quality and training efficiency.

We will showcase real-world examples demonstrating how multi-node, multi-GPU processing scales dataset preparation to 100+ TB of data, achieving up to 7% improvement in LLM downstream tasks. Attendees will gain insights into configurable pipelines that enhance training workflows, with a focus on reproducibility, scalability, and open-source integration within Python's scientific computing ecosystem.

Breaking Out of the Loop: Refactoring Legacy Software with Polars

2025-07-09 · SciPy 2025

talk

by Brodie Vidrine

Java Polars

Data manipulation libraries like Polars allow us to analyze and process data much faster than with native Python, but that’s only true if you know how to use them properly. When the team working on NCEI's Global Summary of the Month first integrated Polars, they found it was actually slower than the original Java version. In this talk, we'll discuss how our team learned how to think about computing problems like spreadsheet programmers, increasing our products’ processing speed by over 80%. We’ll share tips for rewriting legacy code to take advantage of parallel processing. We’ll also cover how we created custom, pre-compiled functions with Numba when the business requirements were too complex for native Polars expressions.

Burning fuel for cheap! Transport-independent depletion in OpenMC

2025-07-09 · SciPy 2025

talk

by Oleksandr Yardas

API Monte Carlo

OpenMC is an open source, community-developed, Monte Carlo tool for neutron transport simulations, featuring a depletion module for fuel burnup calculations in nuclear reactors and a Python API. Depletion calculations can be expensive as they require solving the neutron transport and bateman equations in each timestep to update the neutron flux and material composition, respectively. Material properties such as temperature and density govern material cross sections, which in turn govern reaction rates. The reaction rates can effect the neutron population. In a scenario where there is no significant change in the material properties or composition, the transport simulation may only need to be run once; the same cross sections are used for the entire depletion calculation. We recently extended the depletion module in OpenMC to enable transport-independent depletion using multigroup cross sections and fluxes. This talk will focus on the technical details of this feature, its validation, and briefly touch on areas where the feature has been used. Two recent use cases will be highlighted. The first use case calculates shutdown dose rates for fusion power applications, and the second performs depletion for fission reactor fuel cycle modeling.

GBNet: Gradient Boosting packages integrated into PyTorch

2025-07-09 · SciPy 2025

talk

by Michael Horrell

PyTorch

GBNet

Gradient Boosting Machines (GBMs) are widely used for their predictive power and interpretability, while Neural Networks offer flexible architectures but can be opaque. GBNet is a Python package that integrates XGBoost and LightGBM with PyTorch. By leveraging PyTorch’s auto-differentiation, GBNet enables novel architectures for GBMs that were previously exclusive to pure Neural Networks. The result is a greatly expanded set of applications for GBMs and an improved ability to interpret expressive architectures due to the use of GBMs.

Accelerating Genomic Data Science and AI/ML with Composability

2025-07-09 · SciPy 2025

talk

by Nezar Abdennur , Trevor Manz

AI/ML Analytics Arrow Data Analytics Data Science Rust

The practice of data science in genomics and computational biology is fraught with friction. This is largely due to a tight coupling of bioinformatic tools to file input/output. While omic data is specialized and the storage formats for high-throughput sequencing and related data are often standardized, the adoption of emerging open standards not tied to bioinformatics can help better integrate bioinformatic workflows into the wider data science, visualization, and AI/ML ecosystems. Here, we present two bridge libraries as short vignettes for composable bioinformatics. First, we present Anywidget, an architecture and toolkit based on modern web standards for sharing interactive widgets across all Jupyter-compatible runtimes, including JupyterLab, Google Colab, VSCode, and more. Second, we present Oxbow, a Rust and Python-based adapter library that unifies access to common genomic data formats by efficiently transforming queries into Apache Arrow, a standard in-memory columnar representation for tabular data analytics. Together, we demonstrate the composition of these libraries to build a custom connected genomic analysis and visualization environments. We propose that components such as these, which leverage scientific domain-agnostic standards to unbundle specialized file manipulation, analytics, and web interactivity, can serve as reusable building blocks for composing flexible genomic data analysis and machine learning workflows as well as systems for exploratory data analysis and visualization.

Dynamic Data with Matplotlib

2025-07-09 · SciPy 2025

talk

by Kyle Sunden

DataViz Matplotlib Pandas

Matplotlib is already a favorite plotting library for creating static data visualizations in Python. Here, we discuss the development of a new DataContainer interface and accompanying transformation pipeline which enable easier dynamic data visualization in Matplotlib. This improves the experience of plotting pure functions, automatically recomputing when you pan and zoom. Data containers can ingest data from a variety of sources, including structured data such as Pandas Dataframes or Xarrays, up to live updating data from web services or databases. The flexible transformation pipeline allows for control over how your data is encoded into a plot.

Python for Climate Science: Using Intake to provide easy access to Climate Model data

2025-07-09 · SciPy 2025

talk

by Charles Turner

Climate models generate a lot of data - and this can make it hard for researchers to efficiently access and use the data they need. The solutions of yesteryear include standardised file structures, sqlite databases, and just knowing where to look. All of these work - to varying degrees - but can leave new users scratching their heads. In this talk, I'll outline how ACCESS-NRI built tooling around Intake and Intake-ESM to make it easy for climate researchers to access available data, share their own, and avoid writing the custom scripts over and over to work with the data their experiments generate.

RAG Apps with Python, SurrealDB & Streamlit

2025-07-09 · RAG apps using Python, SurrealDB and Streamlit

talk

streamlit surrealdb

An online session on building production-ready RAG apps using Python, SurrealDB, and Streamlit. Learn the fundamentals in pure Python before using a framework, how to manage multi-model data with SurrealDB, and how to build a front end for your RAG app with Streamlit.

RAG Apps with Python, SurrealDB & Streamlit

2025-07-09 · RAG apps using Python, SurrealDB and Streamlit

talk

streamlit surrealdb

Overview of building production-ready RAG applications using only Python, SurrealDB, and Streamlit; covers fundamentals in pure Python, managing multi-model data with SurrealDB, and building a front end for a RAG app with Streamlit.

What We Maintain, We Defend

2025-07-09 · SciPy 2025

talk

by Hon. Kathryn D. Huff, PhD

SciPy

Scientific Python is not only at the heart of discovery and advancement, but also infrastructure. This talk will provide a perspective on how open-source Python tools that are already powering real-world impact across the sciences are also supportive of public institutions and critical public data infrastructure. Drawing on her previous experience leading policy efforts in the Department of Energy as well as her experience in open-source scientific computing, Katy will highlight the indispensable role of transparency, reproducibility, and community in high-stakes domains. This talk invites the SciPy community to recognize its unique strengths and to amplify their impact by contributing to the public good through technically excellent, civic-minded development.

Data wrangling in a modern terminal

2025-07-09 · PyData Prague #28 - Terminal Request

talk

DataViz console-based tools data analysis terminal

Once we constrain ourselves to a rectangle of fixed-width characters (preferably white on a black background), we start to see the world a bit differently. If we want to thoroughly investigate it (a.k.a. perform data analysis), we have to be equipped with appropriate tools - be it techniques, libraries or standalone console-based applications. Let's see what the terminal has to offer when reading, manipulating, presenting and even plotting numerical data. We might even finish with a live dashboard your audience will love (or perhaps will not).

Right-Sized Scaling: Python APIs at Billions of Requests Without the Complexity

2025-07-09 · PyData Prague #28 - Terminal Request

talk

apis scaling web

At Printables.com, we handle billions of requests every month using a fairly simple, Python-based API stack that scales reliably without unnecessary complexity. In this talk, I’ll share how embracing pragmatism over hype helped us avoid overengineering—proving that microservices and complex architectures aren’t always the answer for every challenge. We’ll explore key design choices, real-world bottlenecks, and practical lessons from our journey to build a maintainable, cost-effective system that delivers at scale. Whether you’re growing a startup or managing a mature platform, you’ll gain actionable insights for scaling Python APIs with confidence.

Bring Accelerated Computing to Data Science in Python

2025-07-08 · SciPy 2025

talk

by Kevin Lee

Data Science

As data science continues to evolve, the ever-growing size of datasets poses significant computational challenges. Traditional CPU-based processing often struggles to keep pace with the demands of data science workflows. Accelerated computing with GPUs offers a solution by enabling massive parallelism and significantly reducing processing times for data-heavy tasks. In this session, we will explore GPU computing architecture, how it differs from CPUs, and why it is particularly well-suited for data science workloads. This hands-on lab will dive into the different approaches to GPU programming, from low-level CUDA coding to high-level Python libraries within RAPIDS such as, CuPy, cuDF, cuGraph, and cuML.

Scaling-up deep learning inference to large-scale bioimage data

2025-07-08 · SciPy 2025

talk

by Peter Sobolewski , Fernando Cervantes Sanchez

AI/ML

Artificial intelligence has been successfully applied to bioimage understanding and achieved significative results in the last decade. Advances in imaging technologies have also allowed the acquisition of higher resolution images. That has increased not only the magnification at what images are captured, but the size of the acquired images as well. This comprises a challenge for deep learning inference in large-scale images, since these methods are commonly used in relatively small regions rather than whole images. This workshop presents techniques to scale-up inference of deep learning models to large-scale image data with help of Dask for parallelization in Python.

Shiny for Python: Building Production-Ready Dashboards in Python

2025-07-08 · SciPy 2025

talk

by Daniel Chen

Shiny is a framework for building web applications and data dashboards in Python. In this workshop, you will see how the basic building blocks of shiny can be extended to create your own scalable production-ready python applications.

In particular, this workshop covers:

Overview of the basic building blocks of a Shiny for Python application
How to refactor applications into shiny modules
How to write tests for your shiny application
Deploy and share your application

At the end of this course you will be able to:

Build a Shiny app in Python
Refactor your reactive logic into Shiny Modules
Identify when to write Shiny modules
Write unit tests and end-to-end tests for your shiny application
Deploy and share your application (for free!)

The future of Python Notebooks

2025-07-08 · Django London July

talk

AI/ML

This talk covers how Python notebooks are evolving from static documents to interactive, collaborative, and production-ready environments, using Marimo. We’ll examine emerging trends—such as AI-powered and reactive notebooks, notebook-as-app frameworks, and integration with modern workflows—equipping attendees with insights to leverage notebooks to reshape coding, teaching, and scientific publishing.

Page 16 of 73

← Previous

1 ... 14 15 16 17 18 ... 73