API

User guides: engaging new users, delighting old ones

2025-07-09 · SciPy 2025

talk

by Michael Chow (RStudio)

DuckDB Polars

User guides are the piece you often hit right after clicking the "Learn" or "Get Started" button in a package's documentation. They're responsible for onboarding new users, and providing a learning path through a package. Surprisingly, while pieces of documentation like the API Reference tend to be the same, the design of user guides tend to differ across packages.

In this talk, I'll discuss how to design an effective user guide for open source software. I'll explain how the guides for Polars, DuckDB, and FastAPI balance working end-to-end like a course, with being browsable like a reference.

Python is all you need: an overview of the composable, Python-native data stack

2025-07-09 · SciPy 2025

talk

by Deepyaman Datta

Data Engineering dbt Modern Data Stack Python SQL

For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Burning fuel for cheap! Transport-independent depletion in OpenMC

2025-07-09 · SciPy 2025

talk

by Oleksandr Yardas

Monte Carlo Python

OpenMC is an open source, community-developed, Monte Carlo tool for neutron transport simulations, featuring a depletion module for fuel burnup calculations in nuclear reactors and a Python API. Depletion calculations can be expensive as they require solving the neutron transport and bateman equations in each timestep to update the neutron flux and material composition, respectively. Material properties such as temperature and density govern material cross sections, which in turn govern reaction rates. The reaction rates can effect the neutron population. In a scenario where there is no significant change in the material properties or composition, the transport simulation may only need to be run once; the same cross sections are used for the entire depletion calculation. We recently extended the depletion module in OpenMC to enable transport-independent depletion using multigroup cross sections and fluxes. This talk will focus on the technical details of this feature, its validation, and briefly touch on areas where the feature has been used. Two recent use cases will be highlighted. The first use case calculates shutdown dose rates for fusion power applications, and the second performs depletion for fission reactor fuel cycle modeling.

Network Analysis Made Simple

2025-07-08 · SciPy 2025

talk

by Eric Ma

LLM

Through the use of NetworkX's API, tutorial participants will learn about the basics of graph theory and its use in applied network science. Starting with a computationally-oriented definition of a graph and its associated methods, we will progress through the following concepts: path and structure finding, visualization, and graph storage on disk. We will also offer tutorial participants the option of one advanced topic overview, including the use of graphs alongside LLMs for knowledge retrieval, scalable alternatives to NetworkX including cuGraph, and the use of linear algebraic translation of graph problems to speed up computations.

3D Visualization with PyVista

2025-07-07 · SciPy 2025

talk

by Alexander Kaszynski , Tetsuo Koyama , Bane Sullivan

GitHub HTML Matplotlib

PyVista is a general purpose 3D visualization library used for over 2000+ open source projects for the visualization of everything from computer aided engineering and geophysics to volcanoes and digital artwork.

PyVista exposes a Pythonic API to the Visualization Toolkit (VTK) to provide tooling that is immediately usable without any prior knowledge of VTK and is being built as the 3D equivalent of Matplotlib, with plugins to Jupyter to enable visualization of 3D data using both server- and client-side rendering.

Vega-Altair: A Structured Way to Build Interactive Charts

2025-07-07 · SciPy 2025

talk

by Jon Mease , Dylan Wootton

DataViz Python

This tutorial is an introduction to data visualization using the popular Vega-Altair Python library. Vega-Altair provides a simple and expressive API, enabling authors to rapidly create a wide range of interactive charts.

Participants will explore the fundamentals of effective chart design and gain hands-on experience building a variety of visualizations using Vega-Altair's declarative API. Furthermore, this tutorial will introduce users to advanced topics such as data transformations and interaction design. We will finish off by covering practical workflows such as integrating Vega-Altair into dashboarding systems, publishing visualizations, and creating reusable, themed charting libraries. By the end of the session, attendees will have the skills to leverage Vega-Altair for both rapid prototyping and production-ready visualizations in diverse environments

Foundational Data Engineering At Two Sigma

2025-07-06 · Data Engineering Podcast Listen

podcast_episode

by Effie Baram (Two Sigma) , Tobias Macey

AI/ML Data Collection Data Engineering Data Management Data Quality Datafold Python

Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges of building a foundational data platform that supports research and operational needs while maintaining regulatory compliance and data quality. Effie also shares insights into treating data as code, leveraging modern data warehouses, and the evolving role of data engineers in a rapidly changing technological landscape.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial. Your host is Tobias Macey and today I'm interviewing Effie Baram about data engineering in the finance sectorInterview IntroductionHow did you get involved in the area of data management?Can you start by outlining the role of data in the context of Two Sigma?What are some of the key characteristics of the types of data sources that you work with?Your role is leading "foundational data engineering" at Two Sigma. Can you unpack that title and how it shapes the ways that you think about what you build?How does the concept of "foundational data" influence the ways that the business thinks about the organizational patterns around data?Given the regulatory environment around finance, how does that impact the ways that you think about the "what" and "how" of the data that you deliver to data consumers?Being the foundational team for data use at Two Sigma, how have you approached the design and architecture of your technical systems?How do you think about the boundaries between your responsibilities and the rest of the organization?What are the design patterns that you have found most helpful in empowering data consumers to build on top of your work?What are some of the elements of sociotechnical friction that have been most challenging to address?What are the most interesting, innovative, or unexpected ways that you have seen the ideas around "foundational data" applied in your organization?What are the most interesting, unexpected, or challenging lessons that you have learned while working with financial data?When is a foundational data team the wrong approach?What do you have planned for the future of your platform design?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links 2SigmaReliability EngineeringSLA == Service-Level AgreementAirflowParquet File FormatBigQuerySnowflakedbtGemini AssistMCP == Model Context ProtocoldtraceThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Aran White — Head of API Security, Broadcom

2025-07-02 · GenerationAI 2025 - Munich (Agentic AI, MCP and more)

talk

by Aran White (Broadcom)

Cyber Security

Featured speaker at GenerationAI 2025