Python

The Data Engineer's Guide to Microsoft Fabric

2027-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Henrik Reich (twoday Data & AI)

Data Engineering Data Lakehouse Databricks ETL/ELT Microsoft Fabric Spark SQL Data Streaming analytics-platforms data data-science +1 more

Modern data engineering is evolving; and with Microsoft Fabric, the entire data platform experience is being redefined. This essential book offers a fresh, hands-on approach to navigating this shift. Rather than being an introduction to features, this guide explains how Fabric's key components—Lakehouse, Warehouse, and Real-Time Intelligence—work under the hood and how to put them to use in realistic workflows. Written by Christian Henrik Reich, a data engineering expert with experience that extends from Databricks to Fabric, this book is a blend of foundational theory and practical implementation of lakehouse solutions in Fabric. You'll explore how engines like Apache Spark and Fabric Warehouse collaborate with Fabric's Real-Time Intelligence solution in an integrated platform, and how to build ETL/ELT pipelines that deliver on speed, accuracy, and scale. Ideal for both new and practicing data engineers, this is your entry point into the fabric of the modern data platform. Acquire a working knowledge of lakehouses, warehouses, and streaming in Fabric Build resilient data pipelines across real-time and batch workloads Apply Python, Spark SQL, T-SQL, and KQL within a unified platform Gain insight into architectural decisions that scale with data needs Learn actionable best practices for engineering clean, efficient, governed solutions

Causal Inference with Bayesian Networks

2026-09-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Yousri El Fattah , Reza Bagheri

AI/ML bayesian-statistics data data-science data-science-tasks statistics

Leverage the power of graphical models for probabilistic and causal inference to build knowledge-based system applications and to address causal effect queries with observational data for decision aiding and policy making. Key Features Gain a firm understanding of Bayesian networks and structured algorithms for probabilistic inference Acquire a comprehensive understanding of graphical models and their applications in causal inference Gain insights into real-world applications of causal models in multiple domains Enhance your coding skills in R and Python through hands-on examples of causal inference Book Description This is a practical guide that explores the theory and application of Bayesian networks (BN) for probabilistic and causal inference. The book provides step-by-step explanations of graphical models of BN and their structural properties; the causal interpretations of BN and the notion of conditioning by intervention; and the mathematical model of structural equations and the representation in structured causal models (SCM). For probabilistic inference in Bayesian networks, you will learn methods of variable elimination and tree clustering. For causal inference you will learn the computational framework of Pearl's do-calculus for the identification and estimation of causal effects with causal models. In the context of causal inference with observational data, you will be introduced to the potential outcomes framework and explore various classes of meta-learning algorithms that are used to estimate the conditional average treatment effect in causal inference. The book includes practical exercises using R and Python for you to engage in and solidify your understanding of different approaches to probabilistic and causal inference. By the end of this book, you will be able to build and deploy your own causal inference application. You will learn from causal inference sample use cases for diagnosis, epidemiology, social sciences, economics, and finance. What you will learn Representation of knowledge with Bayesian networks Interpretation of conditional independence assumptions Interpretation of causality assumptions in graphical models Probabilistic inference with Bayesian networks Causal effect identification and estimation Machine learning methods for causal inference Coding in R and Python for probabilistic and causal inference Who this book is for This book will serve as a valuable resource for a wide range of professionals including data scientists, software engineers, policy analysts, decision-makers, information technology professionals involved in developing expert systems or knowledge-based applications that deal with uncertainty, as well as researchers across diverse disciplines seeking insights into causal analysis and estimating treatment effects in randomized studies. The book will enable readers to leverage libraries in R and Python and build software prototypes for their own applications.

AI Agents with MCP

2026-08-25 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Kyle Stratis (Stratis Data Labs)

AI/ML GenAI ai-agents ai-ml artificial-intelligence-ai data

Since its release in late 2024, Anthropic's Model Context Protocol (MCP) has redefined how developers build and connect AI agents to tools, data, and each other. AI Agents with MCP is the first comprehensive guide to this rapidly emerging standard, helping engineers unlock its full potential with hands-on projects. Whether you're developing agentic workflows, bridging tools across platforms, or creating robust multiagent systems, this book walks you through every layer of MCP--from protocol structure to server and client implementation. Author Kyle Stratis provides the practical expertise needed to build fully functional MCP servers, clients, and more. Unlike high-level overviews or fragmented documentation, this book gives you a deep systems-level understanding of MCP's capabilities--and limitations. With its flexible, model-agnostic design, MCP continues to gain traction across the generative AI community; this book ensures you're ready to build with it confidently and effectively. Understand the structure and core concepts of the Model Context Protocol Build complete MCP servers, clients, and transport layers in Python Consume tools, prompts, and data via MCP-based agent workflows Extend agent capabilities with MCP for large-scale and AI-native systems

Practical Statistics for Data Scientists, 3rd Edition

2026-08-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andrew Bruce , Peter Bruce , Peter Gedeck

Data Science data data-science data-science-tasks statistics

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. And many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

Data Engineering with Azure Databricks

2026-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

AI/ML Airflow Analytics Azure ADF Azure DevOps CI/CD Cloud Computing Data Engineering Data Governance Data Lakehouse Databricks +11 more

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools. Key Features Build scalable data pipelines using Apache Spark and Delta Lake Automate workflows and manage data governance with Unity Catalog Learn real-time processing and structured streaming with practical use cases Implement CI/CD, DevOps, and security for production-ready data solutions Explore Databricks-native ML, AutoML, and Generative AI integration Book Description "Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need. What you will learn Set up a full-featured Azure Databricks environment Implement batch and streaming ingestion using Auto Loader Optimize Spark jobs with partitioning and caching Build real-time pipelines with structured streaming and DLT Manage data governance using Unity Catalog Orchestrate production workflows with jobs and ADF Apply CI/CD best practices with Azure DevOps and Git Secure data with RBAC, encryption, and compliance standards Use MLflow and Feature Store for ML pipelines Build generative AI applications in Databricks Who this book is for This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.

Leading the AI Data Revolution: Roadmap to Agentic Management with Python, Spark & Open Tables

2026-03-11 · gartner-data-analytics-us-2026

talk

by Aaron Rosenbaum (Gartner)

AI/ML Data Management Spark

We are at the start of a massive, AI-driven feedback loop. A loop between a universal language, Python, a universal engine, Spark, and universal storage, Open Table Formats, that will accelerate us from simple automation to fully agentic, automated data management. This session helps D&A leaders assess their strategy for navigating this disruptive transition and its opportunities and risks.

Data Contracts in Practice

2026-02-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ryan Collingwood

Data Contracts Data Governance Data Quality JSON SQL YAML data data-engineering

In 'Data Contracts in Practice', Ryan Collingwood provides a detailed guide to managing and formalizing data responsibilities within organizations. Through practical examples and real-world use cases, you'll learn how to systematically address data quality, governance, and integration challenges using data contracts. What this Book will help me do Learn to identify and formalize expectations in data interactions, improving clarity among teams. Master implementation techniques to ensure data consistency and quality across critical business processes. Understand how to effectively document and deploy data contracts to bolster data governance. Explore solutions for proactively addressing and managing data changes and requirements. Gain real-world skills through practical examples using technologies like Python, SQL, JSON, and YAML. Author(s) Ryan Collingwood is a seasoned expert with over 20 years of experience in product management, data analysis, and software development. His holistic techno-social approach, designed to address both technical and organizational challenges, brings a unique perspective to improving data processes. Ryan's writing is informed by his extensive hands-on experience and commitment to enabling robust data ecosystems. Who is it for? This book is ideal for data engineers, software developers, and business analysts working to enhance organizational data integration. Professionals with a familiarity of system design, JSON, and YAML will find it particularly beneficial. Enterprise architects and leadership roles looking to understand data contract implementation and their business impacts will also greatly benefit. Basic understanding of Python and SQL is recommended to maximize learning.

Time Series Analysis with Python Cookbook - Second Edition

2026-01-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tarek A. Atwan

AI/ML Pandas Polars PyTorch TensorFlow data data-science data-science-tasks statistics time-series

Perform time series analysis and forecasting confidently with this Python code bank and reference manual Purchase of the print or Kindle book includes a free PDF eBook Key Features Explore up-to-date forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms Learn different techniques for evaluating, diagnosing, and optimizing your models Work with a variety of complex data with trends, multiple seasonal patterns, and irregularities Book Description To use time series data to your advantage, you need to be well-versed in data preparation, analysis, and forecasting. This fully updated second edition includes chapters on probabilistic models and signal processing techniques, as well as new content on transformers. Additionally, you will leverage popular libraries and their latest releases covering Pandas, Polars, Sktime, stats models, stats forecast, Darts, and Prophet for time series with new and relevant examples. You'll start by ingesting time series data from various sources and formats, and learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods. Further, you'll explore forecasting using classical statistical models (Holt-Winters, SARIMA, and VAR). Learn practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Then we will move into more advanced topics such as building ML and DL models using TensorFlow and PyTorch, and explore probabilistic modeling techniques. In this part, you’ll also learn how to evaluate, compare, and optimize models, making sure that you finish this book well-versed in wrangling data with Python. What you will learn Understand what makes time series data different from other data Apply imputation and interpolation strategies to handle missing data Implement an array of models for univariate and multivariate time series Plot interactive time series visualizations using hvPlot Explore state-space models and the unobserved components model (UCM) Detect anomalies using statistical and machine learning methods Forecast complex time series with multiple seasonal patterns Use conformal prediction for constructing prediction intervals for time series Who this book is for This book is for data analysts, business analysts, data scientists, data engineers, and Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is a prerequisite. Prior experience working with time series data to solve business problems will also help you to better utilize and apply the different recipes in this book.

Managing and Visualizing BIM Data with AI

2026-01-16 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Bruno Martorelli

AI/ML Analytics BI Data Analytics Data Management DataViz DynamoDB IoT Power BI business-intelligence data data-science +2 more

Unlock the potential of your BIM workflows with artificial intelligence and data visualization tools. This book provides guided instruction on using software like Revit, Dynamo, Python, and Power BI to automate processes, derive insights, and craft tailored dashboards that empower data-driven decisions in AEC projects. What this Book will help me do Effectively preprocess and manage BIM data for analysis and visualization. Design interactive and insightful dashboards in Power BI for project stakeholders. Integrate real-time IoT data and advanced analytics into BIM projects. Automate repetitive tasks in Revit using Dynamo and Python scripting. Understand the ethical considerations and emerging trends in AI for BIM. Author(s) Bruno Martorelli, a seasoned BIM manager, specializes in integrating technology and data analytics into construction workflows. With a background in architecture and programming, he bridges the gap between traditional methods and modern innovations. Bruno is dedicated to sharing practical strategies for data automation and visualization. Who is it for? This book is tailored for architects, engineers, and construction managers interested in elevating their BIM practices. If you're familiar with Revit and possess a basic understanding of data management, you'll find this resource invaluable. Beginners in Python or Power BI will also find accessible guidance to start applying advanced techniques in their workflows.

Initiation à la data analyse avec Python

2026-01-15 · Initiez-vous à la Data Analyse avec Python

workshop

by Intervenants du Wagon (Le Wagon)

Pandas jupyter notebook visualisation

Atelier de 2 heures animé par les intervenants du Wagon sur les bases de Python et Jupyter Notebook, les fondamentaux de Pandas et des techniques de visualisation, et le sourcing de données Airbnb en ligne pour l’analyse.

Learn Data Science Using SAS Studio : From Clicks to Code

2026-01-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Engy Fouda

Analytics Data Science Marketing SAS analytics-platforms data data-science

Do you want to create data analysis reports without writing a line of code? This book introduces SAS Studio, a free, web-based data science product for educational and non-commercial purposes. The power of SAS Studio lies in its visual, point-and-click user interface, which generates SAS code. It is easier to learn SAS Studio than to learn R and Python to accomplish data cleaning, statistics, and visualization tasks. The book includes a case study analyzing the data required to predict the results of presidential elections in the state of Maine for 2016 and 2020. In addition to the presidential elections, the book provides real-life examples, including analyses of stock, oil, and gold prices, crime, marketing, and healthcare. You will see data science in action and how easily it can be performed using complicated tasks and visualizations in SAS Studio. You will learn, step by step, how to perform visualizations, including creating maps. In most cases, you will not need a line of code as you work with the SAS Studio graphical user interface. The book includes explanations of the code that SAS Studio generates automatically. You will learn how to edit this code to perform more complicated advanced tasks. What You Will Learn Become familiar with the SAS Studio IDE. How to create essential visualizations. Know the fundamental statistical analysis required in most data science and analytics reports. Clean the most common dataset problems Learn linear and logistic regression for data prediction and analysis. Write programs in SAS. How to analyze data and get insights from it for decision-making. Learn character, numeric, date, time, and datetime functions and typecasting. Who This Book Is For A general audience of people who are new to data science, students, and data analysts and scientists who are new to SAS. No prior programming or statistical knowledge is required.

The Data Flow Map: A Practical Guide to Clear and Creative Analytics in Any Data Environment

2026-01-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Nick Ryberg

AI/ML Analytics Data Analytics Data Collection SQL data data-science

Unlock the secrets of practical data analysis with the Data Flow Map framework—a game-changing approach that transcends tools and platforms. This book isn’t just another programming manual; it’s a guide to thinking and communicating about data at a higher level. Whether you're working with spreadsheets, databases, or AI-driven models, you'll learn how to express your analytics in clear, common language that anyone can understand. In today’s data-rich world, clarity is the real challenge. Technical details often obscure insights that could drive real impact. The Data Flow Map framework simplifies complexity into three core motions: source, focus, and build. The first half of the book explores these concepts through illustrations and stories. The second half applies them to real-world datasets using tools like Excel, SQL, and Python, showing how the framework works across platforms and use cases. A vital resource for analysts at any level, this book offers a practical, tool-agnostic approach to data analysis. With hands-on examples and a universal mental model, you’ll gain the confidence to tackle any dataset, align your team, and deliver insights that matter. Whether you're a beginner or a seasoned pro, the Data Flow Map framework will transform how you approach data analytics. What You Will Learn Grasp essential elements applicable to every data analysis workflow Adapt quickly to any dataset, tool, or platform Master analytic thinking at a higher level Use analytics patterns to better understand the world Break complex analysis into manageable, repeatable steps Iterate faster to uncover deeper insights and better solutions Communicate findings clearly for better decision-making Who This Book Is For Aspiring data professionals and experienced analysts, from beginners to seasoned data engineers, focused on data collection, analysis, and decision making

Engineering Lakehouses with Open Table Formats

2025-12-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dipankar Mazumdar , Vinoth Govindarajan (Apple)

Airflow Flink Big Data Data Lakehouse Data Management dbt Delta Hudi Iceberg Spark data data-engineering +2 more

Engineering Lakehouses with Open Table Formats introduces the architecture and capabilities of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. The book guides you through the design, implementation, and optimization of lakehouses that can handle modern data processing requirements effectively with real-world practical insights. What this Book will help me do Understand the fundamentals of open table formats and their benefits in lakehouse architecture. Learn how to implement performant data processing using tools like Apache Spark and Flink. Master advanced topics like indexing, partitioning, and interoperability between data formats. Explore data lifecycle management and integration with frameworks like Apache Airflow and dbt. Build secure lakehouses with regulatory compliance using best practices detailed in the book. Author(s) Dipankar Mazumdar and Vinoth Govindarajan are seasoned professionals with extensive experience in big data processing and software architecture. They bring their expertise from working with data lakehouses and are known for their ability to explain complex technical concepts clearly. Their collaborative approach brings valuable insights into the latest trends in data management. Who is it for? This book is ideal for data engineers, architects, and software professionals aiming to master modern lakehouse architectures. If you are familiar with data lakes or warehouses and wish to transition to an open data architectural design, this book is suited for you. Readers should have basic knowledge of databases, Python, and Apache Spark for the best experience.

Hands-On Software Engineering with Python - Second Edition

2025-12-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brian Allbee

Agile/Scrum CI/CD Cloud Computing Data Modelling Docker GitHub Pydantic programming-languages software-development

Grow your software engineering discipline, incorporating and mastering design, development, testing, and deployment best practices examples in a realistic Python project structure. Key Features Understand what makes Software Engineering a discipline, distinct from basic programming Gain practical insight into updating, refactoring, and scaling an existing Python system Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions Book Description Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind. You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves—and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence. What you will learn Distinguish software engineering from general programming Break down and apply each phase of the SDLC to Python systems Create system models to plan architecture before writing code Apply Agile, Scrum, and other modern development methodologies Use dataclasses, pydantic, and schemas for robust data modeling Set up CI/CD pipelines with GitHub Actions and cloud build tools Write and structure unit, integration, and end-to-end tests Evaluate and integrate tools like Poetry, pytest, and Docker Who this book is for This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.

Bioinformatics with Python Cookbook - Fourth Edition

2025-12-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by Shane Brubaker

AI/ML Cloud Computing Data Science bioinformatics data data-science data-science-domains

Bioinformatics with Python Cookbook provides a practical, hands-on approach to solving computational biology challenges with Python, enabling readers to analyze sequencing data, leverage AI for bioinformatics applications, and design robust computational pipelines. What this Book will help me do Perform comprehensive sequence analysis using Python libraries for refined data interpretation. Configure and run bioinformatics workflows on cloud environments for scalable solutions. Apply advanced data science practices to analyze and visualize bioinformatics data. Explore the integration of AI tools in processing multimodal biological datasets. Understand and utilize bioinformatics databases for research and development. Author(s) Shane Brubaker is an experienced computational biologist and software developer with a strong background in bioinformatics and Python programming. With years of experience in data analysis and software engineering, Shane has authored numerous solutions for real-world bioinformatics issues. He brings a practical, example-driven teaching approach, aimed at empowering readers to apply techniques effectively in their work. Who is it for? This book is suitable for bioinformatics professionals, data scientists, and software engineers with moderate experience seeking to expand their computational biology knowledge. Readers should have basic understanding of biology, programming, and cloud tools. By engaging with this book, learners can advance their skills in Python and bioinformatics to address complex biological data challenges effectively.

Building Translamore: Lessons Learned from a Side Project

2025-12-11 · Flutter Berlin – Winter Wonderland

talk

dart llms prompt engineering

Mildly annoyed by the big green owl's limitations, I decided to build Translamore - an app that lets you turn whatever you're reading into your own language exercises. What started as a small weekend project quickly turned into a full-blown side quest. I'll kick things off with a quick demo of Translamore and then share some of the lessons from building it in my spare time: Project Management: staying organized when no one's watching, figuring out what's worth your time, keeping motivation alive, and the few tools that saved my sanity; LLMs & Prompt Engineering: what actually worked for me, using unit tests to wrangle prompts, a bit of templating magic, and my Prompt Resolver contraption; Server-Side Dart: why you really shouldn't ship your LLM API keys, how I structured packages and dependencies, used sealed classes for the API, and yes - called Python from Dart in the least elegant way possible. Expect some lessons, a few confessions, and probably one or two dont do what I did moments.

Surviving the Agentic Hype with Small Language Models

2025-12-10 · PyData Boston 2025 Watch

talk

by Serhii Sokolenko (Tower Dev)

AI/ML LLM

The AI landscape is abuzz with talk of "agentic intelligence" and "autonomous reasoning." But beneath the hype, a quieter revolution is underway: Small Language Models (SLMs) are starting to perform the core reasoning and orchestration tasks once thought to require massive LLMs. In this talk, we’ll demystify the current state of “AI agents,” show how compact models like Phi-2, xLAM 8B, and Nemotron-H 9B can plan, reason, and call tools effectively, and demonstrate how you can deploy them on consumer-grade hardware. Using Python and lightweight frameworks such as LangChain, we’ll show how anyone can quickly build and experiment with their own local agentic systems. Attendees will leave with a grounded understanding of agent architectures, SLM capabilities, and a roadmap for running useful agents without the GPU farm.

Data engineering with Python the right way: introducing the composable, Python-native data stack

2025-12-10 · PyData Boston 2025

talk

by Deepyaman Datta

API Data Engineering dbt Modern Data Stack SQL

For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges open-source libraries like Kedro, Pandera, and the Boring Semantic Layer and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

Evaluating AI Agents in production with Python

2025-12-10 · PyData Boston 2025 Watch

talk

by Susan Shu Chang

AI/ML LLM

This talk covers methods of evaluating AI Agents, with an example of how the speakers built a Python-based evaluation framework for a user-facing AI Agent system which has been in production for over a year. We share tools and Python frameworks used (as well as tradeoffs and alternatives), and discuss methods such as LLM-as-Judge, rules-based evaluations, ML metrics used, as well as selection tradeoffs.

Processing large JSON files without running out of memory

2025-12-10 · PyData Boston 2025 Watch

talk

by Itamar Turner-Trauring

JSON

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. In this talk you'll learn:

How to measure memory usage.
Why loading JSON takes a lot of memory.
Four different ways to reduce memory usage when loading large JSON files.

talk-data.com

Activity Trend

Top Events

Top Speakers

The Data Engineer's Guide to Microsoft Fabric

Causal Inference with Bayesian Networks

AI Agents with MCP

Practical Statistics for Data Scientists, 3rd Edition

Data Engineering with Azure Databricks

Leading the AI Data Revolution: Roadmap to Agentic Management with Python, Spark & Open Tables

Data Contracts in Practice

Time Series Analysis with Python Cookbook - Second Edition

Managing and Visualizing BIM Data with AI

Initiation à la data analyse avec Python

Learn Data Science Using SAS Studio : From Clicks to Code

The Data Flow Map: A Practical Guide to Clear and Creative Analytics in Any Data Environment

Engineering Lakehouses with Open Table Formats

Hands-On Software Engineering with Python - Second Edition

Bioinformatics with Python Cookbook - Fourth Edition

Building Translamore: Lessons Learned from a Side Project

Surviving the Agentic Hype with Small Language Models

Data engineering with Python the right way: introducing the composable, Python-native data stack

Evaluating AI Agents in production with Python

Processing large JSON files without running out of memory