Building Data Products

2026-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jean-Georges Perrin (Actian)

AI/ML CI/CD Data Contracts DevOps Cyber Security data data-engineering

As organizations grapple with fragmented data, siloed teams, and inconsistent pipelines, data products have emerged as a practical solution for delivering trusted, scalable, and reusable data assets. In Building Data Products, Jean-Georges Perrin provides a comprehensive, standards-driven playbook for designing, implementing, and scaling data products that fuel innovation and cross-functional collaboration—whether or not your organization adopts a full data mesh strategy. Drawing on extensive industry experience and practitioner interviews, Perrin shows readers how to build metadata-rich, governed data products aligned to business domains. Covering foundational concepts, real-world use cases, and emerging standards like Bitol ODPS and ODCS, this guide offers step-by-step implementation advice and practical code examples for key stages—ownership, observability, active metadata, compliance, and integration. Design data products for modular reuse, discoverability, and trust Implement standards-driven architectures with rich metadata and security Incorporate AI-driven automation, SBOMs, and data contracts Scale product-driven data strategies across teams and platforms Integrate data products into APIs, CI/CD pipelines, and DevOps practices

No Cloud? No Problem. Local RAG with Embedding Gemma

2025-12-10 · PyData Boston 2025 Watch

talk

by Sanjit Paliwal

Cloud Computing RAG

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters—but it doesn’t have to be. This session explores how Embedding Gemma, Google’s lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. Using the Sentence Transformers framework with Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud—or the budget.

Data engineering with Python the right way: introducing the composable, Python-native data stack

2025-12-10 · PyData Boston 2025

talk

by Deepyaman Datta

Data Engineering dbt Modern Data Stack Python SQL

For the past decade, SQL has reigned king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges open-source libraries like Kedro, Pandera, and the Boring Semantic Layer and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.

When Rivers Speak: Analyzing Massive Water Quality Datasets using USGS API and Remote SSH in Positron

2025-12-10 · PyData Boston 2025 Watch

talk

by Rodrigo Silva Ferreira

Analytics Data Engineering DuckDB HTML Parquet Python

Rivers have long been storytellers of human history. From the Nile to the Yangtze, they have shaped trade, migration, settlement, and the rise of civilizations. They reveal the traces of human ambition... and the costs of it. Today, from the Charles to the Golden Gate, US rivers continue to tell stories, especially through data.

Over the past decades, extensive water quality monitoring efforts have generated vast public datasets: millions of measurements of pH, dissolved oxygen, temperature, and conductivity collected across the country. These records are more than environmental snapshots; they are archives of political priorities, regulatory choices, and ecological disruptions. Ultimately, they are evidence of how societies interact with their environments, often unevenly.

In this talk, I’ll explore how Python and modern data workflows can help us "listen" to these stories at scale. Using the United States Geological Survey (USGS) Water Data APIs and Remote SSH in Positron, I’ll process terabytes of sensor data spanning several years and regions. I’ll demonstrate that, while Parquet and DuckDB enable scalable exploration of historical records, using Remote SSH is paramount in order to enable large-scale data analysis. By doing so, I hope to answer some analytical questions that can surface patterns linked to industrial growth, regulatory shifts, and climate change.

By treating rivers as both ecological systems and social mirrors, we can begin to see how environmental data encodes histories of inequality, resilience, and transformation.

Whether your interest lies in data engineering, environmental analytics, or the human dimensions of climate and infrastructure, this talk will explore topics at the intersection of environmental science, will offer both technical methods and sociological lenses to understand the stories rivers continue to tell.

fastplotlib: driving scientific discovery through data visualization

2025-12-10 · PyData Boston 2025 Watch

talk

by Kushal Kolar , Caitlin Lewis

DataViz

Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx, which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general-purpose scientific plotting library that is useful for fast and live visualization and analysis of complex datasets.

Efficient Time-Series Forecasting with Thousands of Local Models on Databricks

2025-12-09 · PyData Eindhoven 2025 Watch

talk

by Daria Mustafina

Databricks Pandas Spark

In industries like energy and retail, forecasting often requires local models when each time series has unique behavior — though training thousands of them can be overwhelming. However, training and managing thousands of such models presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the Pandas API on Spark, and shares practical lessons on storage, reuse, and scaling challenges to make this approach efficient when it’s truly needed

"Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop

2025-12-08 · PyData Boston 2025

talk

by Ian Stokes-Rees

Analytics GenAI LLM Python

In this 90 minute tutorial we'll get anyone with some basic Python and Command Line skills up and running with their own 100% laptop based set of LLMs, and explain some successful patterns for leveraging LLMs in a data analysis environment. We'll also highlight pit-falls waiting to catch you out, and encourage you that your pre-GenAI analytics skills are still relevant today and likely will be for the foreseeable future by demonstrating the limits of LLMs for data analysis tasks.

Building LLM Agents Made Simple

2025-12-08 · PyData Boston 2025 Watch

talk

by Eric Ma

GitHub LLM Python

Learn to build practical LLM agents using LlamaBot and Marimo notebooks. This hands-on tutorial teaches the most important lesson in agent development: start with workflows, not technology.

We'll build a complete back-office automation system through three agents: a receipt processor that extracts data from PDFs, an invoice writer that generates documents, and a coordinator that orchestrates both. This demonstrates the fundamental pattern for agent systems—map your boring workflows first, build focused agents for specific tasks, then compose them so agents can use other agents as tools.

By the end, you'll understand how to identify workflows worth automating, build agents with decision-making loops, compose agents into larger systems, and integrate them into your own work. You'll leave with working code and confidence to automate repetitive tasks.

Prerequisites: Intermediate Python, familiarity with APIs, basic LLM understanding. Participants should have Ollama and models installed beforehand (setup instructions provided).

Materials: GitHub repository with Marimo notebooks. Setup uses Pixi for dependency management.

AWS re:Invent 2025 - Customize Amazon Nova models for enhanced tool calling (AIM380)

2025-12-07 · AWS re:Invent 2024 Watch

video

Agile/Scrum AWS Cloud Computing Amazon SageMaker

Learn how to leverage Amazon SageMaker's training jobs with recipes to fine-tune Nova models using Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT) techniques and seamlessly deploy them in Amazon Bedrock. This session demonstrates how to fine-tune Nova models for advanced tool-calling capabilities, enabling intelligent agents that can autonomously execute workflows by interfacing with multiple AWS services, custom APIs, and internal tools. Learn how to customize, evaluate, and deploy Nova models for production-ready agentic systems that can handle multi-step business processes while maintaining reliability and control.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Integrate any agent framework with Amazon Bedrock AgentCore (AIM396)

2025-12-06 · AWS re:Invent 2024 Watch

video

Agile/Scrum AWS Cloud Computing LLM Cyber Security

Bring existing agents to AWS with Amazon Bedrock AgentCore. Whether you've built custom agents or use frameworks like LangChain, LangGraph, CrewAI, or LlamaIndex, this session shows how to run them on secure, scalable AWS infrastructure without rewriting your logic. Discover how AgentCore's API-based orchestration integrates external agent frameworks and services, centralizing tooling, observability, memory, and security. Learn how Cohere Health successfully integrated their healthcare workflow agents with AgentCore to enable secure, scalable processing of healthcare prior authorization decisions, leveraging AgentCore's services like Runtime and Memory to meet strict healthcare compliance requirements while maintaining the flexibility to use their chosen frameworks.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

2025-12-05 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML AWS Cloud Computing LLM

Building on our advancements unveiled at the New York Summit, this session explores the evolution of Nova Agents beyond conventional web browsing, human-in-the-loop (HITL) oversight, and standard tool use. We will dive into hybrid approaches, innovative LLM-tool interactions, and API-driven strategies that boost efficiency, reliability, and autonomy in agentic AI systems. Additionally, we'll highlight how supervisors can utilize HITL to approve, refine, and assume control of agent workflows, enabling more robust and flexible AI operations.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Create hyper-personalized voice interactions with Amazon Nova Sonic (AIM374)

2025-12-05 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML AWS Cloud Computing

We've all experienced it—that moment when an AI assistant creates more problems than it solves because of its robotic responses and lack of contextual awareness. It operates in a vacuum, making customer experiences less than delightful. In this session, you will learn how to create meaningful and hyper-personalized customer experiences with Amazon Nova Sonic and design patterns to enable new use cases. You will discover how Amazon Nova Sonic's tool use and function calling capabilities enable integration with external APIs and internal and external data sources to provide context that makes for delightful voice-based customer interactions.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Modernizing Your React App: Compiler, useEffectEvent, Activity & Friends

2025-12-04 · React Berlin December Meetup

talk

React

React has changed a lot in the last year: React 19, 19.1 and now 19.2 brought a stable React Compiler, new hooks like useEffectEvent, the `` API, and better SSR primitives such as Partial Pre-rendering and more. In this talk we’ll take a demo React app that’s full of effects, memoization and “old school” patterns, and modernize it step by step

AI Agents in 2025: Beyond Chatbots to Autonomous Workflows

2025-12-04 · #1 - London - Data & Agentic AI in Financial Services - Brainstation

talk

by Stephen Toriola (Compare the Market)

AI/ML RAG

This talk explores AI agents as the next step beyond prompt‑by‑prompt assistants. Modern AI agents use large language models plus planning, tool‑calling, and memory to execute multi‑step workflows, not just answer isolated questions. The session explains, in accessible terms, what makes something an “agent” rather than a simple chatbot: the ability to decompose tasks, call APIs or tools, and maintain context over time. It then surveys real use cases, from automating repetitive knowledge‑work tasks to orchestrating complex enterprise workflows that blend human decisions with autonomous actions. For the technical audience, the talk briefly outlines typical agent architectures and how they integrate with RAG, vector search, and existing backend services. For everyone else, it focuses on capabilities, limitations, and where AI agents are realistically being used in 2025. Attendees will understand what AI agents can and cannot do today, and how they differ from the hype.

AWS re:Invent 2025 - Deep dive into Amazon DocumentDB and its innovations (DAT444)

2025-12-03 · AWS re:Invent 2024 Watch

video

Agile/Scrum AWS Cloud Computing MongoDB

Amazon DocumentDB (with MongoDB compatibility) is a serverless, fully managed, MongoDB API-compatible document database service. In this session, deep dive into the most exciting new features Amazon DocumentDB offers including Serverless, columnar read replicas, performance upgrades, and DocumentDB 8.0 [TBC]. Learn how the implementation of these features in your organization can improve resilience, performance, and the effectiveness of your applications.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Libero: The Insurance Broker’s Best Ally

2025-11-26 · PyData Montreal Meetup #32 (in-person | en personne)

talk

by Catherine Paulin (iuvo-ai) , Etienne Boisvenue (iuvo-ai)

AI/ML CRM

When a team of insurance brokers receives more than 500 emails per day from clients, it quickly becomes difficult to keep things organized and make sense of it all. That’s where Libero comes in: a solution that summarizes and classifies all incoming emails. Beyond that, Libero also categorizes them properly within the client database (CRM). All of this, which brokers and their assistants used to do manually, is now fully automated, freeing up a tremendous amount of time every day. Through this presentation, we want to bring you into the heart of Libero’s design and the key decisions made during its development: its architecture, the challenges, the specific requirements of the insurance industry, the solution’s evolution, and more. And most importantly, we want to open up a discussion with you, the community, around this question: - How do we build and deploy AI solutions that can actually be maintained and evolve over time? This question is more important now than ever, with the rapid evolution of AI solutions, products and services that hit the market each week. The rhythm of innovation (and sometime fluff…) is astonishing! How do we continue building solutions that stay relevant and keep delivering business value? As a service provider, iuvo-ai is constantly balancing innovation with pragmatism. Every client has a different level of technical maturity, infrastructure, and internal talent. In that reality, the real challenge isn’t just getting a solution to work; it’s making sure it can live on. How do we design architectures that are flexible enough to evolve as the ecosystem changes, but simple enough for our clients to own and maintain? How do we make decisions that reduce friction when the next API version drops or when the internal IT team needs to take over? Those are the questions we wrestle with every day when bringing AI into production, and we’d love to exchange ideas and lessons learned with you 🙂

Gemini API Sprints – Hosts

2025-11-25 · Gemini API Sprints Paris

panel

by Jihene Mejri (BforBank) , Bastien Pouessel , Pritesh BHEEMANEE (Google Developer Groups (GDG))

LLM

Hosted by GDG Paris: Pritesh BHEEMANEE, Jihene Mejri, Bastien Pouessel.

Governing AI Apps & Agents with AI Gateway in Azure API Management

2025-11-21 · Microsoft Ignite 2025

talk

by Annaji sharma Ganti (Microsoft) , Galin Iliev (Microsoft)

AI/ML Azure

In this hands-on lab, discover how to govern AI Apps & Agents using AI Gateway in Azure API Management. Learn to apply governance best practices by onboarding AI models, monitoring and controlling token usage, enforcing safety and compliance, and boosting performance with semantic caching. You’ll also govern MCP-based agent architectures by creating secure, efficient servers from APIs or connecting to backend MCP servers, equipping you to deliver responsible, resilient AI solutions at scale.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Building the most intelligent agents with the latest knowledge sources

2025-11-21 · Microsoft Ignite 2025 Watch

breakout

by Eli Coon (Microsoft) , Hardik Modi (Microsoft) , Anjli Chaudhry (Microsoft) , Nick Taylor (Gilead Sciences)

Dive into the advancements and capabilities for powering your agents with the most comprehensive suite of knowledge sources. Learn how Copilot Studio enables you to build intelligent agents using native knowledge source integrations, connectors, APIs, and MCP servers which enable agents to authenticate, coordinate, and execute workflows.

From Zero to MCP Hero: Secure, Governed Integrations with MCP Servers

2025-11-21 · Microsoft Ignite 2025

theater

by Ragnar Heil (HanseVision)

Cyber Security

From Zero to MCP Hero shows how to build secure, governed MCP integrations in Copilot Studio that unlock real-time data and safe actions at scale. Attendees will learn best‑practice patterns for security, authentication, DLP, and role‑based access, plus deployment tips to reduce maintenance and ship faster. Walk away with a blueprint to connect any API via MCP—safely, compliantly, and with enterprise reliability.

talk-data.com

API

Activity Trend

Top Events

Top Speakers

Building Data Products

No Cloud? No Problem. Local RAG with Embedding Gemma

Data engineering with Python the right way: introducing the composable, Python-native data stack

When Rivers Speak: Analyzing Massive Water Quality Datasets using USGS API and Remote SSH in Positron

fastplotlib: driving scientific discovery through data visualization

Efficient Time-Series Forecasting with Thousands of Local Models on Databricks

"Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop

Building LLM Agents Made Simple

AWS re:Invent 2025 - Customize Amazon Nova models for enhanced tool calling (AIM380)

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Integrate any agent framework with Amazon Bedrock AgentCore (AIM396)

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Create hyper-personalized voice interactions with Amazon Nova Sonic (AIM374)

AWSreInvent #AWSreInvent2025 #AWS

Modernizing Your React App: Compiler, useEffectEvent, Activity & Friends

AI Agents in 2025: Beyond Chatbots to Autonomous Workflows

AWS re:Invent 2025 - Deep dive into Amazon DocumentDB and its innovations (DAT444)

AWSreInvent #AWSreInvent2025 #AWS

Libero: The Insurance Broker’s Best Ally

Gemini API Sprints – Hosts

Governing AI Apps & Agents with AI Gateway in Azure API Management

Building the most intelligent agents with the latest knowledge sources

From Zero to MCP Hero: Secure, Governed Integrations with MCP Servers