Building Data Products

2026-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jean-Georges Perrin (Actian)

AI/ML API Data Contracts DevOps Cyber Security data data-engineering

As organizations grapple with fragmented data, siloed teams, and inconsistent pipelines, data products have emerged as a practical solution for delivering trusted, scalable, and reusable data assets. In Building Data Products, Jean-Georges Perrin provides a comprehensive, standards-driven playbook for designing, implementing, and scaling data products that fuel innovation and cross-functional collaboration—whether or not your organization adopts a full data mesh strategy. Drawing on extensive industry experience and practitioner interviews, Perrin shows readers how to build metadata-rich, governed data products aligned to business domains. Covering foundational concepts, real-world use cases, and emerging standards like Bitol ODPS and ODCS, this guide offers step-by-step implementation advice and practical code examples for key stages—ownership, observability, active metadata, compliance, and integration. Design data products for modular reuse, discoverability, and trust Implement standards-driven architectures with rich metadata and security Incorporate AI-driven automation, SBOMs, and data contracts Scale product-driven data strategies across teams and platforms Integrate data products into APIs, CI/CD pipelines, and DevOps practices

Data Engineering with Azure Databricks

2026-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

AI/ML Airflow Analytics Azure ADF Azure DevOps Cloud Computing Data Engineering Data Governance Data Lakehouse Databricks Delta +11 more

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools. Key Features Build scalable data pipelines using Apache Spark and Delta Lake Automate workflows and manage data governance with Unity Catalog Learn real-time processing and structured streaming with practical use cases Implement CI/CD, DevOps, and security for production-ready data solutions Explore Databricks-native ML, AutoML, and Generative AI integration Book Description "Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need. What you will learn Set up a full-featured Azure Databricks environment Implement batch and streaming ingestion using Auto Loader Optimize Spark jobs with partitioning and caching Build real-time pipelines with structured streaming and DLT Manage data governance using Unity Catalog Orchestrate production workflows with jobs and ADF Apply CI/CD best practices with Azure DevOps and Git Secure data with RBAC, encryption, and compliance standards Use MLflow and Feature Store for ML pipelines Build generative AI applications in Databricks Who this book is for This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.

Hands-On Software Engineering with Python - Second Edition

2025-12-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brian Allbee

Agile/Scrum Cloud Computing Data Modelling Docker GitHub Pydantic Python programming-languages software-development

Grow your software engineering discipline, incorporating and mastering design, development, testing, and deployment best practices examples in a realistic Python project structure. Key Features Understand what makes Software Engineering a discipline, distinct from basic programming Gain practical insight into updating, refactoring, and scaling an existing Python system Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions Book Description Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind. You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves—and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence. What you will learn Distinguish software engineering from general programming Break down and apply each phase of the SDLC to Python systems Create system models to plan architecture before writing code Apply Agile, Scrum, and other modern development methodologies Use dataclasses, pydantic, and schemas for robust data modeling Set up CI/CD pipelines with GitHub Actions and cloud build tools Write and structure unit, integration, and end-to-end tests Evaluate and integrate tools like Poetry, pytest, and Docker Who this book is for This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.

AWS re:Invent 2025 - Improve agent quality in production with Bedrock AgentCore Evaluations(AIM3348)

2025-12-05 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML AWS Cloud Computing

Amazon Bedrock AgentCore Evaluations provides developers with a unified way to test and validate AI agent performance. In this session, you’ll learn how to apply pre-built metrics for key dimensions such as task success, response quality, and tool accuracy, or define custom success criteria tailored to your needs. See how Evaluations integrates into CI/CD pipelines to catch regressions early and supports online evaluation in production by sampling and scoring live traces to surface real-world issues. Finally, learn how Evaluations helps teams deploy reliable agents faster, reduce operational risk, and continuously assess an agent’s performance at scale through practical implementation patterns.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

The agentic enterprise: Transforming SDLC and app testing with UiPath

2025-11-20 · Microsoft Ignite 2025

theater

by Michael Leroux (UiPath) , Shane Elliott (uipath.com)

AI/ML Azure Azure DevOps DevOps GitHub Microsoft

As development velocity increases, testing and operations teams must innovate or fall behind. AI-powered agents are reshaping how software is designed, tested, and deployed. Discover how UiPath and Microsoft are enabling organizations to integrate autonomous AI agents into Azure DevOps and GitHub to deliver faster, smarter, and more resilient applications. This session explores how agentic automation drives adaptive SDLC, continuous delivery, and measurable efficiency in application testing.

How to secure your CI/CD process against attackers with GitHub

2025-11-19 · Microsoft Ignite 2025

theater

by Josh Johanning (GitHub)

GitHub Cyber Security

Modern CI/CD pipelines and software supply chains are critical to delivering quickly, but they are also prime targets for attackers. Many organizations are unaware of the many ways that their approaches may be exposing them to risk. We’ll walk through three common ways attackers can compromise your CI/CD processes and how to spot and fix these risks using GitHub Advanced Security. Get practical guidance to strengthen your workflows, secure your supply chain, and stay ahead of evolving threats.

Building and deploying data agents in Microsoft Fabric

2025-11-19 · Microsoft Ignite 2025

theater

by Amir Jafari (Microsoft) , Shreyas Canchi Radhakrishna (Microsoft)

AI/ML Git Microsoft Fabric

This comprehensive session covers the complete lifecycle of intelligent data agents in Microsoft Fabric, from initial configuration to enterprise-wide deployment. Learn proven best practices for building context-aware agents with curated data sources and effective instructions. Discover how to implement robust CI/CD pipelines powered by Git for managing configurations and deployments. The session also demonstrates consumption patterns of data agents across the AI ecosystem.

Hybrid workload compliance from policy to practice on Azure

2025-11-18 · Microsoft Ignite 2025

theater

by Pal Lakatos-Toth (Microsoft)

Azure Cloud Computing GitHub Linux Cyber Security

Security standards often live in policy documents but aren’t consistently enforced. Learn how Azure Policy and Machine Configuration let you deploy built-in CIS benchmark templates for Linux and Windows, customize them to your needs, and apply them across Azure and hybrid / multi-cloud servers (via Azure Arc). We’ll also show how to integrate these policies into CI/CD pipelines (e.g., GitHub Actions) for continuous compliance checks, turning written guidelines into real-world security posture.

CI/CD for Fabric: Accelerating Lakehouse to production in 25 minutes

2025-11-18 · Microsoft Ignite 2025

theater

by Lee Benjamin (Microsoft) , Daniel Coelho (Microsoft)

Data Lakehouse Git Microsoft Fabric

Bring software-engineering discipline to your data. Learn how Microsoft Fabric integrates Git and deployment pipelines to take a branch from validation to production—fast and safe. Learn how to use parameterized deployments for tables and views, run automated checks, and avoid common pitfalls like broken shortcuts. Walk away with practical patterns to operationalize your Lakehouse with confidence and speed.

AI-Driven Software Testing : Transforming Software Testing with Artificial Intelligence and Machine Learning

2025-10-21 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Srinivasa Rao Bittla

AI/ML Analytics Cloud Computing ai-ml artificial-intelligence-ai artificial intelligence (ai) data

AI-Driven Software Testing explores how Artificial Intelligence (AI) and Machine Learning (ML) are revolutionizing quality engineering (QE), making testing more intelligent, efficient, and adaptive. The book begins by examining the critical role of QE in modern software development and the paradigm shift introduced by AI/ML. It traces the evolution of software testing, from manual approaches to AI-powered automation, highlighting key innovations that enhance accuracy, speed, and scalability. Readers will gain a deep understanding of quality engineering in the age of AI, comparing traditional and AI-driven testing methodologies to uncover their advantages and challenges. Moving into practical applications, the book delves into AI-enhanced test planning, execution, and defect management. It explores AI-driven test case development, intelligent test environments, and real-time monitoring techniques that streamline the testing lifecycle. Additionally, it covers AI’s impact on continuous integration and delivery (CI/CD), predictive analytics for failure prevention, and strategies for scaling AI-driven testing across cloud platforms. Finally, it looks ahead to the future of AI in software testing, discussing emerging trends, ethical considerations, and the evolving role of QE professionals in an AI-first world. With real-world case studies and actionable insights, AI-Driven Software Testing is an essential guide for QE engineers, developers, and tech leaders looking to harness AI for smarter, faster, and more reliable software testing. What you will learn: • What are the key principles of AI/ML-driven quality engineering • What is intelligent test case generation and adaptive test automation • Explore predictive analytics for defect prevention and risk assessment • Understand integration of AI/ML tools in CI/CD pipelines Who this book is for: Quality Engineers looking to enhance software testing with AI-driven techniques. Data Scientists exploring AI applications in software quality assurance and engineering. Software Developers – Engineers seeking to integrate AI/ML into testing and automation workflows.

Towards a more perfect pipeline: CI/CD in the dbt Platform

2025-10-15 · dbt Coalesce 2025 Watch

talk

by Aaiden Witten (United Services Automobile Association) , Michael Sturm (United Services Automobile Association) , Timothy Shiveley (United Services Automobile Association)

Data Quality dbt

In this session we’ll show how we integrated CI/CD dbt jobs to validate data and run tests on every merge request. Attendees will walk away with a blueprint for implementing CI/CD for dbt, lessons learned from our journey and best practices to keep data quality high without slowing down development.

Training: Building a Data Quality Framework with dbt

2025-10-15 · dbt Coalesce 2025

training

by Christine Berger (Fishtown Analytics) , Stephen Thibeault (dbt Labs)

Data Quality dbt

Delve into the core concepts and applications of data quality with dbt. With a focus on practical implementation, you'll learn to deploy custom data tests, unit testing, and linting to ensure the reliability and accuracy of your data operations. After this course, you will be able to: Recognize scenarios that call for testing data quality Implement efficient data testing methods to ensure reliability (data tests, unit tests) Navigate other quality checks in dbt (linting, CI, compare) Prerequisites for this course include: dbt Fundamentals What to bring: You will need to bring your own laptop to complete the hands-on exercises. We will provide all the other sandbox environments for dbt and data platform. Duration: 2 hours Fee: $200 Trainings and certifications are not offered separately and must be purchased with a Coalesce pass Trainings and certifications are not available for Coalesce Online passes

Data Generation for Automated Tests

2025-10-15 · Testing Oracle Generation Using AI

talk

AI/ML

Learn programmatic data generation using factory patterns, database seeding, and AI techniques that integrate with CI/CD pipelines while maintaining data integrity.

Data Generation for Automated Tests

2025-10-15 · Testing Oracle Generation Using AI

talk

AI/ML

Learn programmatic data generation using factory patterns, database seeding, and AI techniques that integrate with CI/CD pipelines while maintaining data integrity.

Zero-footprint SQL testing: From framework to culture shift

2025-10-14 · dbt Coalesce 2025 Watch

talk

by Kushal Thakkar (Anthropic) , Gurmeet Saran (Anthropic)

SQL

We built a zero-footprint SQL testing framework using mock data and the full power of the pytest ecosystem to catch syntactic and semantic issues before they reach production. More than just a tool, it helped shift our team’s mindset by integrating into CI/CD, encouraging contract-driven development, and promoting testable SQL. In this session, we’ll share our journey, key lessons learned, and how we open-sourced the framework to make it available for everyone.

Training: Building a Data Quality Framework with dbt

2025-10-14 · dbt Coalesce 2025

training

by Christine Berger (Fishtown Analytics) , Stephen Thibeault (dbt Labs)

Data Quality dbt

Delve into the core concepts and applications of data quality with dbt. With a focus on practical implementation, you'll learn to deploy custom data tests, unit testing, and linting to ensure the reliability and accuracy of your data operations. After this course, you will be able to: Recognize scenarios that call for testing data quality Implement efficient data testing methods to ensure reliability (data tests, unit tests) Navigate other quality checks in dbt (linting, CI, compare) Prerequisites for this course include: dbt Fundamentals What to bring: You will need to bring your own laptop to complete the hands-on exercises. We will provide all the other sandbox environments for dbt and data platform. Duration: 2 hours Fee: $200 Trainings and certifications are not offered separately and must be purchased with a Coalesce pass Trainings and certifications are not available for Coalesce Online passes

From silos to AI-ready: How Okta built governed analytics at scale

2025-10-14 · dbt Coalesce 2025 Watch

talk

by Pooja Crahen (Okta)

AI/ML Analytics dbt Snowflake

Okta is rebuilding its data platform, moving from disconnected pipelines to a scalable, well-governed foundation with Snowflake and dbt. In this session, you’ll learn how we improved CI/CD, enabled self-service, and are now testing a prototype to make dbt models ready for AI.

Streamlit at Scale: From Code Duplication to a Unified Framework

2025-10-14 · Snowflake World Tour - Stockholm

session

What will be covered:

•The Challenge: The significant inefficiencies and risks associated with our old, fragmented approach to data application development. •The Solution: The strategic move to a unified framework, which provides a single blueprint for all applications and eliminates the chaos of code duplication. •The Enabler: How a robust CI/CD workflow automates the entire deployment process, ensuring consistency and reliability. •The Outcome: Outline the key advantages of this new approach, emphasizing accelerated development, enhanced code quality, and the remarkable efficiency in scaling our applications.

Belron's mesh-ready control plane powered by the dbt Platform

2025-10-09 · Snowflake World Tour London

session

Analytics dbt Snowflake

Operating across multiple countries with many business units, Belron built a repeatable framework on Snowflake using the dbt Platform to prevent scaling chaos. This solution codifies the analytics development lifecycle (ADLC) through modular code, environments, CI/CD, and automated testing. This session will share reusable patterns for new rollouts and provide a practical decision framework for choosing between dbt Projects on Snowflake and the dbt Platform for enterprise control, including governance, cross-project mesh, and observability. You'll also learn about the adoption metrics Belron uses to measure success. You'll learn: reusable ADLC guardrails, a dbt product decision tree, and how standardization enables a sustainable mesh.

Making your Data AI ready with DataOps

2025-10-09 · Snowflake World Tour London

session

AI/ML Data Engineering DataOps Snowflake

AI is only as good as the data it runs on. Yet Gartner predicts in 2026, over 60% of AI projects will fail to deliver value - because the underlying data isn’t truly AI-ready. MIT is even more concerned! “Good enough” data simply isn’t enough.

At this World Tour launch event, DataOps.live reveal Momentum, the next generation of its DataOps automation platform designed to operationalize trusted AI at enterprise scale on Snowflake. Based on experiences from building over 9,000 Data Products to date, Momentum introduces breakthrough capabilities including AI-Ready Data Scoring to ensure data is fit for AI use cases, Data Product Lineage for end-to-end visibility, and a Data Engineering Agent that accelerates building reusable data products. Combined with automated CI/CD, continuous observability, and governance enforcement, Momentum closes the AI-readiness gap by embedding collaboration, metadata, and automation across the entire data lifecycle. Backed by Snowflake Ventures and trusted by leading enterprises, including AstraZeneca, Disney and AT&T, DataOps.live is the proven catalyst for scaling AI-ready data. In this session, you’ll unpack what AI-ready data really means, learn essential practices, discover a faster, easier, and more impactful way to make your AI initiatives succeed. Be the first to see Momentum in action - the future of AI-ready data.

talk-data.com

CI/CD

Activity Trend

Top Events

Top Speakers

Building Data Products

Data Engineering with Azure Databricks

Hands-On Software Engineering with Python - Second Edition

AWS re:Invent 2025 - Improve agent quality in production with Bedrock AgentCore Evaluations(AIM3348)

AWSreInvent #AWSreInvent2025 #AWS

The agentic enterprise: Transforming SDLC and app testing with UiPath

How to secure your CI/CD process against attackers with GitHub

Building and deploying data agents in Microsoft Fabric

Hybrid workload compliance from policy to practice on Azure

CI/CD for Fabric: Accelerating Lakehouse to production in 25 minutes

AI-Driven Software Testing : Transforming Software Testing with Artificial Intelligence and Machine Learning

Towards a more perfect pipeline: CI/CD in the dbt Platform

Training: Building a Data Quality Framework with dbt

Data Generation for Automated Tests

Data Generation for Automated Tests

Zero-footprint SQL testing: From framework to culture shift

Training: Building a Data Quality Framework with dbt

From silos to AI-ready: How Okta built governed analytics at scale

Streamlit at Scale: From Code Duplication to a Unified Framework

Belron's mesh-ready control plane powered by the dbt Platform

Making your Data AI ready with DataOps