Data Contracts in Practice

2026-02-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ryan Collingwood

Data Contracts Data Governance Data Quality JSON Python SQL data data-engineering

In 'Data Contracts in Practice', Ryan Collingwood provides a detailed guide to managing and formalizing data responsibilities within organizations. Through practical examples and real-world use cases, you'll learn how to systematically address data quality, governance, and integration challenges using data contracts. What this Book will help me do Learn to identify and formalize expectations in data interactions, improving clarity among teams. Master implementation techniques to ensure data consistency and quality across critical business processes. Understand how to effectively document and deploy data contracts to bolster data governance. Explore solutions for proactively addressing and managing data changes and requirements. Gain real-world skills through practical examples using technologies like Python, SQL, JSON, and YAML. Author(s) Ryan Collingwood is a seasoned expert with over 20 years of experience in product management, data analysis, and software development. His holistic techno-social approach, designed to address both technical and organizational challenges, brings a unique perspective to improving data processes. Ryan's writing is informed by his extensive hands-on experience and commitment to enabling robust data ecosystems. Who is it for? This book is ideal for data engineers, software developers, and business analysts working to enhance organizational data integration. Professionals with a familiarity of system design, JSON, and YAML will find it particularly beneficial. Enterprise architects and leadership roles looking to understand data contract implementation and their business impacts will also greatly benefit. Basic understanding of Python and SQL is recommended to maximize learning.

The Death of Manual Docs: A DevOps Approach to Power Platform Documentation

2025-11-12 · M365UK Extend Copilot Studio, Recall in Windows11 & Power Platform docs Nov 2025

talk

by Ian Tweedie (Capgemini)

Azure DevOps Git ci/cd data dictionaries er-diagrams markdown option-set tables power platform documentation extension security-role matrices word documents workflow summaries

Ditch the hand-cranked Word specs and kill your documentation debt for good. In this 45-minute demo you’ll see the Power Platform Documentation Extension turn every pipeline run into living, version-controlled docs—complete with ER-diagrams, data dictionaries, security-role matrices, option-set tables and workflow summaries. We’ll wire the extension into Azure DevOps, commit Markdown/Branded Word Documents artefacts back to Git. By session-end you’ll have a reusable YAML snippet that can be added to any Power Platform CI/CD flow.

Build secure scalable agents

2025-10-16 · AI Camp NYC: GenAI, LLMs and Agent

talk

Docker cagent mcp gateway

Currently developers face critical production challenges when building intelligent agent-based applications: managing agent lifecycles, handling authentication, and maintaining consistency across AI clients. During this demonstration, we will show how to use Docker's latest tools to build AI agents in containers, allowing you to build, share, and scale intelligent systems with the same reliability as containerized applications. This talk will feature Docker's cagent project which brings declarative YAML to multi-agent systems, letting developers create specialized AI agents that collaborate like microservices, and Docker's mcp-gateway to provide agents with scalable access to external APIs and data sources through a unified MCP gateway.

Future of Data Engineering in an Agentic World

2025-09-25 · Big Data LDN 2025

Face To Face

by Cyril Sonnefraud (Matillion) , Joe Herbert (Matillion)

API Data Engineering ETL/ELT Snowflake

This session will provide a Maia demo with roadmap teasers. The demo will showcase Maia's core capabilities: authoring pipelines in business language, multiplying productivity by accelerating tasks, and enabling self-service. It demonstrates how Maia takes natural language prompts and translates them into YAML-based, human-readable Data Pipeline Language (DPL), generating graphical pipelines. Expect to see Maia interacting with Snowflake metadata to sample data and suggest transformations, as well as its ability to troubleshoot and debug pipelines in real-time. The session will also cover how Maia can create custom connectors from REST API documentation in seconds, a task that traditionally takes days . Roadmap teasers will likely include the upcoming Semantic Layer, a Pipeline Reviewing Agent, and enhanced file type support for various legacy ETL tools and code conversions.

Structured Automation with Agentic AI: Lessons from Community Operations

2025-09-24 · PyData Rhein-Main I Security Risks in AI & Structured Automation with Agentic AI

talk

by Alexander C. S. Hendorf (opotoc GmbH)

AI/ML GitHub LLM Pydantic

This talk presents a technical case study of applying agentic AI systems to automate community operations at PyCon DE & PyData, treated as an open-source testbed. The key lesson is simple: AI only works when put on a leash. Reliable results required good architecture, a clear plan, and structured data models — from YAML and Pydantic schemas to reproducible pipelines with GitHub Actions. With that foundation, LLM agents supported logistics, FAQs, video processing, and scheduling; without it, they failed. By contrasting successes and failure modes across different coding agents, the talk demonstrates that robust design, validation, and controlled context are prerequisites for making agentic AI usable in real-world workflows.

Beyond theory: Practical lessons from 4 years of data platform evolution

2025-09-11 · Data Expo NL 2025

talk

by Sander Kerstens (Vanderlande)

Data Quality

After four years of building Vanderlande's data platform, we've learned that theoretical purity often collides with practical reality. This presentation shares our journey from rigid architectural principles to pragmatic solutions that truly scale. Discover how we're simplifying layer structures, standardizing with YAML, rethinking data quality implementation, and finding the right balance between data mesh theory and practical data products that deliver value.

How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData

2025-09-02 · PyData Berlin 2025 Watch

talk

by Alexander C. S. Hendorf (opotoc GmbH)

AI/ML LLM

Using AI agents and automation, PyCon DE & PyData volunteers have transformed chaos into streamlined conference ops. From YAML files to LLM-powered assistants, they automate speaker logistics, FAQs, video processing, and more while keeping humans focused on creativity. This case study reveals practical lessons on making AI work in real-world scenarios: structured workflows, validation, and clear context beat hype. Live demos and open-source tools included.

How AI is Finally Delivering Digital Transformation's Promise w/ Younes Hairej

2025-08-01 · Data Unchained

podcast_episode

by Younes Hairej (Aokumo Inc)

AI/ML DevOps GenAI Kubernetes MLOps

What does AI transformation really look like inside a 180-year-old company? In this episode of Data Unchained, we are joined by Younes Hairej, founder and CEO of Aokumo Inc, a trailblazing company helping enterprises in Japan and beyond bridge the gap between business intent and AI execution. From deploying autonomous AI agents that eliminate the need for dashboards and YAML, to revitalizing siloed, analog systems in manufacturing, Younes shares what it takes to modernize legacy infrastructure without starting over. Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US

ArtificialIntelligence #EnterpriseAI #AITransformation #Kubernetes #DevOps #GenAI #DigitalTransformation #OpenSourceAI #DataInfrastructure #BusinessInnovation #AIInJapan #LegacyModernization #MetadataStrategy #AIOrchestration #CloudNative #AIAutomation #DataGovernance #MLOps #IntelligentAgents #TechLeadership

Hosted on Acast. See acast.com/privacy for more information.

#86 What’s Next for Kubernetes? KubeCon 2025 Recap with Nick Schouten

2025-07-17 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

by Nick Schouten (dataroots)

AI/ML Git Kubernetes LLM MLOps

Send us a text Welcome to the cozy corner of the tech world! Datatopics is your go-to spot for relaxed discussions around tech, news, data, and society. In this episode of Data Topics, we sit down with Nick Schouten — data engineer at dataroots — for a full recap of KubeCon Europe 2025 and a deep dive into the current and future state of Kubernetes. We talk through what’s actually happening in the Kubernetes ecosystem — from platform engineering trends to AI infra challenges — and why some teams are doubling down while others are stepping away. Here’s what we cover: What Kubernetes actually is, and how to explain it beyond the buzzwordWhen Kubernetes is the right choice (e.g., hybrid environments, GPU-heavy workloads) — and when it’s overkillHow teams are trying to host LLMs and AI models on Kubernetes, and the blockers they’re hitting (GPUs, complexity, cost)GitOps innovations spotted at KubeCon — like tools that convert UI clicks into Git commits for infrastructure-as-codeWhy observability is still one of Kubernetes’ biggest weaknesses, and how a wave of new startups are trying to solve itThe push to improve developer experience for ML and data teams (no more YAML overload)The debate around abstraction vs control — and how some teams are turning away from Kubernetes entirely in favor of simpler toolsWhat “vibe coding” means in an LLM-driven world, and how voice-to-code workflows are changing how we write infrastructureWhether the future of Kubernetes is more “visible and accessible,” or further under the hoodIf you're a data engineer, MLOps practitioner, platform lead, or simply trying to stay ahead of the curve in infrastructure and AI — this episode is packed with relevant insights from someone who's hands-on with both the tools and the teaching.

Building multi-tenant Airflow at scale: From Chaos to Control

2025-07-03 · Amsterdam Apache Airflow® Meetup at Booking.com!

talk

by Omid Karami (Booking.com)

Airflow

Running Airflow at scale for thousands of workflows across multiple teams introduces challenges around standardization, governance, and isolation. At Booking, we've built a multi-tenant Airflow platform that serves over 4,000 workflows using a custom DSL defined in workflow.yaml files. In this talk, I'll show how we use automated DAG generation to bring structure to complexity, how we achieved horizontal scalability by decoupling orchestration from execution, and how reusable step templates help us enforce governance--without sacrificing workflow isolation. You'll leave with a blueprint for taming Airflow at scale.

Dynamic DAGs and Data Quality using DAGFactory

2025-07-01 · Airflow Summit 2025

session

by Gangfeng Huang , Ashir Alam

Airflow BI Cloud Computing Data Quality GCP GitHub Cloud Composer Python

We have a similar pattern of DAGs running for different data quality dimensions like accuracy, timeliness, & completeness. To do this again and again, we would be duplicating and potentially introducing human error while doing copy paste of code or making people write same code again. To solve for this, we are doing few things: Run DAGs via DagFactory to dynamically generate DAGs using just some YAML code for all the steps we want to run in our DQ checks. Hide this behind a UI which is hooked to github PR open step, now the user just provides some inputs or selects from dropdown in UI and a YAML DAG is generated for them. This highlights the potential for DAGFactory to hide Airflow Python code from users and make it more accessible to Data Analysts and Business Intelligence along with normal Software Engg, along with reducing human error. YAML is the perfect format to be able to generate code, create a PR and DagFactory is the perfect fir for that. All of this is running in GCP Cloud Composer.

From Complexity to Simplicity with TaskHarbor: Trendyol's Path to a Unified Orchestration Platform

2025-07-01 · Airflow Summit 2025

session

by Salih Goktug Kose , Burak Özdemir

Airflow Cloud Computing GCP Kubernetes Terraform

At Trendyol, Turkey’s leading e-commerce company, Apache Airflow powers our task orchestration, handling DAGs with 500+ tasks, complex interdependencies, and diverse environments. Managing on-prem Airflow instances posed challenges in scalability, maintenance, and deployment. To address these, we built TaskHarbor, a fully managed orchestration platform with a hybrid architecture—combining Airflow on GKE with on-prem resources for optimal performance and efficiency. This talk covers how we: Enabled seamless DAG synchronization across environments using GCS Fuse. Optimized workload distribution via GCP’s HTTPS & TCP Load Balancers. Automated infrastructure provisioning (GKE, CloudSQL, Kubernetes) using Terraform. Simplified Airflow deployments by replacing Helm YAML files with a custom templating tool, reducing configurations to 10-15 lines. Built a fully automated deployment pipeline, ensuring zero developer intervention. We enhanced efficiency, reliability, and automation in hybrid orchestration by embracing a scalable, maintainable, and cloud-native strategy. Attendees will obtain practical insights into architecting Airflow at scale and optimizing deployments.

Semiconductor (Chip) Design Workflow Orchestration with Airflow

2025-07-01 · Airflow Summit 2025

session

by Nicholas Redd , Dheeraj Turaga

Airflow CI/CD

The design of Qualcomm’s Snapdragon System-On-Chip (SoCs) involves several hundred complex workflows orchestrated across multiple data centers, taking the design from RTL to GDS. In the Snapdragon Oryon Custom CPU team, we introduced Airflow about 2 years ago to orchestrate design, verification, emulation, CI/CD, and physical implementation of our CPUs. Use Case: • Standardization and Templatization: We standardize and templatize common workflows, allowing designers to verify their designs by customizing YAML parameters. • Custom Shell Operators: We created custom shell operators (tcshrc) to source project environments and work with internal tooling. • Smart Retries: We use pre/post-execute hooks to trigger smart retries on failure. • Dynamic Celery Workers: We auto-create Celery workers on the fly on our High-Performance Compute (HPC) clusters to launch and manage Electronic Design Automation (EDA) workloads. • Hybrid Executor Strategy: We use a hybrid executor strategy (CeleryExecutor and EdgeExecutor) to orchestrate tasks across multiple data centers. • EdgeExecutor for Remote Testing: We leverage EdgeExecutor to access post-silicon hardware in remote locations.

Pulumi Components

2025-05-14 · Designing Reusable Infrastructure as Code

talk

by Rob Smith (East of England Co-Op) , Josh Kodroff (Pulumi)

Pulumi infrastructure as code

In this session, we’ll introduce the concept of Pulumi Components - packages that can be authored in one language and consumed in any other language. This enables platform engineering teams to create powerful patterns for reuse across their organization, such as sharing infrastructure libraries written in common programming languages that can easily be instantiated from a simple YAML file.

Kubernetes + Pulumi: Pulumi Kubernetes provider, Docker provider, and GitOps with Pulumi Operator

2024-11-18 · Workshop: Pulumi and Kubernetes - Better Together

workshop

by Josh Kodroff (Pulumi)

Kubernetes Pulumi gitops helm pulumi docker provider pulumi kubernetes provider

Hands-on workshop on using Pulumi to deploy and manage Kubernetes applications, including the Pulumi Kubernetes provider, Pulumi Docker provider, integration with YAML manifests and Helm charts, and running Pulumi IaC programs in a GitOps fashion.

Big Data on Kubernetes

2024-07-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Neylson Crepalde

Airflow BI Big Data Docker Kafka Kubernetes Python Spark SQL data data-engineering streaming-messaging

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges. What this Book will help me do Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios. Author(s) Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture. Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.

How we use Airflow at Booking to Orchestrate Big Data Workflows

2024-07-01 · Airflow Summit 2024

session

by Mayank Chopra , Madhav Khakhar (Booking.com) , Alexander Shmidt

Airflow Astronomer Big Data Kubernetes

The talk will cover how we use Airflow at the heart of our Workflow Management Platform(WFM) at Booking.com, enabling our internal users to orchestrate big data workflows on Booking Data Exchange(BDX). High level overview of the talk: Adapting open source Airflow helm chart to spin up Airflow installation in Booking Kubernetes Service (BKS) Coming up with Workflow definition format (yaml) Conversion of workflow.yaml to workflow.py DAGs Usage of Deferrable operators to provide standard step templates to users Workspaces (collection of workflows), using it to ensure role based access to DAG permissions for users Using okta for authentication Alerting, monitoring, logging Plans to shift to Astronomer

Streamlining DAG Creation with YAML in Large Organizations

2024-07-01 · Airflow Summit 2024

session

by Howie Wang

Airflow Python

As organizations grow, the task of creating and managing Airflow DAGs efficiently becomes a challenge. In this talk, we will delve into innovative approaches to streamlining Airflow DAG creation using YAML. By leveraging YAML configuration, we allow users to dynamically generate Airflow DAGs without requiring Python expertise or deep knowledge of Airflow primitives. We will showcase the significant benefits of this approach, including eliminating duplicate configurations, simplifying DAG management for a large group of workflows, and ultimately enhancing productivity within large organizations. Join us to learn practical strategies to optimize workflow orchestration, reduce development overhead, and facilitate seamless collaboration across teams.

YAML all the way down: State machines and CI/CD for Google Apigee

2024-04-09 · Google Cloud Next '24

session

by Hugh Greenish (Marsh McLennan)

API CI/CD Cloud Computing GCP Git Terraform

Marsh McLennan runs a complex Apigee Hybrid configuration, with 36 organizations operating in six global data centers. Keeping all of this in sync across production and nonproduction environments is a challenge. While the infrastructure itself is deployed with Terraform, Marsh McLennan wanted to apply the same declarative approach to the entire environment. See how it used Apigee's management APIs to build a state machine to keep the whole system running smoothly, allowing APIs to flow seamlessly from source control through to production.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

ChatGPT and YAML: a Router configuration use case

2023-11-08 · AI and Deep Learning for Enterprise #11

talk

LLM

talk-data.com

YAML

Activity Trend

Top Events

Top Speakers

Data Contracts in Practice

The Death of Manual Docs: A DevOps Approach to Power Platform Documentation

Build secure scalable agents

Future of Data Engineering in an Agentic World

Structured Automation with Agentic AI: Lessons from Community Operations

Beyond theory: Practical lessons from 4 years of data platform evolution

How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData

How AI is Finally Delivering Digital Transformation's Promise w/ Younes Hairej

#86 What’s Next for Kubernetes? KubeCon 2025 Recap with Nick Schouten

Building multi-tenant Airflow at scale: From Chaos to Control

Dynamic DAGs and Data Quality using DAGFactory

From Complexity to Simplicity with TaskHarbor: Trendyol's Path to a Unified Orchestration Platform

Semiconductor (Chip) Design Workflow Orchestration with Airflow

Pulumi Components

Kubernetes + Pulumi: Pulumi Kubernetes provider, Docker provider, and GitOps with Pulumi Operator

Big Data on Kubernetes

How we use Airflow at Booking to Orchestrate Big Data Workflows

Streamlining DAG Creation with YAML in Large Organizations

YAML all the way down: State machines and CI/CD for Google Apigee

ChatGPT and YAML: a Router configuration use case