talk-data.com

Topic

CI/CD

Continuous Integration/Continuous Delivery (CI/CD)

devops automation software_development ci_cd

262 tagged

Activity Trend: 21 peak/qtr, 2020-Q1 to 2026-Q1

Activities

262 activities · Newest first

When we talk about AI, we often think of specific use cases: how to use artificial intelligence as an extension of the information system to meet a particular need.

But isn’t the real revolution elsewhere? Placing AI at the heart of the information system profoundly transforms our relationship with it. It turns the information system from a simple functional tool into an environment capable of anticipating, recommending, and simplifying all business processes.

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn
• Understand dbt and its role in the modern data stack
• Set up projects using both the cloud-hosted dbt Platform and open source project
• Connect dbt projects to cloud data warehouses
• Build scalable models in SQL and Python
• Configure development, testing, and production environments
• Capture reusable logic with Jinja macros
• Incorporate version control with your data transformation code
• Seamlessly connect your projects using dbt Mesh
• Build and manage a semantic layer using dbt
• Deploy dbt using CI/CD best practices

Who This Book Is For
Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

AI is only as good as the data it runs on. Yet Gartner predicts that in 2026 over 60% of AI projects will fail to deliver value because the underlying data isn’t truly AI-ready. “Good enough” data isn’t enough.

In this exclusive BDL launch session, DataOps.live reveals Momentum, the next generation of its DataOps automation platform designed to operationalize trusted AI at enterprise scale.

Based on experiences from building over 9000 Data Products to date, Momentum introduces breakthrough capabilities including AI-Ready Data Scoring to ensure data is fit for AI use cases, Data Product Lineage for end-to-end visibility, and a Data Engineering Agent that accelerates building reusable data products. Combined with automated CI/CD, continuous observability, and governance enforcement, Momentum closes the AI-readiness gap by embedding collaboration, metadata, and automation across the entire data lifecycle.

Backed by Snowflake Ventures and trusted by leading enterprises, including AstraZeneca, Disney and AT&T, DataOps.live is the proven catalyst for scaling AI-ready data. In this session, you’ll unpack what AI-ready data really means, learn essential practices, and discover a faster, easier, and more impactful way to make your AI initiatives succeed.

Be the first to see Momentum in action - the future of AI-ready data.

As NiFi scales, so do the headaches of deploying NiFi data flows. CI/CD helps, but incomplete automation still leaves teams tied to the NiFi UI, adjusting parameters, updating controller services, and managing variables and parameter contexts by hand. This slows releases, increases operational risk, and strains engineering time.

This talk explores Data Flow Manager (DFM), a centralized platform that brings end-to-end automation to NiFi data flow deployments. Configure, validate, and deploy NiFi data flows across dev, staging, and production environments without ever logging into the NiFi UI. Everything is handled in one place, with full integration into your existing CI/CD pipelines.

We’ll cover a few out-of-the-box features of Data Flow Manager: scheduled NiFi data flow deployments, centralized access control management, data flow sanity checks, audit logging, and monitoring of NiFi-specific metrics. Together these aim to make NiFi data flow deployments predictable, scalable, and error-free across environments. The goal is simple: reduce operational overhead, eliminate manual errors, and bring predictability to NiFi data pipelines at scale.

Extreme weather events threaten industries and economic stability. NOAA’s National Centers for Environmental Information (NCEI) addresses this through the Industry Proving Grounds (IPG), which modernizes data delivery by collaborating with sectors like re/insurance and retail to develop practical, data-driven solutions. This presentation explores IPG’s technical innovations, including implementing Polars for efficient data processing, AWS for scalability, and CI/CD pipelines for streamlined deployment. These tools enhance data accessibility, reduce latency, and support real-time decision-making. By integrating scientific computing, cloud technology, and DevOps, NCEI improves climate resilience and provides a model for leveraging open-source tools to address global challenges.

The SciPy Proceedings (https://proceedings.scipy.org) have long served as a cornerstone for publishing research in the scientific Python community, with over 330 peer-reviewed articles published over the last 17 years. In 2024, the SciPy Proceedings underwent a significant transformation, adopting MyST Markdown (https://mystmd.org) and Curvenote (https://curvenote.com) to enhance accessibility, interactivity, and reproducibility, including the publishing of Jupyter Notebooks. The new proceedings articles are web-first, providing features such as deep-dive links for cross-references and previews of GitHub content, interactive 3D visualizations, and rich rendering of Jupyter Notebooks. In this talk, we will (1) present the new authoring and reading capabilities introduced in 2024; (2) highlight connections to prominent open-science initiatives and their impact on advancing computational research publishing; and (3) demonstrate the underlying technologies, how they enhance integrations with SciPy packages, and how to use these tools in your own communication workflows.

Our presentation will give an overview of the revised authoring process for SciPy Proceedings, how we improve metadata standards in a similar way to code linting and continuous integration, and the integration of live previews of the articles, including auto-generated PDFs and JATS XML (a standard used in scientific publishing). The peer-review process for the proceedings currently happens using GitHub’s peer-review commenting, in a similar fashion to the Journal of Open Source Software; we will demonstrate this process and showcase opportunities for working with distributed review services such as PREreview (https://prereview.org). The open publishing pipeline has streamlined the submission, review, and revision processes while maintaining high scientific quality and improving the completeness of scholarly metadata. Finally, we will present how this work connects to other high-profile scientific publishing initiatives that have incorporated Jupyter Notebooks and live computational figures as well as interactive displays of large-scale data. These initiatives include Notebooks Now! by the American Geophysical Union, which focuses on ensuring that Jupyter Notebooks can be properly integrated into the scholarly record, and the Microscopy Society of America’s work on interactive publishing and publishing of large-scale microscopy data with interactive visualizations. These initiatives and the SciPy Proceedings are enabled by recent improvements in open-source tools including MyST Markdown, JupyterLab, BinderHub, and Curvenote, which enable new ways to share executable research content. Collectively, they aim to improve the reproducibility, interactivity, and accessibility of research by providing improved connections between data, software, and narrative research articles.
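The idea of treating metadata checks like code linting can be illustrated with a minimal sketch. It is not the SciPy Proceedings tooling: the required-field list, the rule that every author needs an ORCID, and the function name `lint_metadata` are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed list of required fields; the real proceedings rules may differ.
REQUIRED_FIELDS = ("title", "authors", "abstract", "keywords")


def lint_metadata(meta: Dict[str, object]) -> List[str]:
    """Return human-readable problems, the way a linter reports findings."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not meta.get(field):
            problems.append(f"missing or empty field: {field}")
    # Assumed rule: every author entry should carry an ORCID identifier.
    for i, author in enumerate(meta.get("authors") or []):
        if not author.get("orcid"):
            problems.append(f"author {i + 1} has no ORCID")
    return problems


# Placeholder article metadata (the ORCID value is a dummy, not a real identifier).
article = {
    "title": "An Example Article",
    "authors": [
        {"name": "A. Author", "orcid": "0000-0000-0000-0000"},
        {"name": "B. Author"},
    ],
    "abstract": "Short abstract.",
    "keywords": [],
}
for problem in lint_metadata(article):
    print(problem)
```

A check like this could run on every pull request, so that a submission with incomplete metadata fails early in the same way a style violation would.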

By embracing open science principles and modern technologies, the SciPy Proceedings exemplify how computational research can be more transparent, reproducible, and accessible. The shift to computational publishing, especially in the context of the scientific Python community, opens new opportunities for researchers to publish not only their final results but also the computational workflows, datasets, and interactive visualizations that underpin them. This transformation aligns with broader efforts in open science infrastructure, such as integrating persistent identifiers (DOIs, ORCID, ROR) and adopting FAIR (Findable, Accessible, Interoperable, Reusable) principles for computational content. Building on these foundations, as well as open tools like MyST Markdown and Curvenote, provides a scalable model for open scientific publishing that bridges the gap between computational research and scholarly communication, fostering a more collaborative, iterative, and continuous approach to scientific knowledge dissemination.

This BoF aims to host a discussion about best practices for maintaining executable tutorials that are reproducible and reliable. The BoF is also intended to be a platform for collecting tips and tricks on CI/CD practices. The moderators recently put together a repository, https://scientific-python.github.io/executable-tutorials/, that builds on their experience of maintaining numerous tutorial repositories. It covers some of the use cases, but we are well aware that there are still user scenarios and use cases that are not well covered.

The BoF complements both the Teaching & Learning and Maintainers tracks, since none of the talks in those tracks seem to focus on the technical challenges around tutorials.

Reproducibility is a major underpinning of the scientific method. In scientific computing, this also includes the ability to reproduce your dependencies. Yet, in 2025 this still remains a challenging topic.

Pixi is a modern package manager built on the Conda ecosystem. It integrates very well with all existing packages on conda-forge. Pixi makes package management reproducible, fast and painless – so that scientists can go back to coding instead of dealing with “dependency hell”. Pixi improves the mixing of Conda and PyPI package management by integrating with uv by astral.sh, and it streamlines automation with a cross-platform task runner. These features, combined with a powerful lockfile, make creating reproducible projects trivial.

This talk is for people who are interested in new, fast ways to set up their software (dev) environments on different systems – think your coworker's computer, CI, containers, and more.

DAGnostics seamlessly integrates Airflow Cluster Policy hooks to enforce governance from local DAG authoring through CI pipelines to production runtime. Learn how it closes validation gaps, collapses feedback loops from hours to seconds, and ensures consistent policies across stages. We examine current runtime-only enforcement and fractured CI checks, then unveil our architecture: a pluggable policy registry via Airflow entry points, local static analysis for pre-commit validation, GitHub Actions CI integration, and runtime hook enforcement. See real-world use cases: alerting standards, resource quotas, naming conventions, and exemption handling. Next, dive into implementation: authoring policies in Python, auto-discovery, cross-environment enforcement, upstream contribution, and testing strategies. We share LinkedIn’s metrics—2,000+ DAG repos, 10,000+ daily executions supporting trunk-based development across isolated teams/use-cases, and 78% fewer runtime violations—and lessons learned scaling policy-as-code at enterprise scale. Leave with a blueprint to adopt DAGnostics and strengthen your Airflow governance while preserving full compatibility with existing systems.
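A pluggable policy registry of the kind described above can be sketched in plain Python. This is not the DAGnostics code: `DagSpec`, `POLICY_REGISTRY`, the `policy` decorator, and the two example rules are hypothetical stand-ins, and the sketch deliberately avoids importing Airflow so the same `check` function could be called from a pre-commit hook or a CI job. In Airflow itself, runtime enforcement would typically live in a `dag_policy` function that raises `AirflowClusterPolicyViolation`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DagSpec:
    """Hypothetical stand-in for the DAG attributes a policy inspects."""
    dag_id: str
    owner: str = ""


# A policy takes a DagSpec and returns a list of violation messages.
Policy = Callable[[DagSpec], List[str]]
POLICY_REGISTRY: Dict[str, Policy] = {}


def policy(name: str) -> Callable[[Policy], Policy]:
    """Register a policy function under a name so it can be discovered or exempted."""
    def decorator(func: Policy) -> Policy:
        POLICY_REGISTRY[name] = func
        return func
    return decorator


@policy("naming")
def naming_convention(dag: DagSpec) -> List[str]:
    # Assumed convention for illustration: dag_id looks like "<team>__<pipeline>".
    if "__" not in dag.dag_id:
        return [f"{dag.dag_id}: dag_id must look like <team>__<pipeline>"]
    return []


@policy("alerting")
def alerting_standard(dag: DagSpec) -> List[str]:
    # Assumed standard for illustration: every DAG declares an owner for alert routing.
    if not dag.owner:
        return [f"{dag.dag_id}: missing owner for alert routing"]
    return []


def check(dag: DagSpec, exempt: frozenset = frozenset()) -> List[str]:
    """Run every registered policy except the exempted ones."""
    violations: List[str] = []
    for name, func in POLICY_REGISTRY.items():
        if name in exempt:
            continue
        violations.extend(func(dag))
    return violations


print(check(DagSpec(dag_id="adhoc")))
```

Keeping policies as pure functions over a small data object is one way to get the same verdict locally, in CI, and at runtime; the exemption set shows one simple way exemption handling might be modeled.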

This session showcases Okta’s innovative approach to data pipeline orchestration with dbt and Airflow. We show how we’ve implemented dynamically generated Airflow DAGs based on dbt’s dependency graph. This allows us to enforce strict data quality standards by automatically executing downstream model tests before upstream model deployments, effectively preventing error cascades. The entire CI/CD pipeline, from dbt model changes to production DAG deployment, is fully automated. The result? Accelerated development cycles, reduced operational overhead, and bulletproof data reliability.
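As a rough illustration of deriving a workflow from dbt’s dependency graph, the sketch below reads a trimmed-down structure shaped like dbt’s `manifest.json` (each node lists its parents under `depends_on.nodes`) and computes an order in which every upstream model precedes its dependents. The model names and helper functions are hypothetical, and this is not Okta’s implementation; a real generator would go on to emit one Airflow task per model and wire the same edges as task dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical, trimmed-down manifest: real dbt manifests carry many more fields.
MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": []}},
        "model.shop.stg_customers": {"depends_on": {"nodes": []}},
        "model.shop.orders": {
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            }
        },
        "model.shop.revenue": {"depends_on": {"nodes": ["model.shop.orders"]}},
    }
}


def model_dependencies(manifest: dict) -> Dict[str, List[str]]:
    """Map each model to the upstream models it depends on."""
    nodes = manifest["nodes"]
    return {
        name: [dep for dep in node["depends_on"]["nodes"] if dep in nodes]
        for name, node in nodes.items()
        if name.startswith("model.")
    }


def execution_order(manifest: dict) -> List[str]:
    """Return models in an order where every upstream model comes first."""
    return list(TopologicalSorter(model_dependencies(manifest)).static_order())


for model in execution_order(MANIFEST):
    print(model)
```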

At the enterprise level, managing Airflow deployments across multiple teams can become complex, leading to bottlenecks and slowed development cycles. We will share our journey of decentralizing Airflow repositories to empower data engineering teams with multi-tenancy, clean folder structures, and streamlined DevOps processes. We dive into how restructuring our Airflow architecture and utilizing repository templates allowed teams to generate new data pipelines effortlessly. This approach enables engineers to focus on business logic without worrying about underlying Airflow configurations. By automating deployments and reducing manual errors through CI/CD pipelines, we minimized operational overhead. However, this transformation wasn’t without challenges. We’ll discuss obstacles we faced, such as maintaining code consistency, variables, and utility functions across decentralized repositories; ensuring compliance in a multi-tenant environment; and managing the learning curve associated with new workflows. Join us to discover practical insights on how decentralizing Airflow repositories can boost team productivity and adapt to evolving business needs with minimal effort.

Vinted is the biggest second-hand marketplace in Europe with multiple business verticals. Our data ecosystem has over 20 decentralized teams responsible for generating, transforming, and building Data Products from petabytes of data. This creates a demanding environment where inter-team dependencies, varied expertise with scheduling tools, and diverse use cases need to be managed efficiently. To tackle these challenges, we have centralized our approach by leveraging Apache Airflow to orchestrate data dependencies across teams. In this session, we will present how we utilize a code generator to streamline the creation of Airflow code for numerous dbt repositories, dockerized jobs, and Vertex-AI pipelines. With this approach, we simplify the complexity and offer our users the flexibility required to accommodate their use cases. We will share our sensor-callback strategy, which we developed to manage task dependencies, overcoming the limitations of traditional dataset triggers. This approach requires a data asset registry to monitor global dependencies and SLOs, and the registry also serves as a safeguard during CI processes for detecting potential breaking changes.
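How a data asset registry might act as a CI safeguard can be sketched as follows. The registry layout (asset name mapped to consuming teams), the asset and team names, and the function `breaking_changes` are assumptions for illustration, not Vinted’s implementation. The check flags any asset that a proposed change would stop producing while other teams still consume it.

```python
from typing import Dict, List, Set

# Hypothetical registry: asset name -> teams that consume it.
REGISTRY: Dict[str, Set[str]] = {
    "orders_daily": {"finance", "search"},
    "items_snapshot": {"search"},
    "tmp_debug_table": set(),
}


def breaking_changes(
    registry: Dict[str, Set[str]], produced_after_change: Set[str]
) -> List[str]:
    """List assets that still have consumers but would no longer be produced."""
    problems = []
    for asset, consumers in sorted(registry.items()):
        if asset not in produced_after_change and consumers:
            problems.append(
                f"{asset} is still consumed by: {', '.join(sorted(consumers))}"
            )
    return problems


# A change that drops orders_daily would be reported; dropping the unused
# tmp_debug_table would not.
for problem in breaking_changes(REGISTRY, {"items_snapshot"}):
    print(problem)
```

A CI job could fail the build whenever this list is non-empty, which surfaces cross-team breakage before the change is merged.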

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews.
What You’ll Learn:
• Cost-Efficient DAG Design: Decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups. We dive into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability.
• Integrating LLM Analysis: Explore how we integrated LLM-based analysis into our pipeline. Learn how we designed the database, queries, and ingestion to Postgres.
• Extending Airflow UI: We developed a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting.
• Leveraging Airflow REST API: Discover how we leveraged the API to trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs.
• CI/CD and Cost Management: Get practical tips for deploying DAGs with CI/CD.
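The ETA computation mentioned above could be as simple as a linear extrapolation from the mapped tasks finished so far. The sketch below is an assumption about one plausible approach, not the team’s actual code; the function name and its inputs are hypothetical, and the counts would in practice come from whatever progress data the Airflow REST API returns.

```python
from typing import Optional


def estimate_eta_seconds(
    completed: int, total: int, elapsed_seconds: float
) -> Optional[float]:
    """Extrapolate remaining time from the average time per completed task.

    Returns None when there is not enough information to estimate.
    """
    if total <= 0 or completed <= 0:
        return None
    remaining = max(total - completed, 0)
    return elapsed_seconds / completed * remaining


# 25 of 100 mapped tasks finished after 50 seconds.
print(estimate_eta_seconds(completed=25, total=100, elapsed_seconds=50.0))
```

Linear extrapolation assumes mapped tasks take roughly equal time and run at steady parallelism; uneven workloads would need a smoothed or per-task estimate.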

This session details practical strategies for introducing Apache Airflow in strict, compliance-heavy organizations. Learn how on-premise deployment and hybrid tooling can help modernize legacy workflows when public cloud solutions and container technologies are restricted. Discover how cross-platform engineering teams can collaborate securely using CI/CD bridges, and what it takes to meet rigorous security and governance standards. Key lessons address navigating resistance to change, achieving production sign-off, and avoiding common compliance pitfalls, relevant to anyone automating in public sector settings.

The design of Qualcomm’s Snapdragon Systems-on-Chip (SoCs) involves several hundred complex workflows orchestrated across multiple data centers, taking the design from RTL to GDS. In the Snapdragon Oryon Custom CPU team, we introduced Airflow about 2 years ago to orchestrate design, verification, emulation, CI/CD, and physical implementation of our CPUs.
Use Case:
• Standardization and Templatization: We standardize and templatize common workflows, allowing designers to verify their designs by customizing YAML parameters.
• Custom Shell Operators: We created custom shell operators (tcshrc) to source project environments and work with internal tooling.
• Smart Retries: We use pre/post-execute hooks to trigger smart retries on failure.
• Dynamic Celery Workers: We auto-create Celery workers on the fly on our High-Performance Compute (HPC) clusters to launch and manage Electronic Design Automation (EDA) workloads.
• Hybrid Executor Strategy: We use a hybrid executor strategy (CeleryExecutor and EdgeExecutor) to orchestrate tasks across multiple data centers.
• EdgeExecutor for Remote Testing: We leverage EdgeExecutor to access post-silicon hardware in remote locations.
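One way to think about the "smart retries" item is as a decision function that a post-execute hook could call: retry only when the failure looks transient, and back off between attempts. The sketch below is a guess at that idea, not Qualcomm’s implementation; the log patterns, the attempt limit, and the backoff schedule are all assumptions.

```python
import re
from typing import Optional

# Hypothetical patterns: which log messages count as transient is an assumption.
TRANSIENT_PATTERNS = [
    re.compile(r"license .* unavailable", re.IGNORECASE),
    re.compile(r"connection (reset|timed out)", re.IGNORECASE),
]


def is_transient(log_text: str) -> bool:
    """True if the failure log matches any pattern assumed to be transient."""
    return any(p.search(log_text) for p in TRANSIENT_PATTERNS)


def next_retry_delay(
    log_text: str, attempt: int, max_attempts: int = 3, base_seconds: int = 60
) -> Optional[int]:
    """Return seconds to wait before retrying, or None to give up.

    `attempt` is 1-based: the number of the attempt that just failed.
    """
    if attempt >= max_attempts or not is_transient(log_text):
        return None
    # Exponential backoff: 60s after attempt 1, 120s after attempt 2, ...
    return base_seconds * 2 ** (attempt - 1)


print(next_retry_delay("ERROR: license for tool unavailable", attempt=1))
print(next_retry_delay("syntax error in design file", attempt=1))
```

Separating the retry decision from the hook itself keeps the classification rules easy to unit test without a running scheduler.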

Have you ever wondered why Apache Airflow builds are asymptotically(*) green? That drive for a “perennial green build” is not magic; it’s the result of continuous, often unseen engineering effort within our CI/CD pipelines and dev environments. This dedication ensures that maintainers can work efficiently and contributors can onboard smoothly. To support the ever-growing contributor base, we have a CI/CD team run by volunteers putting significant work into the foundational tooling. In this talk, we reveal some of the innovative solutions we have implemented, such as:
• Handling GitHub Actions pull_request_target challenges
• Restructuring the repo for better clarity
• Slack bot for CI failure alerts
• A cherry picker workflow for releases
• Pre-commit hooks
• Faster website and image builds
• Tackling the new GitHub API rate limits
• Solving chicken-and-egg build issues during releases
Join us to understand the “why” and “how” behind these infra components. You’ll gain insights into the continuous effort required to support a thriving open-source project like Airflow and, hopefully, be inspired to contribute to these areas.
(*) asymptotically = we fix failures as quickly as we can when they happen

In the rapidly evolving field of data engineering and data science, efficiency and ease of use are crucial. Our innovative solution offers a user-friendly interface to manage and schedule custom PySpark, PySQL, Python, and SQL code, streamlining the process from development to production. Using Airflow at the backend, this tool eliminates the complexities of infrastructure management, version control, CI/CD processes, and workflow orchestration. The intuitive UI allows users to upload code, configure job parameters, and set schedules effortlessly, without the need for additional scripting or coding. Additionally, users have the flexibility to bring their own custom artifactory solution and run their code. In summary, our solution significantly enhances the orchestration and scheduling of custom code, breaking down traditional barriers and empowering organizations to maximize their data’s potential and drive innovation efficiently. Whether you are an individual data scientist or part of a large data engineering team, this tool provides the resources needed to streamline your workflow and achieve your goals faster than ever before.

In this season of the Analytics Engineering podcast, Tristan is digging deep into the world of developer tools and databases. There are few more widely used developer tools than Docker. From its launch back in 2013, Docker has completely changed how developers ship applications. In this episode, Tristan talks to Solomon Hykes, the founder and creator of Docker. They trace Docker's rise from startup obscurity to becoming foundational infrastructure in modern software development. Solomon explains the technical underpinnings of containerization, the pivotal shift from platform-as-a-service to open-source engine, and why Docker's developer experience was so revolutionary. The conversation also dives into his next venture, Dagger, and how it aims to solve the messy, overlooked workflows of software delivery. Bonus: Solomon shares how AI agents are reshaping how CI/CD gets done and why the next revolution in DevOps might already be here. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.