CI/CD

How is AI revolutionizing interactions with information systems?

2025-10-01 · Big Data & AI Paris 2025

Face To Face

by Mehdi Nafe (Novelis), Walid Dahhane (Novelis)

AI/ML

When we talk about AI, we often think of specific use cases: how to use artificial intelligence as an extension of the information system to meet a particular need.

But isn't the real revolution elsewhere? Placing AI at the heart of the information system profoundly transforms our relationship with it. It moves the information system from a simple functional tool to an environment capable of anticipating, recommending, and simplifying all business processes.

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

2025-09-30 · O'Reilly Data Engineering Books

book

by Dustin Dorsey (Onix), Cameron Cyr

Cloud Computing dbt DWH Git Modern Data Stack Python SQL data data-engineering data-warehouse storage-repositories

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities, whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines.

Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, which empowers organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation, such as the newly introduced dbt Catalog.

As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively.

What You Will Learn

- Understand dbt and its role in the modern data stack
- Set up projects using both the cloud-hosted dbt Platform and the open source project
- Connect dbt projects to cloud data warehouses
- Build scalable models in SQL and Python
- Configure development, testing, and production environments
- Capture reusable logic with Jinja macros
- Incorporate version control with your data transformation code
- Seamlessly connect your projects using dbt Mesh
- Build and manage a semantic layer using dbt
- Deploy dbt using CI/CD best practices

Who This Book Is For

Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

Making your Data AI ready with DataOps

2025-09-25 · Big Data LDN 2025

Face To Face

by Guy Adams (DataOps.live)

AI/ML Data Engineering DataOps Snowflake

AI is only as good as the data it runs on. Yet Gartner predicts that in 2026 over 60% of AI projects will fail to deliver value because the underlying data isn’t truly AI-ready. “Good enough” data isn’t enough.

In this exclusive BDL launch session, DataOps.live reveal Momentum, the next generation of its DataOps automation platform designed to operationalize trusted AI at enterprise scale.

Based on experiences from building over 9000 Data Products to date, Momentum introduces breakthrough capabilities including AI-Ready Data Scoring to ensure data is fit for AI use cases, Data Product Lineage for end-to-end visibility, and a Data Engineering Agent that accelerates building reusable data products. Combined with automated CI/CD, continuous observability, and governance enforcement, Momentum closes the AI-readiness gap by embedding collaboration, metadata, and automation across the entire data lifecycle.

Backed by Snowflake Ventures and trusted by leading enterprises, including AstraZeneca, Disney and AT&T, DataOps.live is the proven catalyst for scaling AI-ready data. In this session, you’ll unpack what AI-ready data really means, learn essential practices, and discover a faster, easier, and more impactful way to make your AI initiatives succeed.

Be the first to see Momentum in action - the future of AI-ready data.

No More NiFi UI: Simplify and Automate NiFi Data Flow Deployments with Complete CI/CD Integration

2025-09-24 · Big Data LDN 2025

Face To Face

by Manish Gurnani (Ksolves India Limited)

As NiFi scales, so do the headaches of deploying NiFi data flows. CI/CD helps, but incomplete automation still leaves teams tied to the NiFi UI for adjusting parameters, updating controller services, and managing variables and parameter contexts by hand. This slows releases, increases operational risk, and strains engineering time.

This talk explores a game-changing, centralized platform, Data Flow Manager (DFM), that brings true end-to-end automation to NiFi data flow deployments. Configure, validate, and deploy NiFi data flows across dev, staging, and production environments without ever logging into the NiFi UI. Everything is handled in one place, with full integration into your existing CI/CD pipelines.

We’ll cover a few out-of-the-box features of Data Flow Manager: scheduled NiFi data flow deployments, centralized access control management, data flow sanity checks, audit logging, and monitoring of NiFi-specific metrics. Together these make NiFi data flow deployments predictable, scalable, and error-free across environments. The goal is simple: reduce operational overhead, eliminate manual errors, and bring predictability to NiFi data pipelines at scale.

From Legacy to Leading-Edge: Revamping NCEI Software for the Cloud Era

2025-07-11 · SciPy 2025

talk

by Sarah Purpura

AWS Cloud Computing DevOps Polars

Extreme weather events threaten industries and economic stability. NOAA’s National Centers for Environmental Information (NCEI) addresses this through the Industry Proving Grounds (IPG), which modernizes data delivery by collaborating with sectors like re/insurance and retail to develop practical, data-driven solutions. This presentation explores IPG’s technical innovations, including implementing Polars for efficient data processing, AWS for scalability, and CI/CD pipelines for streamlined deployment. These tools enhance data accessibility, reduce latency, and support real-time decision-making. By integrating scientific computing, cloud technology, and DevOps, NCEI improves climate resilience and provides a model for leveraging open-source tools to address global challenges.

SciPy Proceedings: An Exemplar for Publishing Computational Open Science

2025-07-11 · SciPy 2025

talk

by Rowan Cockett

GitHub Python SciPy XML

The SciPy Proceedings (https://proceedings.scipy.org) have long served as a cornerstone for publishing research in the scientific Python community, with over 330 peer-reviewed articles published over the last 17 years. In 2024, the SciPy Proceedings underwent a significant transformation, adopting MyST Markdown (https://mystmd.org) and Curvenote (https://curvenote.com) to enhance accessibility, interactivity, and reproducibility, including the publishing of Jupyter Notebooks. The new proceedings articles are web-first, providing features such as deep-dive links for cross-references and previews of GitHub content, interactive 3D visualizations, and rich rendering of Jupyter Notebooks. In this talk, we will (1) present the new authoring and reading capabilities introduced in 2024; (2) highlight connections to prominent open-science initiatives and their impact on advancing computational research publishing; and (3) demonstrate the underlying technologies, how they enhance integrations with SciPy packages, and how to use these tools in your own communication workflows.

Our presentation will give an overview of the revised authoring process for SciPy Proceedings; how we improve metadata standards in a similar way to code-linting and continuous integration; and the integration of live previews of the articles, including auto-generated PDFs and JATS XML (a standard used in scientific publishing). The peer-review process for the proceedings currently happens using GitHub’s peer-review commenting in a similar fashion to the Journal of Open Source Software; we will demonstrate this process as well as showcase opportunities for working with distributed review services such as PREreview (https://prereview.org). The open publishing pipeline has streamlined the submission, review, and revision processes while maintaining high scientific quality and improving the completeness of scholarly metadata.

Finally, we will present how this work connects into other high-profile scientific publishing initiatives that have incorporated Jupyter Notebooks and live computational figures as well as interactive displays of large-scale data. These initiatives include Notebooks Now! by the American Geophysical Union, which is focusing on ensuring that Jupyter Notebooks can be properly integrated into the scholarly record; and the Microscopy Society of America’s work on interactive publishing and publishing of large-scale microscopy data with interactive visualizations. These initiatives and the SciPy Proceedings are enabled by recent improvements in open-source tools including MyST Markdown, JupyterLab, BinderHub, and Curvenote, which enable new ways to share executable research content. These initiatives collectively aim to improve the reproducibility, interactivity, and accessibility of research by providing improved connections between data, software, and narrative research articles.

By embracing open science principles and modern technologies, the SciPy Proceedings exemplify how computational research can be more transparent, reproducible, and accessible. The shift to computational publishing, especially in the context of the scientific Python community, opens new opportunities for researchers to publish not only their final results but also the computational workflows, datasets, and interactive visualizations that underpin them. This transformation aligns with broader efforts in open science infrastructure, such as integrating persistent identifiers (DOIs, ORCID, ROR) and adopting FAIR (Findable, Accessible, Interoperable, Reusable) principles for computational content. Building on these foundations, as well as open tools like MyST Markdown and Curvenote, provides a scalable model for open scientific publishing that bridges the gap between computational research and scholarly communication, fostering a more collaborative, iterative, and continuous approach to scientific knowledge dissemination.

Reliable executable tutorials -- CI/CD challenges

2025-07-11 · SciPy 2025

talk

by Brigitta Sipőcz

GitHub Python

This BoF aims to host a discussion about best practices for maintaining executable tutorials that are reproducible and reliable. It is also intended as a platform for collecting tips and tricks on CI/CD practices. The moderators recently put together a repository (https://scientific-python.github.io/executable-tutorials/) that builds on their experience of maintaining numerous tutorial repositories. It covers some of the use cases, but we are well aware that there are still user scenarios and use cases that are not well covered.

The BoF complements both the Teaching&Learning and Maintainers tracks, as none of the talks in those tracks seem to focus on the technical challenges around tutorials.

Reproducible Science Made Easy: Package Management with Pixi

2025-07-10 · SciPy 2025

talk

by Wolf Vollprecht, Ruben Arts

Reproducibility is a major underpinning of the scientific method. In scientific computing, this also includes the ability to reproduce your dependencies. Yet, in 2025 this still remains a challenging topic.

Pixi is a modern package manager built on the Conda ecosystem. It integrates very well with all existing packages on conda-forge. Pixi makes package management reproducible, fast and painless, so that scientists can go back to coding instead of dealing with “dependency hell”. Pixi improves the mix of Conda and PyPI package management by integrating with uv by astral.sh, and streamlines automation with a cross-platform task runner. These features, combined with a powerful lockfile, make creating reproducible projects trivial.

This talk is for people who are interested in new, fast ways to set up their software (dev) environments on different systems – think your coworker's computer, CI, containers, and more.