talk-data.com

Topic

CI/CD

Continuous Integration/Continuous Delivery (CI/CD)

devops automation software_development ci_cd

262 tagged

Activity Trend

Peak: 21 activities/quarter (2020-Q1 to 2026-Q1)

Activities

262 activities · Newest first

In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O'Reilly book "Data Contracts," tackles the necessity of modern data management in an age of hyper-iteration, experimentation, and AI. He will explore why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk will cover a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outline the roles of data producers and consumers.

Attendees will leave with a clear understanding of modern data management's components and how to leverage them for better data handling and decision-making.
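As a toy illustration of the "CI/CD tests" idea from the abstract (this is not Gable's product or the speaker's code; all names are hypothetical), a data-contract check can be as simple as comparing a producer's proposed schema against the contracted one before a change ships:

```python
def check_contract(schema, contract):
    """Compare a producer's proposed schema against a consumer contract.

    Returns a list of violations -- the kind of report a CI/CD test could
    surface on a pull request before a schema change reaches production.
    """
    violations = []
    for column, expected_type in contract.items():
        if column not in schema:
            violations.append(f"missing contracted column: {column}")
        elif schema[column] != expected_type:
            violations.append(
                f"type change on {column}: {expected_type} -> {schema[column]}"
            )
    return violations

# Hypothetical contract and a producer's proposed change.
contract = {"order_id": "int", "amount": "decimal"}
proposed_schema = {"order_id": "int", "amount": "float", "note": "string"}
```

Here the extra, uncontracted `note` column is allowed, but the `amount` type change would be flagged and could block the merge.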

NCR Voyix's Retail Analytics AI team offers ML products for retailers and has embraced Airflow as its MLOps platform. Because the team is small, with twice as many data scientists as engineers, we faced challenges in making Airflow accessible to the scientists. Since they come from diverse programming backgrounds, we needed an architecture that lets them develop production-ready ML workflows without prior knowledge of Airflow; due to dynamic product demands, we had to be able to interchange Airflow operators effortlessly; and since workflows serve multiple customers, they must be easily configurable and simultaneously deployable. Our architecture addresses all of the above: data scientists formulate ML workflows as structured Python files; the workflows are seamlessly converted into Airflow DAGs, with their steps aggregated for execution on different Airflow operators; and DAGs are deployed via the CI/CD UI to the DAGs folder for all customers, with per-customer definitions taken from their configuration files. In this session, we will cover Airflow's evolution in our team and review the concepts of our architecture.
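The pattern described, letting data scientists declare workflows as plain Python structures that a framework later compiles into Airflow DAGs, might look roughly like this stdlib-only sketch (all names here are hypothetical, not the NCR team's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One step of an ML workflow, declared without any Airflow knowledge."""
    name: str
    runner: str                      # e.g. "python" or "kubernetes", mapped to an operator later
    upstream: list = field(default_factory=list)

def compile_workflow(steps, operator_map):
    """Turn a list of Steps into an ordered operator plan.

    In the real architecture this would emit Airflow tasks; here we just
    resolve each step's runner to a configured operator class name and
    return (task_name, operator, upstream) tuples in declaration order.
    """
    plan = []
    known = set()
    for step in steps:
        missing = [u for u in step.upstream if u not in known]
        if missing:
            raise ValueError(f"{step.name}: undefined upstream steps {missing}")
        plan.append((step.name, operator_map[step.runner], list(step.upstream)))
        known.add(step.name)
    return plan

# Interchanging operators per product is just a change to this mapping.
OPERATORS = {"python": "PythonOperator", "kubernetes": "KubernetesPodOperator"}

workflow = [
    Step("extract", "python"),
    Step("train", "kubernetes", upstream=["extract"]),
]
```

Because the workflow definition never imports Airflow, the scientists' files stay framework-agnostic, and swapping operators is a configuration change rather than a code change.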

DAG integrity is critical, and so are coding conventions and consistent standards across the group. In this talk, we will share lessons learned from testing and verifying our DAGs as part of our GitHub workflows, both during the pull request process and in automated deployment (eventually to production) once merged. We will dig into how we have unlocked additional efficiencies, how we catch errors before they get deployed, and how we are generally better off for having both Airflow and plenty of checks in our CI before we merge and deploy.

Airflow operators are a core feature of Apache Airflow, so it is extremely important that we maintain high operator quality and prevent regressions. Automated test results help developers double-check that their changes do not introduce regressions or backward-incompatible behavior, and they give Airflow release managers the information needed to decide whether a given provider version is ready for release. Recently, a new approach to assuring production quality was implemented for the AWS, Google, and Astronomer-provided operators: standalone continuous integration processes were configured for them, and test-results dashboards show the outcomes of the latest runs. What has been working well for these operator providers might be a pattern for others to follow. In this presentation, AWS, Google, and Astronomer engineers will share the internals of the test dashboards implemented for their operators, an approach that might serve as a 'blueprint' for other providers.

Using various operators to perform daily routines, with integrations across several technologies:
Redis acts as a caching mechanism to optimize data retrieval and processing speed, enhancing overall pipeline performance.
MySQL is used for storing metadata and managing task-state information in Airflow's backend database.
Tableau integrates with Airflow to generate interactive visualizations and dashboards, providing valuable insights into the processed data.
Amazon Redshift: Panasonic leverages Redshift for scalable data warehousing, seamlessly integrating it with Airflow for data loading and analytics.
Foundry is integrated with Airflow to access and process data stored in Foundry's data platform, ensuring data consistency and reliability.
Plotly dashboards are employed to create custom, interactive web-based dashboards for visualizing and analyzing data processed through Airflow pipelines.
GitLab CI/CD pipelines are used for version control and continuous integration/continuous deployment of Airflow DAGs (Directed Acyclic Graphs), ensuring efficient development and deployment of workflows.

Airflow version upgrades can be challenging. Maybe you upgrade and your DAGs fail to parse (that's an easy fix). Or maybe you upgrade and everything looks fine, but when your DAG runs, you can no longer connect to MySQL because the TLS version changed. In this talk I will provide concrete strategies that users can put into practice to make version upgrades safer and less painful. Topics may include:
What semver means and what it implies for the upgrade process.
Using integration-test DAGs, unit tests, and a test cluster to smoke out problems.
Strategies around constraints files and pinning, and managing provider versus core versions.
Using db clean prior to upgrade to reduce table size.
Rollback strategies.
What to do about warnings (e.g., deprecation warnings).
I'll also focus on keeping it simple. Sometimes things like "integration tests" and "CI" can sound scary. Even without setting up anything automated, there are still things you can do to make managing upgrades a little less painful and risky.
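The semver point above can be made concrete: the leftmost version component that changes tells you how cautious the upgrade needs to be. A minimal sketch of that triage logic (my illustration, not an official Airflow tool):

```python
def upgrade_risk(current, target):
    """Classify an upgrade by which semver component changes.

    major -> breaking changes are possible: read the release notes and
             smoke-test on a staging cluster with integration-test DAGs.
    minor -> new features, intended to be backward compatible: still test.
    patch -> bug fixes only: lowest risk, but watch for changed defaults.
    """
    cur = [int(x) for x in current.split(".")]
    tgt = [int(x) for x in target.split(".")]
    if tgt < cur:
        return "downgrade"
    for level, (a, b) in zip(("major", "minor", "patch"), zip(cur, tgt)):
        if a != b:
            return level
    return "no-op"
```

For example, 2.8.1 to 2.8.2 is a patch-level hop, while 2.8.1 to 3.0.0 deserves the full staging-cluster treatment.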

Apache Airflow relies on a silent symphony behind the scenes: its CI/CD (Continuous Integration/Continuous Delivery) and development tooling. This presentation explores the critical role these often-overlooked tools play in keeping Airflow efficient and innovative. We'll delve into how robust CI/CD pipelines catch and fix issues early and ensure bug fixes and improvements are seamlessly integrated, while well-maintained development tools empower contributors to code and collaborate effectively. Discover how you can use, and contribute to, a thriving Airflow ecosystem by helping keep these crucial tools in top shape.

Data Engineering with Google Cloud Platform - Second Edition

Data Engineering with Google Cloud Platform is your ultimate guide to building scalable data platforms using Google Cloud technologies. In this book, you will learn how to leverage products such as BigQuery, Cloud Composer, and Dataplex for efficient data engineering. Expand your expertise and gain practical knowledge to excel in managing data pipelines within the Google Cloud ecosystem.

What this book will help me do
Understand foundational data engineering concepts using Google Cloud Platform.
Learn to build and manage scalable data pipelines with tools such as Dataform and Dataflow.
Explore advanced topics like data governance and secure data handling in Google Cloud.
Boost readiness for Google Cloud data engineering certification with real-world exam guidance.
Master cost-effective strategies and CI/CD practices for data engineering on Google Cloud.

Author(s)
Adi Wijaya, the author of this book, is a Data Strategic Cloud Engineer at Google with extensive experience in data engineering and the Google Cloud ecosystem. With his hands-on expertise, he emphasizes practical solutions and in-depth knowledge sharing, guiding readers through the intricacies of Google Cloud for data engineering success.

Who is it for?
This book is ideal for data analysts, IT practitioners, software engineers, and data enthusiasts aiming to excel in data engineering. Whether you're a beginner tackling fundamental concepts or an experienced professional exploring Google Cloud's advanced capabilities, this book is designed for you. It bridges your current skills with modern data engineering practices on Google Cloud, making it a valuable resource at any stage of your career.

In this session, we'll dive into deploying Java apps using Google Cloud's serverless platform. Designed for Java developers, it offers practical insights into the considerations, challenges, and tips and tricks of deploying JVM applications on serverless platforms. We'll also cover best practices across other parts of the application lifecycle, such as CI/CD pipelines, security, and observability. Through interactive demos, learn to build, secure, and monitor Java applications efficiently.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Learn how Cloud Deploy advances your CI/CD pipeline. Frequent and safe deployment is crucial for software development. Cloud Deploy stands out as a fully managed application delivery platform on Google Cloud for strategic delivery. It supports progressive delivery including canary, verification, and automation. This session will delve into the capabilities of Cloud Deploy and how technology startup Ubie is leveraging it to enhance its development processes for both dev and platform teams.

Have you ever wondered how a data company does data? In this session, Isaac Obezo, Staff Data Engineer at Starburst, will take you for a peek behind the curtain at Starburst's own data architecture, built to support batch processing of telemetry data within Galaxy data pipelines. Isaac will walk you through an architecture that uses tools like Git, dbt, and Starburst Galaxy to create a CI/CD process, allowing the data engineering team to iterate quickly: deploying new models, developing and landing data, and creating and improving existing models in the data lake. Isaac will also discuss Starburst's mentality toward data quality, the use of data products, and the process of delivering quality analytics.

Join the team from Moody's Analytics as they take you on a personal journey of optimizing their data pipelines for data quality and governance. Like many data practitioners, Ryan understands the frustration and anxiety that come with accidentally introducing bad code into production pipelines; he's spent countless hours putting out fires caused by these unexpected changes. In this session, Ryan will recount his experiences with a previous data stack that lacked standardized testing methods and visibility into the impact of code changes on production data. He'll also share how their new data stack is safeguarded by Datafold's data diffing and continuous integration (CI) capabilities, which enable his team to work with greater confidence, peace of mind, and speed.
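Data diffing in the sense described, comparing a table produced before and after a code change and surfacing the differences in CI, can be illustrated with a toy stdlib version (this shows only the concept, not Datafold's implementation; the column names are made up):

```python
def diff_tables(before, after, key):
    """Compare two tables (lists of row dicts) by primary key.

    Returns the keys that were added, removed, and changed -- the kind of
    summary a CI check can post on a pull request before it is merged.
    """
    b = {row[key]: row for row in before}
    a = {row[key]: row for row in after}
    return {
        "added": sorted(a.keys() - b.keys()),
        "removed": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }

# Hypothetical output of the same model before and after a code change.
before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
after = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]
```

An unexpected entry under "changed" is exactly the signal that lets a reviewer catch bad code before it reaches production.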

Spend less time prepping data and more time gaining insights with Gemini in BigQuery. In this session, you'll discover how to visually transform your data with AI for streamlined analysis and witness a live demo of BigQuery data preparation. Seattle Children's will demonstrate the transformative effect of AI on data engineer productivity and development speed. Plus, get a sneak peek at the roadmap of upcoming features, including expanded connectivity, continuous integration and delivery workflows, and robust data quality.

Learn how to get the most out of your CI/CD tools through new features. Find out the latest product updates to create secure and reliable application repositories, builds, and deployments.

Rethinking Ingestion: CI/CD for Data Lakes by Einat Orr

Big Data Europe, onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning, and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.

Learn to design and deploy the Google Cloud global front end to protect, scale, and deliver web experiences from infrastructure running in the cloud or on-premises. Get an overview of the global front end with load balancing, CDN, and web protection, including DDoS mitigation. Then move into programmability with service extension callouts and designing for scenarios across clouds and on-premises environments. We will cover integration into continuous integration/continuous delivery workflows, show a demo, and learn from a customer about their use case and lessons learned.

Marsh McLennan runs a complex Apigee Hybrid configuration, with 36 organizations operating in six global data centers. Keeping all of this in sync across production and nonproduction environments is a challenge. While the infrastructure itself is deployed with Terraform, Marsh McLennan wanted to apply the same declarative approach to the entire environment. See how it used Apigee's management APIs to build a state machine to keep the whole system running smoothly, allowing APIs to flow seamlessly from source control through to production.

Fundamentals of Analytics Engineering

Master the art and science of analytics engineering with Fundamentals of Analytics Engineering. This book takes you on a comprehensive journey from understanding foundational concepts to implementing end-to-end analytics solutions. You'll gain not just theoretical knowledge but practical expertise in building scalable, robust data platforms to meet organizational needs.

What this book will help me do
Design and implement effective data pipelines leveraging modern tools like Airbyte, BigQuery, and dbt.
Adopt best practices for data modeling and schema design to enhance system performance and develop clearer data structures.
Learn advanced techniques for ensuring data quality, governance, and observability in your data solutions.
Master collaborative coding practices, including version control with Git and strategies for maintaining well-documented codebases.
Automate and manage data workflows efficiently using CI/CD pipelines and workflow orchestrators.

Author(s)
Dumky De Wilde, alongside six co-authors (experienced professionals from various facets of the analytics field), delivers a cohesive exploration of analytics engineering. The authors blend their expertise in software development, data analysis, and engineering to offer actionable advice and insights, and their approachable style makes complex concepts understandable.

Who is it for?
This book is a perfect fit for data analysts and engineers curious about transitioning into analytics engineering. Aspiring professionals as well as seasoned analytics engineers looking to deepen their understanding of modern practices will find guidance here. It's tailored for individuals aiming to boost their career trajectory in data engineering roles, addressing fundamental to advanced topics.