
Topic: DevOps
Tags: software_development, it_operations, continuous_delivery
42 tagged activities

Activity Trend: peak of 25 activities per quarter, 2020-Q1 to 2026-Q1

Activities

Showing filtered results. Filtering by: O'Reilly Data Engineering Books
Building Data Products

As organizations grapple with fragmented data, siloed teams, and inconsistent pipelines, data products have emerged as a practical solution for delivering trusted, scalable, and reusable data assets. In Building Data Products, Jean-Georges Perrin provides a comprehensive, standards-driven playbook for designing, implementing, and scaling data products that fuel innovation and cross-functional collaboration, whether or not your organization adopts a full data mesh strategy. Drawing on extensive industry experience and practitioner interviews, Perrin shows readers how to build metadata-rich, governed data products aligned to business domains. Covering foundational concepts, real-world use cases, and emerging standards like Bitol ODPS and ODCS, this guide offers step-by-step implementation advice and practical code examples for key stages: ownership, observability, active metadata, compliance, and integration.

With this book, you will:
- Design data products for modular reuse, discoverability, and trust
- Implement standards-driven architectures with rich metadata and security
- Incorporate AI-driven automation, SBOMs, and data contracts
- Scale product-driven data strategies across teams and platforms
- Integrate data products into APIs, CI/CD pipelines, and DevOps practices
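To make the data-contract idea concrete, here is a minimal Python sketch of a pre-publication contract check. It is illustrative only: the required fields below are placeholders chosen for this example, not the actual Bitol ODCS schema.

import yaml  # pip install pyyaml

# Illustrative required fields; the real ODCS specification defines its own.
REQUIRED_FIELDS = {"name", "owner", "version", "schema"}

def validate_contract(path: str) -> list[str]:
    """Return a list of problems found in the contract file (empty means OK)."""
    with open(path) as f:
        contract = yaml.safe_load(f) or {}
    problems = [f"missing field: {k}" for k in sorted(REQUIRED_FIELDS - contract.keys())]
    # A governed data product should always name an accountable owner.
    if "owner" in contract and not str(contract["owner"] or "").strip():
        problems.append("owner must not be empty")
    return problems

for issue in validate_contract("contract.yaml"):
    print("CONTRACT ERROR:", issue)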

Data Engineering with Azure Databricks

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools.

Key Features
- Build scalable data pipelines using Apache Spark and Delta Lake
- Automate workflows and manage data governance with Unity Catalog
- Learn real-time processing and structured streaming with practical use cases
- Implement CI/CD, DevOps, and security for production-ready data solutions
- Explore Databricks-native ML, AutoML, and Generative AI integration

Book Description
"Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

What you will learn
- Set up a full-featured Azure Databricks environment
- Implement batch and streaming ingestion using Auto Loader
- Optimize Spark jobs with partitioning and caching
- Build real-time pipelines with structured streaming and DLT
- Manage data governance using Unity Catalog
- Orchestrate production workflows with jobs and ADF
- Apply CI/CD best practices with Azure DevOps and Git
- Secure data with RBAC, encryption, and compliance standards
- Use MLflow and Feature Store for ML pipelines
- Build generative AI applications in Databricks

Who this book is for
This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.
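As a flavor of the ingestion pattern described above, here is a minimal Auto Loader sketch in PySpark. It assumes a Databricks environment where spark is predefined; the paths and table name are placeholders, not examples from the book.

(spark.readStream
    .format("cloudFiles")                                        # Auto Loader source
    .option("cloudFiles.format", "json")                         # format of incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # schema tracking and evolution
    .load("/mnt/raw/events")                                     # landing zone to watch
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")     # exactly-once bookkeeping
    .trigger(availableNow=True)                                  # drain the backlog, then stop
    .toTable("bronze.events"))                                   # write to a managed Delta table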

Mastering Snowflake DataOps with DataOps.live: An End-to-End Guide to Modern Data Management

This practical, in-depth guide shows you how to build modern, sophisticated data processes using the Snowflake platform and DataOps.live, the only platform that enables seamless DataOps integration with Snowflake. Designed for data engineers, architects, and technical leaders, it bridges the gap between DataOps theory and real-world implementation, helping you take control of your data pipelines to deliver more efficient, automated solutions.

You’ll explore the core principles of DataOps and how they differ from traditional DevOps, while gaining a solid foundation in the tools and technologies that power modern data management, including Git, DBT, and Snowflake. Through hands-on examples and detailed walkthroughs, you’ll learn how to implement your own DataOps strategy within Snowflake and maximize the power of DataOps.live to scale and refine your DataOps processes. Whether you're just starting with DataOps or looking to refine and scale your existing strategies, this book, complete with practical code examples and starter projects, provides the knowledge and tools you need to streamline data operations, integrate DataOps into your Snowflake infrastructure, and stay ahead of the curve in the rapidly evolving world of data management.

What You Will Learn
- Explore the fundamentals of DataOps, its differences from DevOps, and its significance in modern data management
- Understand Git’s role in DataOps and how to use it effectively
- Know why DBT is preferred for DataOps and how to apply it
- Set up and manage DataOps.live within the Snowflake ecosystem
- Apply advanced techniques to scale and evolve your DataOps strategy

Who This Book Is For
Snowflake practitioners, including data engineers, platform architects, and technical managers, who are ready to implement DataOps principles and streamline complex data workflows using DataOps.live.
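In the same spirit, here is a minimal Python sketch of running an idempotent change against Snowflake with the official connector. The account, credentials, and object names are placeholders, and this is not code from the book.

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="myorg-myaccount",  # placeholder account identifier
    user="DATAOPS_USER",
    password="***",
    warehouse="DEV_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    # Idempotent DDL is a core DataOps habit: the same script is safe to re-run in CI.
    cur.execute("CREATE TABLE IF NOT EXISTS customers (id INT, name STRING)")
    cur.execute("SELECT CURRENT_VERSION()")
    print("Snowflake version:", cur.fetchone()[0])
finally:
    conn.close()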

High Performance with MongoDB

Practical strategies to help you design, optimize, and operate MongoDB deployments for performance, resilience, and growth.

Key Features
- Identify and fix performance bottlenecks with practical diagnostic and optimization strategies
- Optimize schema design, indexing, storage, and system resources for real-world workloads
- Scale confidently with in-depth coverage of replication, sharding, and cluster management techniques
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description
With data as the new competitive edge, performance has become the need of the hour. As applications handle exponentially growing data and user demand for speed and reliability rises, three industry experts distill their decades of experience to offer you guidance on designing, building, and operating databases that deliver fast, scalable, and resilient experiences. MongoDB’s document model and distributed architecture provide powerful tools for modern applications, but unlocking their full potential requires a deep understanding of architecture, operational patterns, and tuning best practices. This MongoDB book takes a hands-on approach to diagnosing common performance issues and applying proven optimization strategies, from schema design and indexing to storage engine tuning and resource management. Whether you’re optimizing a single replica set or scaling a sharded cluster, this book provides the tools to maximize deployment performance. Its modular chapters let you explore query optimization, connection management, and monitoring, or follow a complete learning path to build a rock-solid performance foundation. With real-world case studies, code examples, and proven best practices, you’ll be ready to troubleshoot bottlenecks, scale efficiently, and keep MongoDB running at peak performance in even the most demanding production environments.

What you will learn
- Diagnose and resolve common performance bottlenecks in deployments
- Design schemas and indexes that maximize throughput and efficiency
- Tune the WiredTiger storage engine and manage system resources for peak performance
- Leverage sharding and replication to scale and ensure uptime
- Monitor, debug, and maintain deployments proactively to prevent issues
- Improve application responsiveness through client driver configuration

Who this book is for
This book is for developers, database administrators, system architects, and DevOps engineers focused on performance optimization of MongoDB. Whether you’re building high-throughput applications, managing deployments in production, or scaling distributed systems, you’ll gain actionable insights. Basic knowledge of MongoDB is assumed, with chapters designed progressively to support learners at all levels.
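The diagnose-then-index loop at the heart of this kind of tuning looks roughly like the following PyMongo sketch. The connection string, collection, and field names are placeholders, not examples from the book.

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders

# Inspect the query plan first: a COLLSCAN winning plan signals a missing index.
plan = orders.find({"customer_id": 42}).explain()
print(plan["queryPlanner"]["winningPlan"])

# Add a compound index that matches the query shape; run once, not per request.
orders.create_index([("customer_id", ASCENDING), ("created_at", ASCENDING)])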

PHP, MySQL, & JavaScript All-In-One For Dummies, 2nd Edition

Learn the essentials of creating web apps with some of the most popular programming languages. PHP, MySQL, & JavaScript All-in-One For Dummies bundles the essentials of coding in some of the most in-demand web development languages. You'll learn to create your own data-driven web applications and interactive web content. The three powerful languages covered in this book form the backbone of top online apps like Wikipedia and Etsy. Paired with the basics of HTML and CSS, also covered in this All-in-One Dummies guide, you can make dynamic websites with a variety of elements. This book makes it easy to get started. You'll also find coverage of advanced skills, as well as resources you'll appreciate when you're ready to level up.

- Get beginner-friendly instructions and clear explanations of how to program websites in common languages
- Understand the basics of object-oriented programming, interacting with databases, and connecting front- and back-end code
- Learn how to work according to popular DevOps principles, including containers and microservices
- Troubleshoot problems in your code and avoid common web development mistakes

This All-in-One is a great value for new programmers looking to pick up web development skills, as well as those with more experience who want to expand to building web apps.

CockroachDB: The Definitive Guide, 2nd Edition

CockroachDB is the distributed SQL database that handles the demands of today's data-driven applications. The second edition of this popular hands-on guide shows software developers, architects, and DevOps/SRE teams how to use CockroachDB for applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Data professionals will learn how to migrate existing applications to CockroachDB's performant, cloud-native data architecture. You'll also quickly discover the benefits of strong data correctness and consistency guarantees, plus optimizations for delivering ultra-low latencies to globally distributed end users.

- Uncover the power of distributed SQL
- Learn how to start, manage, and optimize projects in CockroachDB
- Explore best practices for data modeling, schema design, and distributed infrastructure
- Discover strategies for migrating data into CockroachDB
- See how to read, write, and run ACID transactions across distributed systems
- Maximize resiliency in multiregion clusters
- Secure, monitor, and fine-tune your CockroachDB deployment for peak performance
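Because CockroachDB speaks the PostgreSQL wire protocol, a client-side transaction-retry loop can be sketched with psycopg2 as below. CockroachDB asks clients to retry on serialization failures (SQLSTATE 40001); the DSN, table, and column names here are placeholders, not code from the book.

import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("postgresql://root@localhost:26257/bank")

def transfer(src: int, dst: int, amount: int, max_retries: int = 5) -> None:
    for _ in range(max_retries):
        try:
            with conn, conn.cursor() as cur:  # commits on success, rolls back on error
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                            (amount, src))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                            (amount, dst))
            return
        except errors.SerializationFailure:
            continue  # contention detected: retry the whole transaction
    raise RuntimeError("transfer failed after retries")

transfer(src=1, dst=2, amount=100)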

Platform Engineering

Until recently, infrastructure was the backbone of organizations operating software they developed in-house. But now that cloud vendors run the computers, companies can finally bring the benefits of agile customer-centricity to their own developers. Adding product management to infrastructure organizations is now all the rage. But how's that possible when infrastructure is still the operational layer of the company? This practical book guides engineers, managers, product managers, and leaders through the shifts that modern platform-led organizations require. You'll learn what platform engineering is, and isn't, and what benefits and value it brings to developers and teams. You'll understand what it means to approach a platform as a product and learn some of the most common technical and managerial barriers to success.

With this book, you'll:
- Cultivate a platform-as-product, developer-centric mindset
- Learn what platform engineering teams are and are not
- Start the process of adopting platform engineering within your organization
- Discover what it takes to become a product manager for a platform team
- Understand the challenges that emerge when you scale platforms
- Automate processes and self-service infrastructure to speed development and improve developer experience
- Build out, hire, manage, and advocate for a platform team

Data Engineering with Databricks Cookbook

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows. What this Book will help me do Master Apache Spark for data ingestion, transformation, and analysis. Learn to optimize data processing and improve query performance with Delta Lake. Manage streaming data processing with Spark Structured Streaming capabilities. Implement DataOps and DevOps workflows tailored for Databricks. Enforce data governance policies using Unity Catalog for scalable solutions. Author(s) Pulkit Chadha, the author of this book, is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge. Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge in managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well-suited for professionals working with Databricks or similar cloud-based data platforms.

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

This book provides Kafka administrators, site reliability engineers, and DataOps and DevOps practitioners with a list of real production issues that can occur in Kafka clusters and how to solve them. The production issues covered are assembled into a comprehensive troubleshooting guide for those engineers who are responsible for the stability and performance of Kafka clusters in production, whether those clusters are deployed in the cloud or on-premises. This book teaches you how to detect and troubleshoot the issues, and eventually how to prevent them. Kafka stability is hard to achieve, especially in high throughput environments, and the purpose of this book is not only to make troubleshooting easier, but also to prevent production issues from occurring in the first place. The guidance in this book is drawn from the author's years of experience in helping clients and internal customers diagnose and resolve knotty production problems and stabilize their Kafka environments. The book is organized into recipe-style troubleshooting checklists that field engineers can easily follow when under pressure to fix an unstable cluster. This is the book you will want by your side when the stakes are high, and your job is on the line.

What You Will Learn
- Monitor and resolve production issues in your Kafka clusters
- Provision Kafka clusters with the lowest costs and still handle the required loads
- Perform root cause analyses of issues affecting your Kafka clusters
- Know the ways in which your Kafka cluster can affect its consumers and producers
- Prevent or minimize data loss and delays in data streaming
- Forestall production issues through an understanding of common failure points
- Create checklists for troubleshooting your Kafka clusters when problems occur

Who This Book Is For
Site reliability engineers tasked with maintaining stability of Kafka clusters, Kafka administrators who troubleshoot production issues around Kafka, DevOps and DataOps experts who are involved with provisioning Kafka (whether on-premises or in the cloud), developers of Kafka consumers and producers who wish to learn more about Kafka
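Consumer lag is the first thing many such checklists examine. As a rough illustration (not from the book), this kafka-python sketch compares committed offsets with end offsets; the broker address, group id, and topic are placeholders.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="payments-app")
partitions = [TopicPartition("payments", p)
              for p in consumer.partitions_for_topic("payments") or ()]
end_offsets = consumer.end_offsets(partitions)   # latest offset per partition

for tp in partitions:
    committed = consumer.committed(tp) or 0      # last offset this group committed
    print(f"{tp.topic}[{tp.partition}] lag = {end_offsets[tp] - committed}")

consumer.close()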

Streaming Data Mesh

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services. Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you'll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly.

With this book, you will:
- Design a streaming data mesh using Kafka
- Learn how to identify a domain
- Build your first data product using self-service tools
- Apply data governance to the data products you create
- Learn the differences between synchronous and asynchronous data services
- Implement self-services that support decentralized data

Practical Database Auditing for Microsoft SQL Server and Azure SQL: Troubleshooting, Regulatory Compliance, and Governance

Know how to track changes and key events in your SQL Server databases in support of application troubleshooting, regulatory compliance, and governance. This book shows how to use key features in SQL Server, such as SQL Server Audit and Extended Events, to track schema changes, permission changes, and changes to your data. You’ll even learn how to track queries run against specific tables in a database. Not all changes and events can be captured and tracked using SQL Server Audit and Extended Events, and the book goes beyond those features to also show what can be captured using common criteria compliance, change data capture, temporal tables, or querying the SQL Server log. You will learn how to audit just what you need to audit, and how to audit pretty much anything that happens on a SQL Server instance. This book will also help you set up cloud auditing with an emphasis on Azure SQL Database, Azure SQL Managed Instance, and AWS RDS SQL Server. You don’t need expensive, third-party auditing tools to make auditing work for you, and to demonstrate and provide value back to your business. This book will help you set up an auditing solution that works for you and your needs. It shows how to collect the audit data that you need, centralize that data for easy reporting, and generate audit reports using built-in SQL Server functionality for use by your own team, developers, and organization’s auditors.

What You Will Learn
- Understand why auditing is important for troubleshooting, compliance, and governance
- Track changes and key events using SQL Server Audit and Extended Events
- Track SQL Server configuration changes for governance and troubleshooting
- Utilize change data capture and temporal tables to track data changes in SQL Server tables
- Centralize auditing data from all your databases for easy querying and reporting
- Configure auditing on Azure SQL, Azure SQL Managed Instance, and AWS RDS SQL Server

Who This Book Is For
Database administrators who need to know what’s changing on their database servers, and those who are making the changes; database-savvy DevOps engineers and developers who are charged with troubleshooting processes and applications; developers and administrators who are responsible for generating reports in support of regulatory compliance reporting and auditing
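For a taste of the core feature, here is a minimal sketch of enabling a SQL Server Audit from Python with pyodbc. The T-SQL is standard SQL Server Audit DDL; the connection string, object names, database, and file path are placeholders, and statements like these require elevated permissions (the server audit is created in master, the specification in the database being audited).

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=master;UID=sa;PWD=***;TrustServerCertificate=yes",
    autocommit=True)
cur = conn.cursor()

# Server-level audit object: defines where audit records are written.
cur.execute("CREATE SERVER AUDIT SchemaAudit TO FILE (FILEPATH = N'/var/opt/mssql/audit/')")
cur.execute("ALTER SERVER AUDIT SchemaAudit WITH (STATE = ON)")

# Database-level specification: capture every schema change in that database.
cur.execute("USE SalesDb")
cur.execute("""CREATE DATABASE AUDIT SPECIFICATION SchemaChanges
               FOR SERVER AUDIT SchemaAudit
               ADD (SCHEMA_OBJECT_CHANGE_GROUP)
               WITH (STATE = ON)""")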

CockroachDB: The Definitive Guide

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud-native data architecture. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra-low latencies to globally distributed end users.

You'll learn how to:
- Design and build applications for distributed infrastructure, including data modeling and schema design
- Migrate data into CockroachDB
- Read and write data and run ACID transactions across distributed infrastructure
- Plan a CockroachDB deployment for resiliency across single region and multi-region clusters
- Secure, monitor, and optimize your CockroachDB deployment

Data Engineering on Azure

Build a data platform to the industry-leading standards set by Microsoft’s own infrastructure.

In Data Engineering on Azure you will learn how to:
- Pick the right Azure services for different data scenarios
- Manage data inventory
- Implement production quality data modeling, analytics, and machine learning workloads
- Handle data governance
- Use DevOps to increase reliability
- Ingest, store, and distribute data
- Apply best practices for compliance and access control

Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft’s own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring an engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning.

About the Technology
Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify.

About the Book
In Data Engineering on Azure you’ll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you’ll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms.

What's Inside
- Data inventory and data governance
- Assure data quality, compliance, and distribution
- Build automated pipelines to increase reliability
- Ingest, store, and distribute data
- Production-quality data modeling, analytics, and machine learning

About the Reader
For data engineers familiar with cloud computing and DevOps.

About the Author
Vlad Riscutia is a software architect at Microsoft.

Quotes
"A definitive and complete guide on data engineering, with clear and easy-to-reproduce examples." - Kelum Prabath Senanayake, Echoworx
"An all-in-one Azure book, covering all a solutions architect or engineer needs to think about." - Albert Nogués, Danone
"A meaningful journey through the Azure ecosystem. You’ll be building pipelines and joining components quickly!" - Todd Cook, Appen
"A gateway into the world of Azure for machine learning and DevOps engineers." - Krzysztof Kamyczek, Luxoft

Developing Modern Database Applications with PostgreSQL

In "Developing Modern Database Applications with PostgreSQL", you will master the art of building database applications with the highly available and scalable PostgreSQL. Walk through a series of real-world projects that fully explore both the developmental and administrative aspects of PostgreSQL, all tied together through the example of a banking application. What this Book will help me do Set up high-availability PostgreSQL clusters using modern best practices. Monitor and tune database performance to handle enterprise-level workloads seamlessly. Automate testing and implement test-driven development strategies for robust applications. Leverage PostgreSQL along with DevOps pipelines to deploy applications on cloud platforms. Develop APIs and geospatial databases using popular tools like PostgREST and PostGIS. Author(s) The authors of this book, None Le and None Diaz, are experienced professionals in database technologies and software development. With a passion for PostgreSQL and its applications in modern computing, they bring a wealth of expertise and a practical approach to this book. Their methods focus on real-world applicability, ensuring that readers gain hands-on skills and practical knowledge. Who is it for? This book is perfect for database developers, administrators, and architects who want to advance their expertise in PostgreSQL. It is also suitable for software engineers and IT professionals aiming to tackle end-to-end database development projects. A basic knowledge of PostgreSQL and Linux will help you dive into the hands-on projects easily. If you're looking to take your PostgreSQL skills to the next level, this book is for you.

The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

Build efficient and scalable batch and real-time data ingestion pipelines, DevOps continuous integration and deployment pipelines, and advanced analytics solutions on the Azure Data Platform. This book teaches you to design and implement robust data engineering solutions using Data Factory, Databricks, Synapse Analytics, Snowflake, Azure SQL database, Stream Analytics, Cosmos database, and Data Lake Storage Gen2. You will learn how to engineer your use of these Azure Data Platform components for optimal performance and scalability. You will also learn to design self-service capabilities to maintain and drive the pipelines and your workloads. The approach in this book is to guide you through a hands-on, scenario-based learning process that will empower you to promote digital innovation best practices while you work through your organization’s projects, challenges, and needs. The clear examples enable you to use this book as a reference and guide for building data engineering solutions in Azure. After reading this book, you will have a far stronger skill set and confidence level in getting hands on with the Azure Data Platform.

What You Will Learn
- Build dynamic, parameterized ELT data ingestion orchestration pipelines in Azure Data Factory
- Create data ingestion pipelines that integrate control tables for self-service ELT
- Implement a reusable logging framework that can be applied to multiple pipelines
- Integrate Azure Data Factory pipelines with a variety of Azure data sources and tools
- Transform data with Mapping Data Flows in Azure Data Factory
- Apply Azure DevOps continuous integration and deployment practices to your Azure Data Factory pipelines and development SQL databases
- Design and implement real-time streaming and advanced analytics solutions using Databricks, Stream Analytics, and Synapse Analytics
- Get started with a variety of Azure data services through hands-on examples

Who This Book Is For
Data engineers and data architects who are interested in learning architectural and engineering best practices around ELT and ETL on the Azure Data Platform, those who are creating complex Azure data engineering projects and are searching for patterns of success, and aspiring cloud and data professionals involved in data engineering, data governance, continuous integration and deployment of DevOps practices, and advanced analytics who want a full understanding of the many different tools and technologies that Azure Data Platform provides

Data Pipelines with Apache Airflow

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.

About the Technology
Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.

About the Book
Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.

What's Inside
- Build, test, and deploy Airflow pipelines as DAGs
- Automate moving and transforming data
- Analyze historical datasets using backfilling
- Develop custom components
- Set up Airflow in production environments

About the Reader
For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.

About the Authors
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.

Quotes
"An Airflow bible. Useful for all kinds of users, from novice to expert." - Rambabu Posa, Sai Aashika Consultancy
"An easy-to-follow exploration of the benefits of orchestrating your data pipeline jobs with Airflow." - Daniel Lamblin, Coupang
"The one reference you need to create, author, schedule, and monitor workflows with Apache Airflow. Clear recommendation." - Thorsten Weber, bbv Software Services AG
"By far the best resource for Airflow." - Jonathan Wood, LexisNexis
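For orientation, a DAG can be as small as this TaskFlow-style sketch (recent Airflow 2.x releases). The DAG id, schedule, and task bodies are placeholders rather than examples from the book.

from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def etl_example():
    @task
    def extract() -> list[int]:
        return [1, 2, 3]                   # stand-in for reading a real source

    @task
    def load(rows: list[int]) -> None:
        print(f"loaded {len(rows)} rows")  # stand-in for writing a real sink

    load(extract())                        # wiring implies extract runs before load

etl_example()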

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

Build custom SQL Server Integration Services (SSIS) tasks using Visual Studio Community Edition and C#. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes, and for no added cost over what you’ve already spent on licensing SQL Server. New in this edition is a demonstration deploying a custom SSIS task to the Azure Data Factory (ADF) Azure-SSIS Integration Runtime (IR). All examples in this new edition are implemented in C#. Custom task developers are shown how to implement custom tasks using the widely accepted and default language for .NET development. Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the available functionality. One such gap is a constraint of the built-in SSIS Execute Package Task, which does not allow SSIS developers to select SSIS packages from other projects in the SSIS Catalog. Examples in this book show how to create a custom Execute Catalog Package task that allows SSIS developers to execute tasks from other projects in the SSIS Catalog. Building on the examples and patterns in this book, SSIS developers may create any task to which they aspire, custom tailored to their specific data integration and ETL needs.

What You Will Learn
- Configure and execute Visual Studio in the way that best supports SSIS task development
- Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies
- Properly sign assemblies that you create in order to invoke them from your task
- Implement source code control via Azure DevOps, or your own favorite tool set
- Troubleshoot and execute custom tasks as part of your own projects
- Create deployment projects (MSIs) for distributing code-complete tasks
- Deploy custom tasks to Azure Data Factory Azure-SSIS IRs in the cloud
- Create advanced editors for custom task parameters

Who This Book Is For
For database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers do not need a background in software development with C#. Most important is a desire to optimize ETL efforts by creating custom-tailored tasks for execution in SSIS packages, on-premises or in ADF Azure-SSIS IRs.

SQL Server Data Automation Through Frameworks: Building Metadata-Driven Frameworks with T-SQL, SSIS, and Azure Data Factory

Learn to automate SQL Server operations using frameworks built from metadata-driven stored procedures and SQL Server Integration Services (SSIS). Bring all the power of Transact-SQL (T-SQL) and Microsoft .NET to bear on your repetitive data, data integration, and ETL processes. Do this for no added cost over what you’ve already spent on licensing SQL Server. The tools and methods from this book may be applied to on-premises and Azure SQL Server instances. The SSIS framework from this book works in Azure Data Factory (ADF) and provides DevOps personnel the ability to execute child packages outside a project, functionality not natively available in SSIS. Frameworks not only reduce the time required to deliver enterprise functionality, but can also accelerate troubleshooting and problem resolution. You'll learn in this book how frameworks also improve code quality by using metadata to drive processes. Much of the work performed by data professionals can be classified as “drudge work”: tasks that are repetitive and template-based. The frameworks-based approach shown in this book helps you to avoid that drudgery by turning repetitive tasks into "one and done" operations. Frameworks as described in this book also support enterprise DevOps with built-in logging functionality.

What You Will Learn
- Create a stored procedure framework to automate SQL process execution
- Base your framework on a working system of stored procedures and execution logging
- Create an SSIS framework to reduce the complexity of executing multiple SSIS packages
- Deploy stored procedure and SSIS frameworks to Azure Data Factory environments in the cloud

Who This Book Is For
Database administrators and developers who are involved in enterprise data projects built around stored procedures and SQL Server Integration Services (SSIS). Readers should have a background in programming along with a desire to optimize their data efforts by implementing repeatable processes that support enterprise DevOps.
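The metadata-driven idea reduces to a small loop: read a control table, execute each step, log the outcome. Here is a rough Python/pyodbc sketch of that pattern; the connection string, control table, and log table are placeholders, not the book's framework.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=Framework;UID=etl;PWD=***;TrustServerCertificate=yes")
cur = conn.cursor()

# The control table drives the run: adding a row adds a step, with no code change.
cur.execute("SELECT ProcedureName FROM dbo.PipelineSteps ORDER BY ExecutionOrder")
steps = [row.ProcedureName for row in cur.fetchall()]

for proc in steps:
    try:
        cur.execute(f"EXEC {proc}")  # names come from the trusted control table
        conn.commit()
        status = "Succeeded"
    except pyodbc.Error as exc:
        conn.rollback()
        status = f"Failed: {exc}"
    # Built-in execution logging is what makes the framework auditable.
    cur.execute("INSERT INTO dbo.PipelineLog (ProcedureName, Status) VALUES (?, ?)",
                proc, status)
    conn.commit()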

MongoDB Topology Design: Scalability, Security, and Compliance on a Global Scale

Create a world-class MongoDB cluster that is scalable, reliable, and secure. Comply with mission-critical regulatory regimes such as the European Union’s General Data Protection Regulation (GDPR). Whether you are thinking of migrating to MongoDB or need to meet legal requirements for an existing self-managed cluster, this book has you covered. It begins with the basics of replication and sharding, and quickly scales up to cover everything you need to know to control your data and keep it safe from unexpected data loss or downtime. This book covers best practices for stable MongoDB deployments. For example, a well-designed MongoDB cluster should have no single point of failure. The book covers common use cases when only one or two data centers are available. It goes into detail about creating geopolitical sharding configurations to cover the most stringent data protection regulation compliance. The book also covers different tools and approaches for automating and monitoring a cluster with Kubernetes, Docker, and popular cloud provider containers.

What You Will Learn
- Get started with the basics of MongoDB clusters
- Protect and monitor a MongoDB deployment
- Deepen your expertise around replication and sharding
- Keep effective backups and plan ahead for disaster recovery
- Recognize and avoid problems that can occur in distributed databases
- Build optimal MongoDB deployments within hardware and data center limitations

Who This Book Is For
Solutions architects, DevOps architects and engineers, automation and cloud engineers, and database administrators who are new to MongoDB and distributed databases or who need to scale up simple deployments. This book is a complete guide to planning a deployment for optimal resilience, performance, and scaling, and covers all the details required to meet the new set of data protection regulations such as the GDPR. This book is particularly relevant for large global organizations such as financial and medical institutions, as well as government departments that need to control data in the whole stack and are prohibited from using managed cloud services.
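On the client side, topology choices show up in the connection itself. This PyMongo sketch connects to a three-node replica set with majority writes and secondary-preferred reads; the hosts and replica-set name are placeholders, not the book's configuration.

from pymongo import MongoClient

client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
    w="majority",                          # writes acknowledged by a majority of members
    readPreference="secondaryPreferred",   # route reads to secondaries when available
    retryWrites=True,                      # retry once, transparently, on primary failover
)
client.admin.command("ping")               # fails fast if no member is reachable
print("current primary:", client.primary)  # (host, port) of the elected primary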

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 4

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up and/or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers.

In this blueprint, learn how to:
- Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today
- Build a 24 by 7 by 365 enterprise-class private cloud with Red Hat OpenShift Container Platform utilizing new open source Container Storage Interface (CSI) drivers
- Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security

IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.