O'Reilly Data Engineering Books

Data-driven Models in Inverse Problems

2024-11-18 O'Reilly Amazon

book

Tatiana A. Bubba

data data-engineering data-models

Advances in learning-based methods are revolutionizing several fields in applied mathematics, including inverse problems, resulting in a major paradigm shift towards data-driven approaches. This volume, which is inspired by this cutting-edge area of research, brings together contributors from the inverse problem community and shows how to successfully combine model- and data-driven approaches to gain insight into practical and theoretical issues.

Apache Spark for Machine Learning

2024-11-01 O'Reilly Amazon

book

Deepak Gowda

data data-engineering apache-spark AI/ML Big Data Computer Science

Dive into the power of Apache Spark as a tool for handling and processing big data required for machine learning. With this book, you will explore how to configure, execute, and deploy machine learning algorithms using Spark's scalable architecture and learn best practices for implementing real-world big data solutions. What this Book will help me do Understand the integration of Apache Spark with large-scale infrastructures for machine learning applications. Employ data processing techniques for preprocessing and feature engineering efficiently with Spark. Master the implementation of advanced supervised and unsupervised learning algorithms using Spark. Learn to deploy machine learning models within Spark ecosystems for optimized performance. Discover methods for analyzing big data trends and machine learning model tuning for improved accuracy. Author(s) The author, Deepak Gowda, is an experienced data scientist with over ten years of expertise in machine learning and big data. His career spans industries such as supply chain, cybersecurity, and more where he has utilized Apache Spark extensively. Deepak's teaching style is marked by clarity and practicality, making complex concepts approachable. Who is it for? Apache Spark for Machine Learning is tailored for data engineers, machine learning practitioners, and computer science students looking to advance their ability to process, analyze, and model using large datasets. If you're already familiar with basic machine learning and want to scale your solutions using Spark, this book is ideal for your studies and professional growth.

Apache Airflow Best Practices

2024-10-31 O'Reilly Amazon

book

Dylan Intorf , Dylan Storey , Kendrick van Doorn

data data-engineering apache-airflow Airflow AWS Cloud Computing

"Apache Airflow Best Practices" is your go-to guide for mastering data workflow orchestration using Apache Airflow. This book introduces you to core concepts and features of Airflow and helps you efficiently design, deploy, and manage workflows. With detailed examples and hands-on tutorials, you'll learn how to tackle real-world challenges in data engineering. What this Book will help me do Understand and utilize the features and updates introduced in Apache Airflow 2.x. Design and implement robust, scalable, and efficient data pipelines and workflows. Learn best practices for deploying Apache Airflow in cloud environments such as AWS and GCP. Extend Airflow's functionality with custom plugins and advanced configuration. Monitor, maintain, and scale your Airflow deployment effectively for high availability. Author(s) Dylan Intorf, Dylan Storey, and Kendrick van Doorn are seasoned professionals in data engineering, data strategy, and software development. Between them, they bring decades of experience working in diverse industries like finance, tech, and life sciences. They bring their expertise into this practical guide to help practitioners understand and master Apache Airflow. Who is it for? This book is tailored for data professionals such as data engineers, scientists, and system administrators, offering valuable insights for new learners and experienced users. If you're starting with workflow orchestration, seeking to optimize your current Airflow implementation, or scaling efforts, this book aligns with your goals. Readers should have a basic knowledge of Python programming and data engineering principles.

Building Modern Data Applications Using Databricks Lakehouse

2024-10-31 O'Reilly Amazon

book

Will Girten

data data-engineering storage-repositories data-lake AI/ML CI/CD

This book, "Building Modern Data Applications Using Databricks Lakehouse," provides a comprehensive guide for data professionals to master the Databricks platform. You'll learn to effectively build, deploy, and monitor robust data pipelines with Databricks' Delta Live Tables, empowering you to manage and optimize cloud-based data operations effortlessly. What this Book will help me do Understand the foundations and concepts of Delta Live Tables and its role in data pipeline development. Learn workflows to process and transform real-time and batch data efficiently using the Databricks lakehouse architecture. Master the implementation of Unity Catalog for governance and secure data access in modern data applications. Deploy and automate data pipeline changes using CI/CD, leveraging tools like Terraform and Databricks Asset Bundles. Gain advanced insights in monitoring data quality and performance, optimizing cloud costs, and managing DataOps tasks effectively. Author(s) Will Girten, the author, is a seasoned Solutions Architect at Databricks with over a decade of experience in data and AI systems. With a deep expertise in modern data architectures, Will is adept at simplifying complex topics and translating them into actionable knowledge. His books emphasize real-time application and offer clear, hands-on examples, making learning engaging and impactful. Who is it for? This book is geared towards data engineers, analysts, and DataOps professionals seeking efficient strategies to implement and maintain robust data pipelines. If you have a basic understanding of Python and Apache Spark and wish to delve deeper into the Databricks platform for streamlining workflows, this book is tailored for you.

Delta Lake: The Definitive Guide

2024-10-31 O'Reilly Amazon

book

Denny Lee , Scott Haines , Tristen Wentling , Prashanth Babu

data data-engineering storage-repositories delta-lake Flink Data Engineering

Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake--including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale. This book helps you: Understand key data reliability challenges and how Delta Lake solves them Explain the critical role of Delta transaction logs as a single source of truth Learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino Architect data lakehouses with the medallion architecture Optimize Delta Lake performance with features like deletion vectors and liquid clustering

Handling and Mapping Geographic Information

2024-10-29 O'Reilly Amazon

book

Hélène Mathian , Claire Cunty

data data-engineering location-data geographic-information-system-gis geographic information system (gis)

With the increasing proliferation of data and the systematization of geographic information referencing, maps are now a major concern – not only for specialists, but also for urban planning and development organizations and the general public. However, while producing a map may seem straightforward, the actual process of transforming data into a useful map with a specific purpose is characterized by a series of precise operations that require knowledge in a variety of fields: statistics, geography, cartography and so on. Handling and Mapping Geographic Information presents a wide range of operations based on a variety of examples. Each chapter adopts a different approach, explaining the methodological choices made in relation to the theme and the pursued objective. This approach, encompassing the entire map production process, will enable all readers, whether students, researchers, teachers or planners, to understand the multiple roles that maps can play in the analysis of geographical data.

LLM Engineer's Handbook

2024-10-22 O'Reilly Amazon

book

Paul Iusztin , Maxime Labonne

data ai-ml artificial-intelligence-ai generative-ai prompt-engineering AI/ML

The "LLM Engineer's Handbook" is your comprehensive guide to mastering Large Language Models from concept to deployment. Written by leading experts, it combines theoretical foundations with practical examples to help you build, refine, and deploy LLM-powered solutions that solve real-world problems effectively and efficiently. What this Book will help me do Understand the principles and approaches for training and fine-tuning Large Language Models (LLMs). Apply MLOps practices to design, deploy, and monitor your LLM applications effectively. Implement advanced techniques such as retrieval-augmented generation (RAG) and preference alignment. Optimize inference for high performance, addressing low-latency and high availability for production systems. Develop robust data pipelines and scalable architectures for building modular LLM systems. Author(s) Paul Iusztin and Maxime Labonne are experienced AI professionals specializing in natural language processing and machine learning. With years of industry and academic experience, they are dedicated to making complex AI concepts accessible and actionable. Their collaborative authorship ensures a blend of theoretical rigor and practical insights tailored for modern AI practitioners. Who is it for? This book is tailored for AI engineers, NLP professionals, and LLM practitioners who wish to deepen their understanding of Large Language Models. Ideal readers possess some familiarity with Python, AWS, and general AI concepts. If you aim to apply LLMs to real-world scenarios or enhance your expertise in AI-driven systems, this handbook is designed for you.

Aerospike: Up and Running

2024-10-15 O'Reilly Amazon

book

Albert Autin , V. Srinivasan , Paige Roberts , Tim Faulkes

data data-engineering nosql-databases NoSQL Cyber Security SQL

If you're a developer looking to build a distributed, resilient, scalable, high-performance application, you may be evaluating distributed SQL and NoSQL solutions. Perhaps you're considering the Aerospike database. This practical book shows developers, architects, and engineers how to get the highly scalable and extremely low-latency Aerospike database up and running. You will learn how to power your globally distributed applications and take advantage of Aerospike's hybrid memory architecture with the real-time performance of in-memory plus dependable persistence. After reading this book, you'll be able to build applications that can process up to tens of millions of transactions per second for millions of concurrent users on any scale of data. This practical guide provides: Step-by-step instructions on installing and connecting to Aerospike A clear explanation of the programming models available All the advice you need to develop your Aerospike application Coverage of issues such as administration, connectors, consistency, and security Code examples and tutorials to get you up and running quickly And more

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

2024-10-12 O'Reilly Amazon

book

Jason Yip , Nikhil Gupta

data data-engineering Databricks databricks-data-engineer-associate AI/ML Analytics

This book is your comprehensive guide to building robust Generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform offering unified analytics and AI capabilities within a single governance framework, enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization. Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises. Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or even an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams. What You Will Learn Foundational principles of Lakehouse architecture Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables Databricks Intelligence Platform and key functionalities Building and deploying GenAI Applications from data ingestion to model serving Databricks pricing, platform security, DBRX, and many more topics Who This Book Is For Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy their Gen AI solutions with the Data Intelligence Platform. This is also a handbook for senior execs who need to communicate the value of Databricks to customers. People who are new to the Databricks Platform and want comprehensive insights will find the book accessible.

Data Engineering Best Practices

2024-10-11 O'Reilly Amazon

book

Richard J. Schiller , David Larochelle

data data-engineering Agile/Scrum AI/ML Analytics Big Data

Unlock the secrets to building scalable and efficient data architectures with 'Data Engineering Best Practices.' This book provides in-depth guidance on designing, implementing, and optimizing cloud-based data pipelines. You will gain valuable insights into best practices, agile workflows, and future-proof designs. What this Book will help me do Effectively plan and architect scalable data solutions leveraging cloud-first strategies. Master agile processes tailored to data engineering for improved project outcomes. Implement secure, efficient, and reliable data pipelines optimized for analytics and AI. Apply real-world design patterns and avoid common pitfalls in data flow and processing. Create future-ready data engineering solutions following industry-proven frameworks. Author(s) Richard J. Schiller and David Larochelle are seasoned data engineering experts with decades of experience crafting efficient and secure cloud-based infrastructures. Their collaborative writing distills years of real-world expertise into practical advice aimed at helping engineers succeed in a rapidly evolving field. Who is it for? This book is ideal for data engineers, ETL specialists, and big data professionals seeking to enhance their knowledge in cloud-based solutions. Some familiarity with data engineering, ETL pipelines, and big data technologies is helpful. It suits those keen on mastering advanced practices, improving agility, and developing efficient data pipelines. Perfect for anyone looking to future-proof their skills in data engineering.

Azure SQL Revealed: The Next-Generation Cloud Database with AI and Microsoft Fabric

2024-10-10 O'Reilly Amazon

book

Bob Ward

data data-engineering relational-databases azure-sql-database AI/ML Azure

Access detailed content and examples on Azure SQL, a set of cloud services that allows for SQL Server to be deployed in the cloud. This book teaches the fundamentals of deployment, configuration, security, performance, and availability of Azure SQL from the perspective of these same tasks and capabilities in SQL Server. This distinct approach makes this book an ideal learning platform for readers familiar with SQL Server on-premises who want to migrate their skills toward providing cloud solutions to an enterprise market that is increasingly cloud-focused. If you know SQL Server, you will love this book. You will be able to take your existing knowledge of SQL Server and translate that knowledge into the world of cloud services from the Microsoft Azure platform, and in particular into Azure SQL. This book provides information never seen before about the history and architecture of Azure SQL. Author Bob Ward is a leading expert with access to and support from the Microsoft engineering team that built Azure SQL and related database cloud services. He presents powerful, behind-the-scenes insights into the workings of one of the most popular database cloud services in the industry. This book also brings you the latest innovations for Azure SQL including Azure Arc, Hyperscale, generative AI applications, Microsoft Copilots, and integration with the Microsoft Fabric. What You Will Learn Know the history of Azure SQL Deploy, configure, and connect to Azure SQL Choose the correct way to deploy SQL Server in Azure Migrate existing SQL Server instances to Azure SQL Monitor and tune Azure SQL’s performance to meet your needs Ensure your data and application are highly available Secure your data from attack and theft Learn the latest innovations for Azure SQL including Hyperscale Learn how to harness the power of AI for generative data-driven applications and Microsoft Copilots for assistance Learn how to integrate Azure SQL with the unified data platform, the Microsoft Fabric Who This Book Is For This book is designed to teach SQL Server in the Azure cloud to the SQL Server professional. Anyone who operates, manages, or develops applications for SQL Server will benefit from this book. Readers will be able to translate their current knowledge of SQL Server—especially of SQL Server 2019 and 2022—directly to Azure. This book is ideal for database professionals looking to remain relevant as their customer base moves into the cloud.

Financial Data Engineering

2024-10-09 O'Reilly Amazon

book

Tamer Khraisha

data data-engineering API Data Engineering Data Governance

Today, investment in financial technology and digital transformation is reshaping the financial landscape and generating many opportunities. Too often, however, engineers and professionals in financial institutions lack a practical and comprehensive understanding of the concepts, problems, techniques, and technologies necessary to build a modern, reliable, and scalable financial data infrastructure. This is where financial data engineering is needed. A data engineer developing a data infrastructure for a financial product possesses not only technical data engineering skills but also a solid understanding of financial domain-specific challenges, methodologies, data ecosystems, providers, formats, technological constraints, identifiers, entities, standards, regulatory requirements, and governance. This book offers a comprehensive, practical, domain-driven approach to financial data engineering, featuring real-world use cases, industry practices, and hands-on projects. You'll learn: The data engineering landscape in the financial sector Specific problems encountered in financial data engineering The structure, players, and particularities of the financial data domain Approaches to designing financial data identification and entity systems Financial data governance frameworks, concepts, and best practices The financial data engineering lifecycle from ingestion to production The varieties and main characteristics of financial data workflows How to build financial data pipelines using open source tools and APIs Tamer Khraisha, PhD, is a senior data engineer and scientific author with more than a decade of experience in the financial sector.

Platform Engineering

2024-10-08 O'Reilly Amazon

book

Ian Nowland , Camille Fournier

it-operations DevOps platform-engineering Agile/Scrum Cloud Computing

Until recently, infrastructure was the backbone of organizations operating software they developed in-house. But now that cloud vendors run the computers, companies can finally bring the benefits of agile custom-centricity to their own developers. Adding product management to infrastructure organizations is now all the rage. But how's that possible when infrastructure is still the operational layer of the company? This practical book guides engineers, managers, product managers, and leaders through the shifts that modern platform-led organizations require. You'll learn what platform engineering is—and isn't—and what benefits and value it brings to developers and teams. You'll understand what it means to approach a platform as a product and learn some of the most common technical and managerial barriers to success. With this book, you'll: Cultivate a platform-as-product, developer-centric mindset Learn what platform engineering teams are and are not Start the process of adopting platform engineering within your organization Discover what it takes to become a product manager for a platform team Understand the challenges that emerge when you scale platforms Automate processes and self-service infrastructure to speed development and improve developer experience Build out, hire, manage, and advocate for a platform team

Data Security Blueprints

2024-10-02 O'Reilly Amazon

book

Federico Castanedo

data data-engineering AI/ML Data Science Cyber Security

Once you decide to implement a data security strategy, it can be difficult to know where to start. With so many potential threats and challenges to resolve, teams often try to fix everything at once. But this boil-the-ocean approach is difficult to manage efficiently and ultimately leads to frustration, confusion, and halted progress. There's a better way to go. In this report, data science and AI leader Federico Castanedo shows you what to look for in a data security platform that will deliver the speed, scale, and agility you need to be successful in today's fast-paced, distributed data ecosystems. Unlike other resources that focus solely on data security concepts, this guide provides a road map for putting those concepts into practice. This report reveals: The most common data security use cases and their potential challenges What to look for in a data security solution that's built for speed and scale Why increasingly decentralized data architectures require centralized, dynamic data security mechanisms How to implement the steps required to put common use cases into production Methods for assessing risks—and controls necessary to mitigate those risks How to facilitate cross-functional collaboration to put data security into practice in a scalable, efficient way You'll examine the most common data security use cases that global enterprises across every industry aim to achieve, including the specific steps needed for implementation as well as the potential obstacles these use cases present. Federico Castanedo is a data science and AI leader with extensive experience in academia, industry, and startups. Having held leadership positions at DataRobot and Vodafone, he has a successful track record of leading high-performing data science teams and developing data science and AI products with business impact.

Advanced interactive interfaces with Access: Building Interactive Interfaces with VBA

2024-10-01 O'Reilly Amazon

book

Alessandro Grimaldi

data data-engineering database-management-tools microsoft-access VBA

Explore and learn advanced techniques for working with graphical, interactive interfaces that can be built in Access. This book starts with best practices and tips to write code using VBA, and covers how to implement them in a real-world scenario. You will learn how to create and use VBA classes. An introduction to the binary code and the "bit vector" technique is discussed, followed by the implementation of a drag-and-drop engine. You also will learn how to design a timeline, and make it scrollable. What You Will Learn Write readable, easy-to-maintain code Add a drag-and-drop engine to an Access application Apply variations to the drag-and-drop technique to create different graphical effects Embed a scrollable timeline in an Access application, on which objects can be dynamically placed Who This Book Is For VBA developers

In-Memory Analytics with Apache Arrow - Second Edition

2024-09-30 O'Reilly Amazon

book

Matthew Topol

data data-engineering apache-arrow Analytics Arrow Dremio

Dive into efficient data handling with 'In-Memory Analytics with Apache Arrow.' This book explores Apache Arrow, a powerful open-source project that revolutionizes how tabular and hierarchical data are processed. You'll learn to streamline data pipelines, accelerate analysis, and utilize high-performance tools for data exchange. What this Book will help me do Understand and utilize the Apache Arrow in-memory data format for your data analysis needs. Implement efficient and high-speed data pipelines using Arrow subprojects like Flight SQL and Acero. Enhance integration and performance in analysis workflows by using tools like Parquet and Snowflake with Arrow. Master chaining and reusing computations across languages and environments with Arrow's cross-language support. Apply in real-world scenarios by integrating Apache Arrow with analytics systems like Dremio and DuckDB. Author(s) Matthew Topol, the author of this book, brings 15 years of technical expertise in the realm of data processing and analysis. Having worked across various environments and languages, Matthew offers insights into optimizing workflows using Apache Arrow. His approachable writing style ensures that complex topics are comprehensible. Who is it for? This book is tailored for developers, data engineers, and data scientists eager to enhance their analytic toolset. Whether you're a beginner or have experience in data analysis, you'll find the concepts actionable and transformative. If you are curious about improving the performance and capabilities of your analytic pipelines or tools, this book is for you.

Take Control of Securing Your Apple Devices

2024-09-30 O'Reilly Amazon

book

Glenn Fleishman

data data-engineering data-security-privacy data security & privacy Cloud Computing Cyber Security

Keep your Mac, iPhone, and iPad safe! Version 1.1.1, published September 28, 2025 Secure your Mac, iPhone, or iPad against attacks from the internet, physical intrusion, and more with the greatest of ease. Glenn Fleishman guides you through protecting yourself from phishing, email, and other exploits, as well as network-based invasive behavior. Learn about built-in privacy settings, the Secure Enclave, FileVault, hardware encryption keys, sandboxing, privacy settings, Advanced Data Protection, Lockdown Mode, resetting your password when all hope seems lost, and much more. The digital world is riddled with danger, even as Apple has done a fairly remarkable job at keeping our Macs, iPhones, and iPads safe. But the best security strategy is staying abreast of past risks and anticipating future ones. This book gives you all the insight and directions you need to ensure your Apple devices and their data are safe. It's up to date with macOS 26 Tahoe, iOS 26, and iPadOS 26. You’ll learn about the enhanced Advanced Data Protection option for iCloud services, allowing you to keep all your private data inaccessible not just to thieves and unwarranted government intrusion, but even to Apple! Also get the rundown on Lockdown Mode to deter direct network and phishing attacks, passkeys and hardware secure keys for the highest level of security for Apple Account and website logins, and Mac-specific features such as encrypted startup volumes and FileVault’s login protection process. Security and privacy are tightly related, and this book helps you understand how macOS, iOS, and iPadOS have increasingly compartmentalized and protected your personal data, and how to allow only the apps you want to access specific folders, your contacts, and other information. Here’s what this book has to offer:

Master the privacy settings on your Mac, iPhone, and iPad Calculate your level of risk and your tolerance for it Use Apple’s Stolen Device Protection feature for iPhone that deflects thieves who extract your passcode through coercion or misdirection. Learn why you’re asked to give permission for apps to access folders and personal data on your Mac Moderate access to your audio, video, screen actions, and other hardware inputs and outputs Get to know the increasing layers of system security deployed over the past few years Prepare against a failure or error that might lock you out of your device Share files and folders securely over a network and through cloud services Upgrade your iCloud data protection to use end-to-end encryption Control other low-level security options to reduce the risk of someone gaining physical access to your Mac—or override them to install system extensions Understand FileVault encryption and protection for Mac, and avoid getting locked out Investigate the security of a virtual private network (VPN) to see whether you should use one Learn how the Secure Enclave in Macs with a T2 chip or M-series Apple silicon affords hardware-level protections Dig into ransomware, the biggest potential threat to Mac users (though rare in practice) Discover recent security and privacy technologies, such as Lockdown Mode and passkeys Learn why your iPhone may restart automatically if it's been idle for several days

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

2024-09-27 O'Reilly Amazon

book

Pavan Kumar Narayanan

data ai-ml machine-learning AI/ML Airflow Analytics

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code. The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows. What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool. It is a career catalyst, and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world. What You Will Learn Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines and master the art of workflow orchestration to streamline your engineering projects Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure Who This Book Is For Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists

Beginning MongoDB Atlas with .NET: Flexible and Scalable Document Data Storage for .NET Developers

2024-09-17 O'Reilly Amazon

book

Luce Carter

data data-engineering nosql-databases MongoDB API Cloud Computing

This book is a tutorial on MongoDB customized for developers working in Microsoft .NET 6, .NET 7, and beyond. It explains the differences between relational database systems and the document model supported by MongoDB, and shows how to build .NET applications that run against a MongoDB database, especially one in the cloud. Author Luce Carter kicks things off by teaching you how to determine when to use a document database versus a relational engine. After that, she walks you through building a Microsoft .NET project combining the MongoDB Atlas cloud database as a service solution with a .NET. application. In the process, you will learn how to create, read, update, and delete data in MongoDB from any .NET project. You will come away from this book with a solid understanding of MongoDB’s Developer Data Platform and how to use it from your .NET applications. You’ll be able to connect to MongoDB in the cloud and take advantage of the flexibility and scalability that MongoDB’s document storage model provides, and you’ll understand how to craft your applications to run using document storage and the MongoDB database engine. What You Will Learn Know when to use the MongoDB document model Build .NET applications that connect to MongoDB for data storage Create MongoDB clusters on the MongoDB Atlas cloud platform Store data in MongoDB Atlas Create, Read, Update, and Delete (CRUD) data from .NET Web API projects Test your CRUD endpoints using RESTful operations Validate schemas to help protect against breaking changes Who This Book Is For .NET developers who are looking for an alternative to relational databases, and those looking for a flexible and scalable document storage solution for use from .NET applications. Additionally, anyone wanting to learn MongoDB in the context of .NET and C# will benefit from this book.

Implementing Data Mesh

2024-09-04 O'Reilly Amazon

book

Jean-Georges Perrin , Eric Broda

data data-engineering database-architecture data-mesh Fabric

As data continues to grow and become more complex, organizations seek innovative solutions to manage their data effectively. Data mesh is one solution that provides a new approach to managing data in complex organizations. This practical guide offers step-by-step guidance on how to implement data mesh in your organization. In this book, Jean-Georges Perrin and Eric Broda focus on the key components of data mesh and provide practical advice supported by code. Data engineers, architects, and analysts will explore a simple and intuitive process for identifying key data mesh components and data products. You'll learn a consistent set of interfaces and access methods that make data products easy to consume. This approach ensures that your data products are easily accessible and the data mesh ecosystem is easy to navigate. This book helps you: Identify, define, and build data products that interoperate within an enterprise data mesh Build a data mesh fabric that binds data products together Build and deploy data products in a data mesh Establish the organizational structure to operate data products, data platforms, and data fabric Learn an innovative architecture that brings data products and data fabric together into the data mesh About the authors: Jean-Georges "JG" Perrin is a technology leader focusing on building innovative and modern data platforms. Eric Broda is a technology executive, practitioner, and founder of a boutique consulting firm that helps global enterprises realize value from data.

Amazon DynamoDB - The Definitive Guide

2024-08-30 O'Reilly Amazon

book

Mike Mackay , Aman Dhingra

data data-engineering nosql-databases DynamoDB AWS Cloud Computing

Master Amazon DynamoDB, the serverless NoSQL database designed for lightning-fast performance and scalability, with this definitive guide. You'll delve into its features, learn advanced concepts, and acquire practical skills to harness DynamoDB for modern application development. What this Book will help me do Understand AWS DynamoDB fundamentals for real-world applications. Model and optimize NoSQL databases with advanced techniques. Integrate DynamoDB into scalable, high-performance architectures. Utilize DynamoDB indexing, caching, and analytical features effectively. Plan and execute RDBMS to NoSQL data migrations successfully. Author(s) None Dhingra, an AWS DynamoDB solutions expert, and None Mackay, a seasoned NoSQL architect, bring their combined expertise straight from Amazon Web Services to guide you step-by-step in mastering DynamoDB. Combining comprehensive technical knowledge with approachable explanations, they empower readers to implement practical and efficient data strategies. Who is it for? This book is ideal for software developers and architects seeking to deepen their knowledge about AWS solutions like DynamoDB, engineering managers aiming to incorporate scalable NoSQL solutions into their projects, and data professionals transitioning from RDBMS towards a serverless data approach. Individuals with basic knowledge in cloud computing or database systems and those ready to advance in DynamoDB will find this book particularly beneficial.

Full Stack FastAPI, React, and MongoDB - Second Edition

2024-08-23 O'Reilly Amazon

book

Shubham Ranjan , Marko Aleksendrić , Rachelle Palmer , Shrey Batra

data data-engineering nosql-databases MongoDB AI/ML API

Full Stack FastAPI, React, and MongoDB guides you step-by-step through creating web applications using the FARM stack. This hands-on resource teaches you how to integrate FastAPI, a modern Python framework, React for front-end development, and MongoDB for data storage to build and deploy powerful, scalable web applications. What this Book will help me do Master the essentials of MongoDB, including creating and managing document-based databases. Gain proficiency in building APIs using FastAPI and Python for robust backend systems. Develop dynamic frontends using React, integrating seamlessly with a FastAPI backend. Securely authenticate and authorize users using JSON Web Tokens in your applications. Explore advanced features like integrating AI models and building with Next.js for production-ready development. Author(s) Marko Aleksendrić, Shrey Batra, Rachelle Palmer, and Shubham Ranjan combine their expertise in web development and software engineering in this book. Together, they bring years of professional experience and a passion for teaching developers to create modern web applications effectively using cutting-edge tools. Who is it for? Intermediate web developers who possess foundational JavaScript and Python skills are the ideal audience for this book. If you want to advance your skills by mastering modern web application development with the FARM stack, this book will guide you comprehensively. With practical, real-world examples, it is designed for developers aiming to build production-grade applications.

Streaming Databases

2024-08-15 O'Reilly Amazon

book

Ralph Matthias Debusmann , Hubert Dulay

data data-engineering streaming-messaging streaming-architecture Analytics Data Streaming

Real-time applications are becoming the norm today. But building a model that works properly requires real-time data from the source, in-flight stream processing, and low latency serving of its analytics. With this practical book, data engineers, data architects, and data analysts will learn how to use streaming databases to build real-time solutions. Authors Hubert Dulay and Ralph M. Debusmann take you through streaming database fundamentals, including how these databases reduce infrastructure for real-time solutions. You'll learn the difference between streaming databases, stream processing, and real-time online analytical processing (OLAP) databases. And you'll discover when to use push queries versus pull queries, and how to serve synchronous and asynchronous data emanating from streaming databases. This guide helps you: Explore stream processing and streaming databases Learn how to build a real-time solution with a streaming database Understand how to construct materialized views from any number of streams Learn how to serve synchronous and asynchronous data Get started building low-complexity streaming solutions with minimal setup

Big Data, 2nd Edition

2024-08-01 O'Reilly Amazon

book

Hassan A. Karimi

data data-engineering location-data geographic-information-system-gis geographic information system (gis) Big Data

This revised new edition provides up-to-date knowledge on the latest developments related to these three fields for solving geoinformatics problems. There are seven new chapters, and each of them focuses on a separate real-world problem to which deep learning is applied.

MuleSoft Platform Architect's Guide

2024-07-31 O'Reilly Amazon

book

Jitendra Bafna , Jim Andrews

data data-engineering streaming-messaging enterprise-service-bus mule-esb API

The "MuleSoft Platform Architect's Guide" is your essential resource for mastering API-driven solutions using MuleSoft Anypoint Platform. This book enables you to design, deploy, and operate scalable, secure, and high-performance API architectures in enterprise settings while preparing for MuleSoft Platform Architect certification. What this Book will help me do Design robust API integration solutions using MuleSoft Anypoint Platform. Successfully deploy applications to CloudHub and Runtime Fabric environments. Monitor and operate APIs with advanced management tools. Implement scalable solutions aligned with business outcomes. Prepare confidently for the MuleSoft Platform Architect certification. Author(s) Jitendra Bafna is a Senior Solution Architect with years of experience optimizing MuleSoft implementations. Jim Andrews, a MuleSoft Evangelist, has dedicated his career to guiding others in achieving enterprise-ready API solutions. Together, they share practical knowledge, step-by-step guidance, and expertise in API and integration mastery. Who is it for? This book is perfect for IT architects and senior developers experienced in API development, especially those familiar with MuleSoft. It's tailored for professionals aiming to master Anypoint Platform or pursue MuleSoft Platform Architect certification. Readers should have basic experience with integration platforms and a willingness to explore advanced API design.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

Data-driven Models in Inverse Problems

Apache Spark for Machine Learning

Apache Airflow Best Practices

Building Modern Data Applications Using Databricks Lakehouse

Delta Lake: The Definitive Guide

Handling and Mapping Geographic Information

LLM Engineer's Handbook

Aerospike: Up and Running

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

Data Engineering Best Practices

Azure SQL Revealed: The Next-Generation Cloud Database with AI and Microsoft Fabric

Financial Data Engineering

Platform Engineering

Data Security Blueprints

Advanced interactive interfaces with Access: Building Interactive Interfaces with VBA

In-Memory Analytics with Apache Arrow - Second Edition

Take Control of Securing Your Apple Devices

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

Beginning MongoDB Atlas with .NET: Flexible and Scalable Document Data Storage for .NET Developers

Implementing Data Mesh

Amazon DynamoDB - The Definitive Guide

Full Stack FastAPI, React, and MongoDB - Second Edition

Streaming Databases

Big Data, 2nd Edition

MuleSoft Platform Architect's Guide