talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 1–25 of 3432 · Newest first

Snowflake: The Definitive Guide, 2nd Edition

Snowflake is reshaping data management by integrating AI, analytics, and enterprise workloads into a single cloud platform. Snowflake: The Definitive Guide is a comprehensive resource for data architects, engineers, and business professionals looking to harness Snowflake's evolving capabilities, including Cortex AI, Snowpark, and Polaris Catalog for Apache Iceberg. This updated edition provides real-world strategies and hands-on activities for optimizing performance, securing data, and building AI-driven applications. With hands-on SQL examples and best practices, this book helps readers process structured and unstructured data, implement scalable architectures, and integrate Snowflake's AI tools seamlessly. Whether you're setting up accounts, managing access controls, or leveraging generative AI, this guide equips you with the expertise to maximize Snowflake's potential.

- Implement AI-powered workloads with Snowflake Cortex
- Explore Snowsight and Streamlit for no-code development
- Ensure security with access control and data governance
- Optimize storage, queries, and computing costs
- Design scalable data architectures for analytics and machine learning
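To give a flavor of the Cortex-in-SQL pattern the book covers, here is a minimal sketch of calling a Cortex AI function from Python. The account, credentials, warehouse, and model name are placeholders; it assumes the snowflake-connector-python package and a role granted access to the SNOWFLAKE.CORTEX functions.

```python
# Minimal sketch: invoking Snowflake Cortex from Python (placeholder credentials).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="my_user",         # placeholder
    password="...",         # placeholder
    warehouse="COMPUTE_WH",
)
cur = conn.cursor()

# SNOWFLAKE.CORTEX.COMPLETE runs a hosted LLM over a prompt entirely in SQL,
# so the result comes back like any other query row.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'mistral-large', 'Summarize in one line: Snowflake unifies AI and analytics.')"
)
print(cur.fetchone()[0])
conn.close()
```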

The Data Engineer's Guide to Microsoft Fabric

Modern data engineering is evolving, and with Microsoft Fabric, the entire data platform experience is being redefined. This essential book offers a fresh, hands-on approach to navigating this shift. Rather than being an introduction to features, this guide explains how Fabric's key components—Lakehouse, Warehouse, and Real-Time Intelligence—work under the hood and how to put them to use in realistic workflows. Written by Christian Henrik Reich, a data engineering expert with experience that extends from Databricks to Fabric, this book is a blend of foundational theory and practical implementation of lakehouse solutions in Fabric. You'll explore how engines like Apache Spark and Fabric Warehouse collaborate with Fabric's Real-Time Intelligence solution in an integrated platform, and how to build ETL/ELT pipelines that deliver on speed, accuracy, and scale. Ideal for both new and practicing data engineers, this is your entry point into the fabric of the modern data platform.

- Acquire a working knowledge of lakehouses, warehouses, and streaming in Fabric
- Build resilient data pipelines across real-time and batch workloads
- Apply Python, Spark SQL, T-SQL, and KQL within a unified platform
- Gain insight into architectural decisions that scale with data needs
- Learn actionable best practices for engineering clean, efficient, governed solutions
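The unified-platform claim is easiest to see in code. Below is a minimal sketch of a batch ELT step as it might appear in a Fabric notebook, where the platform provides a SparkSession named `spark`; the landing path and table name are hypothetical.

```python
# Minimal sketch of a batch ELT step in a Fabric notebook (assumes the
# platform-provided `spark` session; path and table names are hypothetical).
from pyspark.sql import functions as F

raw = spark.read.json("Files/landing/orders/")   # Lakehouse Files area

cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount") > 0)
)

# Delta is the native table format of the Fabric Lakehouse, so a plain
# saveAsTable lands the result where Warehouse and KQL tooling can see it.
cleaned.write.format("delta").mode("overwrite").saveAsTable("orders_silver")
```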

Universal Data Modeling

Most data professionals work with multiple datasets scattered across teams, systems, and formats. But without a clear modeling strategy, the result is often chaos: mismatched schemas, fragile pipelines, and a constant fight to make sense of the noise. This essential guide offers a better way by introducing a practical framework for designing high-quality data models that work across platforms while supporting the growing demands of AI, analytics, and real-time systems. Author Jun Shan bridges the gap between disconnected modeling approaches and the need for a unified, system-agnostic methodology. Whether you're building a new data platform or rethinking legacy infrastructure, Universal Data Modeling gives you the clarity, patterns, and tools to model data that's consistent, resilient, and ready to scale.

- Connect conceptual, logical, and physical modeling phases with confidence
- Apply best-fit techniques across relational, semistructured, and NoSQL formats
- Improve data quality, clarity, and maintainability across your organization
- Support modern design paradigms like data mesh and data products
- Translate domain knowledge into models that empower teams
- Build flexible, scalable models that stand the test of technology change

PostgreSQL: Up and Running, 4th Edition

Thinking of migrating to PostgreSQL? This concise introduction helps you understand and use this open source database system. Not only will you learn about the new enterprise-class features in versions 16 to 18, but you'll also discover all that PostgreSQL has to offer—much more than a relational database system. As an open source product, it has hundreds of plug-ins, expanding the capability of PostgreSQL beyond all other database systems. With examples throughout, this book shows you how to perform tasks that are difficult or impossible in other databases. The revised fourth edition covers the latest features of Postgres, such as ISO-SQL constructs rarely found in other databases, foreign data wrapper (FDW) enhancements, JSON constructs, multirange data types, query parallelization, and replication. If you're an experienced PostgreSQL user, you'll pick up gems you may have missed before.

- Learn basic administration tasks such as role management, database creation, backup, and restore
- Use the psql command-line utility and the pgAdmin graphical administration tool
- Explore PostgreSQL tables, constraints, and indexes
- Learn powerful SQL constructs not generally found in other databases
- Use several different languages to write database functions and stored procedures
- Tune your queries to run as fast as your hardware will allow
- Query external and variegated data sources with foreign data wrappers
- Learn how to use built-in replication to replicate data
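As one concrete example of the "hard or impossible in other databases" theme, here is a minimal sketch of multirange types (PostgreSQL 14+) driven from Python. The connection string is a placeholder and the psycopg2 package is assumed.

```python
# Minimal sketch of PostgreSQL multirange types via psycopg2 (placeholder DSN).
import psycopg2

conn = psycopg2.connect("dbname=demo user=demo password=...")  # placeholder
cur = conn.cursor()

cur.execute("""
    CREATE TEMP TABLE room_bookings (
        room   text,
        booked int4multirange   -- e.g. hours of the day already taken
    )
""")
cur.execute(
    "INSERT INTO room_bookings VALUES (%s, %s::int4multirange)",
    ("A101", "{[9,11),[14,16)}"),
)

# Containment operators work on multiranges just as they do on ranges.
cur.execute("SELECT room FROM room_bookings WHERE booked @> 10")
print(cur.fetchall())   # [('A101',)] -- hour 10 falls inside [9,11)
conn.close()
```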

AI Engineering Interviews

Generative AI is rapidly spreading across industries, and companies are actively hiring people who can design, build, and deploy these systems. But to land one of these roles, you'll have to get through the interview first. Generative AI Interviews walks you through every stage of the interview process, giving you an insider's perspective that will help you build confidence and stand out. This handy guide features 300 real-world interview questions organized by difficulty level, each with a clear outline of what makes a good answer, common pitfalls to avoid, and key points you shouldn't miss. What sets this book apart from others is Mina Ghashami and Ali Torkamani's knack for simplifying complex concepts into intuitive explanations, accompanied by compelling illustrations that make learning engaging. If you're looking for a guide to cracking GenAI interviews, this is it.

- Master GenAI interviews for roles from fundamental to advanced
- Explore 300 real industry interview questions with model answers and breakdowns
- Learn a step-by-step approach to explaining architecture, training, inference, and evaluation
- Get actionable insights that will help you stand out in even the most competitive hiring process

Context Engineering with DSPy

AI agents need the right context at the right time to do a good job. Too much input increases cost and harms accuracy, while too little causes instability and hallucinations. Context Engineering with DSPy introduces a practical, evaluation-driven way to design AI systems that remain reliable, predictable, and easy to maintain as they grow. AI engineer and educator Mike Taylor explains DSPy in a clear, approachable style, showing how its modular structure, portable programs, and built-in optimizers help teams move beyond guesswork. Through real examples and step-by-step guidance, you'll learn how DSPy's signatures, modules, datasets, and metrics work together to solve context engineering problems that evolve as models change and workloads scale. This book supports AI engineers, data scientists, machine learning practitioners, and software developers building AI agents, retrieval-augmented generation (RAG) systems, and multistep reasoning workflows that hold up in production.

- Understand the core ideas behind context engineering and why they matter
- Structure LLM pipelines with DSPy's maintainable, reusable components
- Apply evaluation-driven optimizers like GEPA and MIPROv2 for measurable improvements
- Create reproducible RAG and agentic workflows with clear metrics
- Develop AI systems that stay robust across providers, model updates, and real-world constraints
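For readers new to DSPy, here is a minimal sketch of the signature-plus-module pattern the book builds on. The model name and API access are assumptions, and DSPy's API evolves between releases, so treat this as illustrative rather than definitive.

```python
# Minimal sketch of DSPy's core abstractions: a typed Signature plus a Module.
# Assumes the dspy package and an OpenAI API key in the environment.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # hypothetical model choice

class AnswerWithContext(dspy.Signature):
    """Answer the question using only the provided context."""
    context: str = dspy.InputField(desc="retrieved passages")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="a short, factual answer")

# ChainOfThought wraps the signature in a reasoning step; the prompt itself
# is generated (and later optimizable) rather than hand-written.
qa = dspy.ChainOfThought(AnswerWithContext)
result = qa(
    context="DSPy separates program logic from prompt wording.",
    question="What does DSPy separate?",
)
print(result.answer)
```

Because the program is declared rather than hard-coded as prompt text, optimizers like those named above can rewrite the underlying prompts against a metric without touching this code.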

Building Data Products

As organizations grapple with fragmented data, siloed teams, and inconsistent pipelines, data products have emerged as a practical solution for delivering trusted, scalable, and reusable data assets. In Building Data Products, Jean-Georges Perrin provides a comprehensive, standards-driven playbook for designing, implementing, and scaling data products that fuel innovation and cross-functional collaboration—whether or not your organization adopts a full data mesh strategy. Drawing on extensive industry experience and practitioner interviews, Perrin shows readers how to build metadata-rich, governed data products aligned to business domains. Covering foundational concepts, real-world use cases, and emerging standards like Bitol ODPS and ODCS, this guide offers step-by-step implementation advice and practical code examples for key stages—ownership, observability, active metadata, compliance, and integration.

- Design data products for modular reuse, discoverability, and trust
- Implement standards-driven architectures with rich metadata and security
- Incorporate AI-driven automation, SBOMs, and data contracts
- Scale product-driven data strategies across teams and platforms
- Integrate data products into APIs, CI/CD pipelines, and DevOps practices

Evals for AI Engineers

Stop using guesswork to find out how your AI applications are performing. Evals for AI Engineers equips you with the proven tools and processes required to systematically test, measure, and enhance the reliability of AI applications, especially those using LLMs. Written by AI engineers with extensive experience in real-world consulting (across 35+ AI products) and cutting-edge research, this practical resource will help you move from assumptions to robust, data-driven evaluation. Ideal for software engineers, technical product managers, and technical leads, this hands-on guide dives into techniques like error analysis, synthetic data generation, automated LLM-as-a-judge systems, production monitoring, and cost optimization. You'll learn how to debug LLM behavior, design test suites based on synthetic and real data, and build data flywheels that improve over time. Whether you're starting without user data or scaling a production system, you'll gain the skills to build AI you can trust—with processes that are repeatable, measurable, and aligned with real-world outcomes.

- Run systematic error analyses to uncover, categorize, and prioritize failure modes
- Build, implement, and automate evaluation pipelines using code-based and LLM-based metrics
- Optimize AI performance and costs through smart evaluation and feedback loops
- Apply key principles and techniques for monitoring AI applications in production
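As a flavor of this approach, here is a minimal, framework-free sketch of an evaluation harness that tries a cheap code-based metric first and falls back to an LLM-as-a-judge hook. `call_model` and `judge` are hypothetical stand-ins for your application and your judge prompt.

```python
# Minimal sketch of an eval harness mixing code-based and LLM-based metrics.
from dataclasses import dataclass

@dataclass
class Case:
    prompt: str
    expected: str

def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to your AI application.
    return "4"

def judge(prompt: str, output: str) -> bool:
    # Hypothetical stand-in: replace with an LLM-as-a-judge call.
    return False

def exact_match(output: str, expected: str) -> bool:
    return output.strip().lower() == expected.strip().lower()

def run_suite(cases: list[Case]) -> float:
    passed = 0
    for case in cases:
        output = call_model(case.prompt)
        # Cheap deterministic check first; the judge only handles fuzzy cases,
        # which keeps evaluation cost down.
        if exact_match(output, case.expected) or judge(case.prompt, output):
            passed += 1
    return passed / len(cases)

print(run_suite([Case("What is 2 + 2?", "4")]))  # 1.0
```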

Practical Statistics for Data Scientists, 3rd Edition

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. And many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

Elasticsearch Query Language the Definitive Guide

Streamline your workflow with ESQL: enhance data analysis with real-time insights, and speed up aggregations and visualizations.

Key Features
- Apply ESQL efficiently in analytics, observability, and cybersecurity
- Optimize performance and scalability for high-demand environments
- Discover how to visualize and debug ESQL queries
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description
Built to simplify high-scale data analytics in Elasticsearch, this practical guide will take you from foundational concepts to advanced applications across search, observability, and security. It will help you overcome common challenges such as efficiently querying large datasets, applying advanced analytics without deep prior knowledge, and the lack of a single, consolidated query language. Written by senior experts at Elastic with extensive field experience, this book delivers actionable guidance rooted in solving today's data challenges at scale. After introducing ESQL and its architecture, the chapters explore real-world applications across various domains, including analytics, raw log analysis, observability, and cybersecurity. Advanced topics such as scaling, optimization, and future developments are also covered to help you maximize your ESQL capabilities. By the end of this book, you'll be able to leverage ESQL for comprehensive data management and analysis, optimizing your workflows and enhancing your productivity with Elasticsearch.

What you will learn
- Gain a solid understanding of ESQL and its architecture
- Use ESQL for data analysis and performance monitoring
- Apply ESQL in cybersecurity for threat detection and incident response
- Find out how to perform advanced searches using ESQL
- Prepare for future ESQL developments
- Showcase ESQL in action through real-world, persona-driven use cases

Who this book is for
If you're an Elasticsearch user, this book is essential for your growth. Whether you're a data analyst looking to build analytics on top of Elasticsearch, an SRE monitoring the health of your IT system, or a cybersecurity analyst, this book will give you a complete understanding of how ESQL is built and used. Additionally, database administrators, business intelligence professionals, and operational intelligence professionals will find this book invaluable. Even with a beginner-level knowledge of Elasticsearch, you'll be able to get started and make the most of this comprehensive guide.
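ESQL queries are served over a plain REST endpoint, so a minimal sketch needs nothing beyond HTTP. The URL, index, and field names below are placeholders; it assumes the requests package and an unsecured local cluster.

```python
# Minimal sketch: running an ESQL query against Elasticsearch's _query
# endpoint (placeholder URL, index, and fields; unsecured local cluster).
import requests

resp = requests.post(
    "http://localhost:9200/_query",
    json={
        "query": (
            "FROM web-logs "
            "| WHERE status >= 500 "
            "| STATS errors = COUNT(*) BY host "
            "| SORT errors DESC "
            "| LIMIT 5"
        )
    },
)
data = resp.json()
print([col["name"] for col in data["columns"]])  # column headers
for row in data["values"]:                       # result rows
    print(row)
```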

Observability Engineering, 2nd Edition

Observability is the only way to engineer, manage, and improve the business-critical systems that customers depend on every day—and as the complexity of software grows, so does the need for observability. With this thoroughly revised second edition, authors Charity Majors, Liz Fong-Jones, and George Miranda take inventory of the current state of the field and explain how practitioners can evolve their observability practices from collecting separate, disparate signals to unified data workflows. This book is for any software engineering team, large or small, that must understand the unique customer experience in order to ship quality code and features that customers want, at the right velocity. You'll discover the value that observable systems bring and learn concrete steps you can follow to achieve an observability-driven development practice yourself. And four completely new chapters explore recent trends such as large language models, frontend observability, cost optimization/performance engineering, and practical open source tooling.

- Understand the impact observability has across the entire software development lifecycle
- Learn how and why different functional teams use observability with service-level objectives
- Implement modern observability practices in your organization
- Maximize the cost-effectiveness of observability tooling
- Produce quality code for context-aware system debugging and maintenance
- Use data-rich analytics to quickly find answers when maintaining site reliability
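Service-level objectives ultimately come down to simple arithmetic, which is worth seeing once. A minimal sketch with illustrative numbers:

```python
# Minimal sketch of SLO error-budget arithmetic (illustrative numbers only):
# how much downtime a given availability target leaves you per month.
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of failure allowed by an availability SLO over the window."""
    total_minutes = window_days * 24 * 60   # 43,200 for a 30-day window
    return total_minutes * (1 - slo)

for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} SLO -> {error_budget_minutes(target):7.1f} min/month")
# 99.00% SLO ->   432.0 min/month
# 99.90% SLO ->    43.2 min/month
# 99.99% SLO ->     4.3 min/month
```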

Data Engineering for Multimodal AI

A shift is underway in how organizations approach data infrastructure for AI-driven transformation. As multimodal AI systems and applications become increasingly sophisticated and data hungry, data systems must evolve to meet these complex demands. Data Engineering for Multimodal AI is one of the first practical guides for data engineers, machine learning engineers, and MLOps specialists looking to rapidly master the skills needed to build robust, scalable data infrastructures for multimodal AI systems and applications. You'll follow the entire lifecycle of AI-driven data engineering, from conceptualizing data architectures to implementing data pipelines optimized for multimodal learning in both cloud native and on-premises environments. And each chapter includes step-by-step guides and best practices for implementing key concepts.

- Design and implement cloud native data architectures optimized for multimodal AI workloads
- Build efficient and scalable ETL processes for preparing diverse AI training data
- Implement real-time data processing pipelines for multimodal AI inference
- Develop and manage feature stores that support multiple data modalities
- Apply data governance and security practices specific to multimodal AI projects
- Optimize data storage and retrieval for various types of multimodal ML models
- Integrate data versioning and lineage tracking in multimodal AI workflows
- Implement data-quality frameworks to ensure reliable outcomes across data types
- Design data pipelines that support responsible AI practices in a multimodal context

High Performance Spark, 2nd Edition

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau, Rachel Warren, and Anya Bida walk you through the secrets of the Spark code base, and demonstrate performance optimizations that will help your data pipelines run faster, scale to larger datasets, and avoid costly antipatterns. Ideal for data engineers, software engineers, data scientists, and system administrators, the second edition of High Performance Spark presents new use cases, code examples, and best practices for Spark 3.x and beyond. This book gives you a fresh perspective on this continually evolving framework and shows you how to work around bumps on your Spark and PySpark journey.

With this book, you'll learn how to:
- Accelerate your ML workflows with integrations including PyTorch
- Handle key skew and take advantage of Spark's new dynamic partitioning
- Make your code reliable with scalable testing and validation techniques
- Make Spark high performance
- Deploy Spark on Kubernetes and similar environments
- Take advantage of GPU acceleration with RAPIDS and resource profiles
- Get your Spark jobs to run faster
- Use Spark to productionize exploratory data science projects
- Handle even larger datasets with Spark
- Gain faster insights by reducing pipeline running times
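Two of the topics above, key skew and dynamic partitioning, are largely a matter of Spark 3.x's Adaptive Query Execution settings. A minimal PySpark sketch follows; the config keys are real Spark knobs, while the toy data is illustrative.

```python
# Minimal sketch: enabling Adaptive Query Execution so Spark rebalances
# skewed join partitions at runtime (toy data; configs are the real knobs).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("skew-demo")
    .config("spark.sql.adaptive.enabled", "true")                  # AQE on
    .config("spark.sql.adaptive.skewJoin.enabled", "true")         # split skewed partitions
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

orders = spark.range(1_000_000).withColumnRenamed("id", "customer_id")
customers = spark.range(10_000).withColumnRenamed("id", "customer_id")

# With AQE enabled, an oversized partition on customer_id is split into
# smaller tasks at runtime instead of stalling the whole stage.
joined = orders.join(customers, "customer_id")
print(joined.count())
```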

PostgreSQL 18 for Developers

Developing intelligent applications that integrate AI, analytics, and transactional capabilities using the latest release of the world's most popular open-source database.

Key Features
- Practical examples demonstrating how to use Postgres to develop intelligent applications
- Best practices for developers of intelligent data management applications
- Includes the latest PostgreSQL 18 features for AI, analytics, and transactions

Book Description
In today's data-first world, businesses need applications that blend transactions, analytics, and AI to power real-time insights at scale. Mastering PostgreSQL 18 for AI-Powered Enterprise Apps is your essential guide to building intelligent, high-performance systems with the latest features of PostgreSQL 18. Through hands-on examples and expert guidance, you'll learn to design architectures that unite OLTP and OLAP, embed AI directly into apps, and optimize for speed, scalability, and reliability. Discover how to apply cutting-edge PostgreSQL tools for real-time decisions, predictive analytics, and automation. Go beyond basics with advanced strategies trusted by industry leaders. Whether you're building data-rich applications, internal analytics platforms, or AI-driven services, this book equips you with the patterns and insights to deliver enterprise-grade innovation. Ideal for developers, architects, and tech leads driving digital transformation, this book empowers you to lead the future of intelligent applications. Harness the power of PostgreSQL 18—and unlock the full potential of your data.

What you will learn
- How to leverage PostgreSQL 18 for building intelligent data-driven applications for the modern enterprise
- Data management principles and best practices for managing transactions, analytics, and AI use cases
- How to utilize Postgres capabilities to address architectural challenges and attain optimal performance for each use case
- Methods for utilizing the latest Postgres innovation to create integrated data management applications
- Guidelines on when to use Postgres and when to opt for specialized data management solutions

Who this book is for
This book is intended for developers creating intelligent, data-driven applications for the modern enterprise. It features hands-on examples that demonstrate how to use PostgreSQL as the database for business applications that integrate transactions, analytics, and AI. We explore the fundamental architectural principles of data management and detail how developers utilize PostgreSQL 18's latest capabilities to build AI-enabled applications. The book assumes a working knowledge of SQL and does not address the needs of data analysts or those looking to master SQL.

Data Engineering with Azure Databricks

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools.

Key Features
- Build scalable data pipelines using Apache Spark and Delta Lake
- Automate workflows and manage data governance with Unity Catalog
- Learn real-time processing and structured streaming with practical use cases
- Implement CI/CD, DevOps, and security for production-ready data solutions
- Explore Databricks-native ML, AutoML, and Generative AI integration

Book Description
"Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you'll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake's ACID features for data reliability and schema evolution. You'll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

What you will learn
- Set up a full-featured Azure Databricks environment
- Implement batch and streaming ingestion using Auto Loader
- Optimize Spark jobs with partitioning and caching
- Build real-time pipelines with structured streaming and DLT
- Manage data governance using Unity Catalog
- Orchestrate production workflows with jobs and ADF
- Apply CI/CD best practices with Azure DevOps and Git
- Secure data with RBAC, encryption, and compliance standards
- Use MLflow and Feature Store for ML pipelines
- Build generative AI applications in Databricks

Who this book is for
This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.
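Auto Loader ingestion, mentioned above, is compact enough to show in full. This sketch runs only on Databricks, where the platform provides the `cloudFiles` source and a SparkSession named `spark`; all paths and table names are placeholders.

```python
# Minimal sketch of Auto Loader incremental ingestion on Databricks
# (platform-provided `spark`; placeholder paths and table names).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders/_schema")
    .load("/mnt/landing/orders/")
)

(
    stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders")
    .trigger(availableNow=True)      # process the current backlog, then stop
    .toTable("bronze.orders")        # Delta table, governable via Unity Catalog
)
```

The checkpoint and schema locations are what make the ingestion incremental and restartable: already-seen files are skipped on the next run, and schema drift is tracked rather than silently dropped.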

Designing Data-Intensive Applications, 2nd Edition

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there's an overwhelming variety of tools and analytical systems, including relational databases, NoSQL datastores, plus data warehouses and data lakes. What are the right choices for your application? How do you make sense of all these buzzwords? In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You'll be guided through the maze of decisions and trade-offs involved in building a modern data system, from choosing the right tools like Spark and Flink to understanding the intricacies of data laws like the GDPR.

- Peer under the hood of the systems you already use, and learn to use them more effectively
- Make informed decisions by identifying the strengths and weaknesses of different tools
- Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
- Understand the distributed systems research upon which modern databases are built
- Peek behind the scenes of major online services, and learn from their architectures

Google Cloud Certified Professional Data Engineer Certification Guide

A guide to passing the GCP Professional Data Engineer exam on your first attempt and upgrading your data engineering skills on GCP.

Key Features
- Fully understand the certification exam content and exam objectives
- Consolidate your knowledge of all essential exam topics and key concepts
- Get realistic experience of answering exam-style questions
- Develop practical skills for everyday use
- Purchase of this book unlocks access to web-based exam prep resources including mock exams, flashcards, and exam tips

Book Description
The GCP Professional Data Engineer certification validates the fundamental knowledge required to perform data engineering tasks and use GCP services to enhance data engineering processes and further your career in the data engineering/architecting field. This book is a best-in-class study guide that fully covers the GCP Professional Data Engineer exam objectives and helps you pass the exam the first time. Complete with clear explanations, chapter review questions, realistic mock exams, and pragmatic solutions, this guide will help you master the core exam concepts and build the understanding you need to go into the exam with the skills and confidence to get the best result you can. With the help of relevant examples, you'll learn fundamental data engineering concepts such as data warehousing and data security. As you progress, you'll delve into the important domains of the exam, including data pipelining, data migration, and data processing. Unlike other study guides, this book explains the logical reasoning behind the correct answer in scenario-based questions, provides excellent tips on the optimal use of each service, and gives you everything you need to pass the exam and enhance your prospects in the data engineering field.

What you will learn
- Create data solutions and pipelines in GCP
- Analyze and transform data into useful information
- Apply data engineering concepts to real scenarios
- Create secure, cost-effective, valuable GCP workloads
- Work in the GCP environment with industry best practices

Who this book is for
This book is for data engineers who want a reliable source for the key concepts and terms present in the most prestigious and highly sought-after cloud-based data engineering certification. It will help you improve your GCP data engineering skills and give you a better chance at earning the GCP Professional Data Engineer certification. You should already be familiar with the Google Cloud Platform, having explored it (professionally or personally) for at least a year, and have some familiarity with basic data concepts (such as types of data and basic SQL knowledge).

Data Contracts in Practice

In 'Data Contracts in Practice', Ryan Collingwood provides a detailed guide to managing and formalizing data responsibilities within organizations. Through practical examples and real-world use cases, you'll learn how to systematically address data quality, governance, and integration challenges using data contracts.

What this book will help me do
- Learn to identify and formalize expectations in data interactions, improving clarity among teams.
- Master implementation techniques to ensure data consistency and quality across critical business processes.
- Understand how to effectively document and deploy data contracts to bolster data governance.
- Explore solutions for proactively addressing and managing data changes and requirements.
- Gain real-world skills through practical examples using technologies like Python, SQL, JSON, and YAML.

Author(s)
Ryan Collingwood is a seasoned expert with over 20 years of experience in product management, data analysis, and software development. His holistic techno-social approach, designed to address both technical and organizational challenges, brings a unique perspective to improving data processes. Ryan's writing is informed by his extensive hands-on experience and commitment to enabling robust data ecosystems.

Who is it for?
This book is ideal for data engineers, software developers, and business analysts working to enhance organizational data integration. Professionals with a familiarity of system design, JSON, and YAML will find it particularly beneficial. Enterprise architects and leadership roles looking to understand data contract implementation and their business impacts will also greatly benefit. Basic understanding of Python and SQL is recommended to maximize learning.
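To make the idea concrete, here is a minimal sketch of a YAML data contract checked against incoming records in plain Python. The contract layout is illustrative rather than a formal standard; the PyYAML package is assumed.

```python
# Minimal sketch: validating records against a YAML data contract.
# The contract schema here is illustrative, not a formal standard.
import yaml

CONTRACT = yaml.safe_load("""
dataset: orders
fields:
  order_id: {type: str, required: true}
  amount:   {type: float, required: true}
  note:     {type: str, required: false}
""")

PY_TYPES = {"str": str, "float": float, "int": int}

def violations(record: dict) -> list[str]:
    """Return a list of contract violations for one record."""
    problems = []
    for name, spec in CONTRACT["fields"].items():
        if name not in record:
            if spec["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], PY_TYPES[spec["type"]]):
            problems.append(f"{name}: expected {spec['type']}")
    return problems

print(violations({"order_id": "A-1", "amount": "12.5"}))
# ['amount: expected float'] -- the string "12.5" breaks the contract
```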

Security and Privacy in 6G Communication Technology

Future-proof your knowledge and expertise in telecommunications with this essential guide, which provides a comprehensive analysis of the critical security and privacy challenges in the transition to 6G communication.

The advancement from 5G to 6G communication represents a quantum leap in wireless technology, promising unprecedented speeds, ultra-low latency, and ubiquitous connectivity. As the industry embarks on this journey, it encounters a host of technical challenges, particularly in ensuring the security and privacy of data transmitted across these networks. The interconnected nature of 6G systems, combined with the proliferation of Internet of Things devices and the sheer volume of data exchanged, creates a fertile ground for cyber threats and privacy breaches.

This book delves into these intricate technical challenges, offering a comprehensive analysis of the security and privacy implications of 6G communication. We explore the vulnerabilities inherent in 6G networks, ranging from potential weaknesses in network protocols to the risk of unauthorized access to sensitive data. Through detailed examination and real-world examples, we provide insights into cutting-edge security measures and privacy-preserving techniques tailored specifically to the unique characteristics of 6G systems.

By addressing these challenges head-on, we aim to empower engineers, researchers, and policymakers with the knowledge and tools necessary to build resilient and secure 6G networks that safeguard user privacy and data integrity in an increasingly interconnected world. By dissecting the complexities of 6G architecture and protocols, the book equips readers with a nuanced understanding of the unique security and privacy considerations that must be addressed in the design and implementation of these transformative systems.

Generative AI for Full-Stack Development: AI Empowered Accelerated Coding

Gain cutting-edge skills in building a full-stack web application with AI assistance. This book will guide you in creating your own travel application using React and Node.js, with MongoDB as the database, while emphasizing the use of Gen AI platforms like Perplexity.ai and Claude for quicker development and more accurate debugging. The book's step-by-step approach will help you bridge the gap between traditional web development methods and modern AI-assisted techniques, making it both accessible and insightful. It provides valuable lessons on professional web application development practices. By focusing on a practical example, the book offers hands-on experience that mirrors real-world scenarios, equipping you with relevant and in-demand skills that can be easily transferred to other projects.

The book emphasizes the principles of responsive design, teaching you how to create web applications that adapt seamlessly to different screen sizes and devices. This includes using fluid grids, media queries, and optimizing layouts for usability across various platforms. You will also learn how to design, manage, and query databases using MongoDB, ensuring you can effectively handle data storage and retrieval in your applications. Most significantly, the book will introduce you to generative AI tools and prompt engineering techniques that can accelerate coding and debugging processes. This modern approach will streamline development workflows and enhance productivity. By the end of this book, you will not only have learned how to create a complete web application from backend to frontend, along with database management, but you will also have gained invaluable associated skills such as using IDEs, version control, and deploying applications efficiently and effectively with AI.

What You Will Learn
- How to build a full-stack web application from scratch
- How to use generative AI tools to enhance coding efficiency and streamline the development process
- How to create user-friendly interfaces that enhance the overall experience of your web applications
- How to design, manage, and query databases using MongoDB

Who This Book Is For
Frontend developers, backend developers, and full-stack developers.

Modernizing SAP Business Warehouse: A Strategic Guidance to Migrating to SAP Business Data Cloud (SAP Datasphere and SAP Analytics Cloud)

The book simplifies the complexities of cloud transition and offers a clear, actionable roadmap for organizations moving from SAP BW or BW/4HANA to SAP Datasphere and SAP Analytics Cloud (as part of SAP Business Data Cloud), particularly in alignment with S/4HANA transformation. Whether you are assessing your current landscape, building a business case with ROI analysis, or creating a phased implementation strategy, this book delivers both technical and strategic guidance. It highlights short- and long-term planning considerations, outlines migration governance, and provides best practices for managing projects across hybrid SAP environments. From identifying platform gaps to facilitating stakeholder discussions, this book is an essential resource for anyone involved in the analytics modernization journey.

You Will:
- Learn how to assess your current SAP BW or BW/4HANA landscape and identify key migration drivers
- Understand best practices for leveraging out-of-the-box cloud features and AI/ML capabilities
- Follow a step-by-step approach to planning and executing the move to SAP Business Data Cloud (mainly SAP Datasphere and SAP Analytics Cloud)

This book is for: SAP BW/BW4HANA customers, SAP consultants, solution architects, and enterprise architects.

Oracle 23AI & ADBS in Action: Exploring New Features with Hands-On Case Studies

Unlock the power of Oracle Database 23AI and Autonomous Database Serverless (ADB-S) with this comprehensive guide to the latest innovations in performance, security, automation, and AI-driven optimization. As enterprises embrace intelligent and autonomous data platforms, understanding these capabilities is essential for data architects, developers, and DBAs.

Explore cutting-edge features such as vector data types and AI-powered vector search, revolutionizing data retrieval in modern AI applications. Learn how schema privileges and the DB_DEVELOPER_ROLE simplify access control in multi-tenant environments. Dive into advanced auditing, SQL Firewall, and data integrity constraints to strengthen security and compliance. Discover AI-driven advancements like machine learning-based query execution, customer retention prediction, and AI-powered query tuning. Additional chapters cover innovations in JSON, XML, JSON-Relational Duality Views, new indexing techniques, SQL property graphs, materialized views, partitioning, lock-free transactions, JavaScript stored procedures, blockchain tables, and automated bigfile tablespace shrinking.

What sets this book apart is its practical focus—each chapter includes real-world case studies and executable scripts, enabling professionals to implement these features effectively in enterprise environments. Whether you're optimizing performance or aligning IT with business goals, this guide is your key to building scalable, secure, and AI-powered solutions with Oracle 23AI and ADB-S.

What You Will Learn
- Explore Oracle 23AI's latest features through real-world use cases
- Implement AI/ML-driven optimizations for smarter, autonomous database performance
- Gain hands-on experience with executable scripts and practical coding examples
- Strengthen security and compliance using advanced auditing, SQL Firewall, and blockchain tables
- Master high-performance techniques for query tuning, in-memory processing, and scalability
- Revolutionize data access with AI-powered vector search in modern AI workloads
- Simplify user access in multi-tenant environments using schema privileges and DB_DEVELOPER_ROLE
- Model and query complex data using JSON-Relational Duality Views and SQL property graphs

Who This Book Is For
Database architects, data engineers, Oracle developers, and IT professionals seeking to leverage Oracle 23AI's latest features for real-world applications.
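The vector search feature mentioned above reads naturally from Python. This sketch assumes a reachable 23AI database, placeholder credentials, and python-oracledb 2.x, which (to my understanding) can bind array.array('f', ...) values to the VECTOR type; treat the binding details as assumptions.

```python
# Minimal sketch of Oracle 23AI vector search via python-oracledb
# (placeholder credentials; assumes oracledb 2.x VECTOR binding support).
import array
import oracledb

conn = oracledb.connect(user="demo", password="...", dsn="localhost/freepdb1")
cur = conn.cursor()

cur.execute("CREATE TABLE docs (id NUMBER, embedding VECTOR(3, FLOAT32))")
cur.execute(
    "INSERT INTO docs VALUES (1, :v)",
    v=array.array("f", [0.1, 0.9, 0.0]),
)

# Nearest-neighbour lookup with the built-in distance function.
cur.execute(
    """SELECT id FROM docs
       ORDER BY VECTOR_DISTANCE(embedding, :q, COSINE)
       FETCH FIRST 3 ROWS ONLY""",
    q=array.array("f", [0.0, 1.0, 0.0]),
)
print(cur.fetchall())
conn.close()
```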

Practical Data Engineering with Apache Projects: Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using open source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more.

Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely used open source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios. At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open source projects and tools to solve problems encountered in data engineering.

In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.

You Will Learn:
- The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
- How to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
- 10 hands-on data engineering projects
- How to troubleshoot common challenges in data engineering projects

Who is this book for:
Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.
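As a taste of the project style, here is a minimal sketch of the Kafka building block: producing JSON events and reading them back. The broker address and topic are placeholders; it assumes the kafka-python package and a running broker.

```python
# Minimal sketch: producing and consuming JSON events with kafka-python
# (placeholder broker and topic; assumes a running Kafka broker).
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user": "u1", "page": "/home"})
producer.flush()   # block until the event is actually delivered

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",      # read from the start of the topic
    consumer_timeout_ms=5000,          # stop iterating after 5s of silence
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```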

SAP ABAP 7.5 Optimization for HANA: AMDP, CDS and Native SQL for Peak Performance

In the evolving landscape of SAP development, performance is no longer just a nice-to-have—it's a necessity. With the power of SAP HANA and the enhancements introduced in ABAP 7.5, developers are now equipped to rethink how applications are built, executed, and optimized. This book is your guide to that transformation.

We begin by understanding the core shift: moving data-intensive operations directly into the HANA database. When implemented correctly, this "code pushdown" philosophy dramatically reduces data transfer and processing overhead. AMDP (ABAP Managed Database Procedures), our in-database processing engine, enables us to write complex logic directly in SQLScript, harnessing HANA's parallel processing capabilities. We focus on crafting efficient AMDP procedures by adopting set-based operations and minimizing unnecessary data movement.

Next, we explore Core Data Services (CDS) Views, our go-to data modeling tool. CDS Views are not just simple database views; they act as semantic layers that define how our applications interact with data. We learn to create optimized CDS Views by leveraging associations, annotations, and table functions, enabling us to build reusable, high-performance data models. These views simplify complex queries, improve data consistency, and enhance application flexibility.

We then turn to Native SQL, our direct line to the HANA database. While AMDP and CDS Views provide powerful abstractions, Native SQL offers ultimate control for specialized tasks. We embed Native SQL within AMDP procedures to access database-specific features and fine-tune performance for critical operations. Along the way, we apply best practices for writing efficient queries, with a strong focus on indexing, join strategies, and precise data filtering.

Throughout this journey, we emphasize the importance of rigorous testing and proactive monitoring. Just like a race car undergoes extensive testing before hitting the track, our ABAP applications require careful validation to ensure accuracy and optimal performance. We explore techniques for unit testing AMDP procedures, validating CDS Views, and monitoring query performance. We also look at strategies for detecting and addressing potential bottlenecks before they affect end users.

SAP ABAP 7.5 Optimization for HANA is not just about writing faster code—it's about fundamentally rethinking how we develop applications. By embracing code pushdown, leveraging AMDP, CDS Views, and Native SQL, and implementing robust testing and monitoring strategies, we build ABAP applications that are not only faster, but also more scalable, maintainable, and adaptable to the ever-evolving demands of modern business.

You Will:
- Learn how to implement the "code pushdown" philosophy, moving data-intensive operations directly into the HANA database to reduce data transfer and processing overhead
- Understand how to create optimized CDS Views, leveraging associations, annotations, and table functions to build reusable, high-performance data models that simplify complex queries and improve data consistency
- Explore how to write complex logic directly in SQLScript using AMDP, harnessing HANA's parallel processing capabilities, and how to use Native SQL for specialized tasks, accessing database-specific features to optimize performance

This Book Is For:
ABAP developers, SAP consultants and architects, and IT managers and technical leads.

Engineering Lakehouses with Open Table Formats

Engineering Lakehouses with Open Table Formats introduces the architecture and capabilities of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. The book guides you through the design, implementation, and optimization of lakehouses that can handle modern data processing requirements effectively, with real-world practical insights.

What this book will help me do
- Understand the fundamentals of open table formats and their benefits in lakehouse architecture.
- Learn how to implement performant data processing using tools like Apache Spark and Flink.
- Master advanced topics like indexing, partitioning, and interoperability between data formats.
- Explore data lifecycle management and integration with frameworks like Apache Airflow and dbt.
- Build secure lakehouses with regulatory compliance using best practices detailed in the book.

Author(s)
Dipankar Mazumdar and Vinoth Govindarajan are seasoned professionals with extensive experience in big data processing and software architecture. They bring their expertise from working with data lakehouses and are known for their ability to explain complex technical concepts clearly. Their collaborative approach brings valuable insights into the latest trends in data management.

Who is it for?
This book is ideal for data engineers, architects, and software professionals aiming to master modern lakehouse architectures. If you are familiar with data lakes or warehouses and wish to transition to an open data architectural design, this book is suited for you. Readers should have basic knowledge of databases, Python, and Apache Spark for the best experience.
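To ground the table-format discussion, here is a minimal sketch of creating an Apache Iceberg table from PySpark and inspecting its snapshots, which are the basis for time travel. The local Hadoop catalog configuration is an assumption, and the iceberg-spark-runtime jar matching your Spark version must be on the classpath.

```python
# Minimal sketch: an Iceberg table via a local Hadoop catalog (assumes the
# iceberg-spark-runtime jar is available; paths and names are placeholders).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql(
    "CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, kind STRING) USING iceberg"
)
spark.sql("INSERT INTO local.db.events VALUES (1, 'click'), (2, 'view')")

# Every write produces a snapshot; querying the snapshots metadata table
# shows the history that time-travel queries can target.
spark.sql("SELECT * FROM local.db.events.snapshots").show(truncate=False)
```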