Analytics

Snowflake: The Definitive Guide, 2nd Edition

2027-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Joyce Kaye Avila

AI/ML Cloud Computing Data Governance Data Management GenAI Iceberg Cyber Security Snowflake SQL data data-engineering

Snowflake is reshaping data management by integrating AI, analytics, and enterprise workloads into a single cloud platform. Snowflake: The Definitive Guide is a comprehensive resource for data architects, engineers, and business professionals looking to harness Snowflake's evolving capabilities, including Cortex AI, Snowpark, and Polaris Catalog for Apache Iceberg. This updated edition provides real-world strategies and hands-on activities for optimizing performance, securing data, and building AI-driven applications. With hands-on SQL examples and best practices, this book helps readers process structured and unstructured data, implement scalable architectures, and integrate Snowflake's AI tools seamlessly. Whether you're setting up accounts, managing access controls, or leveraging generative AI, this guide equips you with the expertise to maximize Snowflake's potential. Implement AI-powered workloads with Snowflake Cortex Explore Snowsight and Streamlit for no-code development Ensure security with access control and data governance Optimize storage, queries, and computing costs Design scalable data architectures for analytics and machine learning

Universal Data Modeling

2027-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jun Shan

AI/ML Data Modelling Data Quality NoSQL data data-engineering data-models

Most data professionals work with multiple datasets scattered across teams, systems, and formats. But without a clear modeling strategy, the result is often chaos: mismatched schemas, fragile pipelines, and a constant fight to make sense of the noise. This essential guide offers a better way by introducing a practical framework for designing high-quality data models that work across platforms while supporting the growing demands of AI, analytics, and real-time systems. Author Jun Shan bridges the gap between disconnected modeling approaches and the need for a unified, system-agnostic methodology. Whether you're building a new data platform or rethinking legacy infrastructure, Universal Data Modeling gives you the clarity, patterns, and tools to model data that's consistent, resilient, and ready to scale. Connect conceptual, logical, and physical modeling phases with confidence Apply best-fit techniques across relational, semistructured, and NoSQL formats Improve data quality, clarity, and maintainability across your organization Support modern design paradigms like data mesh and data products Translate domain knowledge into models that empower teams Build flexible, scalable models that stand the test of technology change

Elasticsearch Query Language the Definitive Guide

2026-06-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bahaaldine Azarmi , Alexis Charveriat , Stephen Brown , Farbod Shirzadian , Alejandro Sanchez

BI Data Analytics Data Management ELK Cyber Security data data-engineering elasticsearch search

Streamline your workflow with ESQL, enhance data analysis with real-time insights, and speed up aggregations and visualizations Key Features Apply ESQL efficiently in analytics, observability, and cybersecurity Optimize performance and scalability for high-demand environments Discover how to visualize and debug ESQL queries Purchase of the print or Kindle book includes a free PDF eBook Book Description Built to simplify high-scale data analytics in Elasticsearch, this practical guide will take you from foundational concepts to advanced applications across search, observability, and security. It will help you overcome common challenges such as efficiently querying large datasets, applying advanced analytics without deep prior knowledge, and settling on a single, consolidated query language. Written by senior experts at Elastic with extensive field experience, this book delivers actionable guidance rooted in solving today’s data challenges at scale. After introducing ESQL and its architecture, the chapters explore real-world applications across various domains, including analytics, raw log analysis, observability, and cybersecurity. Advanced topics such as scaling, optimization, and future developments are also covered to help you maximize your ESQL capabilities. By the end of this book, you’ll be able to leverage ESQL for comprehensive data management and analysis, optimizing your workflows and enhancing your productivity with Elasticsearch. What you will learn Gain a solid understanding of ESQL and its architecture Use ESQL for data analysis and performance monitoring Apply ESQL in cybersecurity for threat detection and incident response Find out how to perform advanced searches using ESQL Prepare for future ESQL developments Showcase ESQL in action through real-world, persona-driven use cases Who this book is for If you’re an Elasticsearch user, this book is essential for your growth.
Whether you’re a data analyst looking to build analytics on top of Elasticsearch, an SRE monitoring the health of your IT system, or a cybersecurity analyst, this book will give you a complete understanding of how ESQL is built and used. Additionally, database administrators, business intelligence professionals, and operational intelligence professionals will find this book invaluable. Even with a beginner-level knowledge of Elasticsearch, you’ll be able to get started and make the most of this comprehensive guide.

Observability Engineering, 2nd Edition

2026-06-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Charity Majors (Honeycomb) , Liz Fong-Jones , George Miranda

it-operations monitoring observability

Observability is the only way to engineer, manage, and improve the business-critical systems that customers depend on every day—and as the complexity of software grows, so does the need for observability. With this thoroughly revised second edition, authors Charity Majors, Liz Fong-Jones, and George Miranda take inventory of the current state of the field and explain how practitioners can evolve their observability practices from collecting separate, disparate signals to unified data workflows. This book is for any software engineering team, large or small, that must understand the unique customer experience in order to ship quality code and features that customers want, at the right velocity. You'll discover the value that observable systems bring and learn concrete steps you can follow to achieve an observability-driven development practice yourself. And four completely new chapters explore recent trends such as large language models, frontend observability, cost optimization/performance engineering, and practical open source tooling. Understand the impact observability has across the entire software development lifecycle Learn how and why different functional teams use observability with service-level objectives Implement modern observability practices in your organization Maximize the cost-effectiveness of observability tooling Produce quality code for context-aware system debugging and maintenance Use data-rich analytics to quickly find answers when maintaining site reliability

PostgreSQL 18 for Developers

2026-04-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vibhor Kumar , Marc Linster

AI/ML Data Management SQL data data-engineering postgresql

Developing intelligent applications that integrate AI, analytics, and transactional capabilities using the latest release of the world's most popular open-source database Key Features Practical examples demonstrating how to use Postgres to develop intelligent applications Best practices for developers of intelligent data management applications Includes the latest PostgreSQL 18 features for AI, analytics, and transactions Book Description In today’s data-first world, businesses need applications that blend transactions, analytics, and AI to power real-time insights at scale. Mastering PostgreSQL 18 for AI-Powered Enterprise Apps is your essential guide to building intelligent, high-performance systems with the latest features of PostgreSQL 18. Through hands-on examples and expert guidance, you’ll learn to design architectures that unite OLTP and OLAP, embed AI directly into apps, and optimize for speed, scalability, and reliability. Discover how to apply cutting-edge PostgreSQL tools for real-time decisions, predictive analytics, and automation. Go beyond basics with advanced strategies trusted by industry leaders. Whether you’re building data-rich applications, internal analytics platforms, or AI-driven services, this book equips you with the patterns and insights to deliver enterprise-grade innovation. Ideal for developers, architects, and tech leads driving digital transformation, this book empowers you to lead the future of intelligent applications. Harness the power of PostgreSQL 18—and unlock the full potential of your data.
What you will learn How to leverage PostgreSQL 18 for building intelligent data-driven applications for the modern enterprise Data management principles and best practices for managing transactions, analytics, and AI use cases How to utilize Postgres capabilities to address architectural challenges and attain optimal performance for each use case Methods for utilizing the latest Postgres innovation to create integrated data management applications Guidelines on when to use Postgres and when to opt for specialized data management solutions Who this book is for This book is intended for developers creating intelligent, data-driven applications for the modern enterprise. It features hands-on examples that demonstrate how to use PostgreSQL as the database for business applications that integrate transactions, analytics, and AI. We explore the fundamental architectural principles of data management and detail how developers utilize PostgreSQL 18's latest capabilities to build AI-enabled applications. The book assumes a working knowledge of SQL and does not address the needs of data analysts or those looking to master SQL.

Data Engineering with Azure Databricks

2026-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

AI/ML Airflow Azure ADF Azure DevOps CI/CD Cloud Computing Data Engineering Data Governance Data Lakehouse Databricks Delta +11 more

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools. Key Features Build scalable data pipelines using Apache Spark and Delta Lake Automate workflows and manage data governance with Unity Catalog Learn real-time processing and structured streaming with practical use cases Implement CI/CD, DevOps, and security for production-ready data solutions Explore Databricks-native ML, AutoML, and Generative AI integration Book Description "Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need. 
What you will learn Set up a full-featured Azure Databricks environment Implement batch and streaming ingestion using Auto Loader Optimize Spark jobs with partitioning and caching Build real-time pipelines with structured streaming and DLT Manage data governance using Unity Catalog Orchestrate production workflows with jobs and ADF Apply CI/CD best practices with Azure DevOps and Git Secure data with RBAC, encryption, and compliance standards Use MLflow and Feature Store for ML pipelines Build generative AI applications in Databricks Who this book is for This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.

Modernizing SAP Business Warehouse: A Strategic Guidance to Migrating to SAP Business Data Cloud (SAP Datasphere and SAP Analytics Cloud)

2026-01-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sourav Banerjee

AI/ML Cloud Computing SAP data data-engineering

The book simplifies the complexities of cloud transition and offers a clear, actionable roadmap for organizations moving from SAP BW or BW/4HANA to SAP Datasphere and SAP Analytics Cloud (as part of SAP Business Data Cloud), particularly in alignment with S/4HANA transformation. Whether you are assessing your current landscape, building a business case with ROI analysis, or creating a phased implementation strategy, this book delivers both technical and strategic guidance. It highlights short- and long-term planning considerations, outlines migration governance, and provides best practices for managing projects across hybrid SAP environments. From identifying platform gaps to facilitating stakeholder discussions, this book is an essential resource for anyone involved in the analytics modernization journey. You Will: Learn how to assess your current SAP BW or BW/4HANA landscape and identify key migration drivers. Understand best practices for leveraging out-of-the-box cloud features and AI/ML capabilities. Follow a step-by-step approach to planning and executing the move to SAP Business Data Cloud (mainly SAP Datasphere and SAP Analytics Cloud). This book is for: SAP BW/BW4HANA Customers, SAP Consultants, Solution Architects and Enterprise Architects

Apache Hudi: The Definitive Guide

2025-10-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rebecca Bilbro , Prashant Wason , Bhavani Sudha Saktheeswaran , Shiyan Xu

Data Lakehouse Hadoop Hudi Data Streaming apache-hive data data-engineering

Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using your query engine of choice. Authors Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications. Understand the need for transactional data lakehouses and the challenges associated with building them Explore data ecosystem support provided by Apache Hudi for popular data sources and query engines Perform different write and read operations on Apache Hudi tables and effectively use them for various use cases, including batch and stream applications Apply different storage techniques and considerations such as indexing and clustering to maximize your lakehouse performance Build end-to-end incremental data pipelines using Apache Hudi for faster ingestion and fresher analytics

Advanced Snowflake

2025-10-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muhammad Fasih Ullah

AI/ML Iceberg Snowflake data data-engineering

As Snowflake's capabilities expand, staying updated with its latest features and functionalities can be overwhelming. The platform's rapid development gave rise to advanced tools like Snowpark and the Native App Framework, which are crucial for optimizing data operations but may seem complex to navigate. In this essential book, author Muhammad Fasih Ullah offers a detailed guide to understanding these sophisticated tools, ensuring you can leverage the full potential of Snowflake for data processing, application development, and deploying machine learning models at scale. You'll gain actionable insights and structured examples to transform your understanding and skills in handling advanced data scenarios within Snowflake. By the end of this book, you will: Grasp advanced features such as Snowpark, Snowflake Native App Framework, and Iceberg tables Enhance your projects with geospatial functions for comprehensive geospatial analytics Interact with Snowflake using a variety of programming languages through Snowpark Implement and manage machine learning models effectively using Snowpark ML Develop and deploy applications within the Snowflake environment

Mastering PostgreSQL Administration: Internals, Operations, Monitoring, and Oracle Migration Strategies

2025-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Arun Kumar Samayam , Y V Ravi Kumar , Phani Kadambari

Grafana Oracle Cyber Security data data-engineering postgresql relational-databases

This book is your one-stop resource on PostgreSQL system architecture, installation, management, maintenance, and migration. It will help you address the critical needs driving successful database management today: reliability and availability, performance and scalability, security and compliance, cost-effectiveness and flexibility, disaster recovery, and real-time analytics—all in one volume. Each topic in the book is thoroughly explained by industry experts and includes step-by-step instructions for configuring the features, a discussion of common issues and their solutions, and an exploration of real-world scenarios and case studies that illustrate how concepts work in practice. You won't find the book's comprehensive coverage of advanced topics, including migration from Oracle to PostgreSQL, heterogeneous replication, and backup & recovery, in one place—online or anywhere else. What You Will Learn Install PostgreSQL using source code and yum installation Back up and recover Migrate from Oracle database to PostgreSQL using ora2pg utility Replicate from PostgreSQL to Oracle database and vice versa using Oracle GoldenGate Monitor using Grafana, PGAdmin, and command line tools Maintain with VACUUM, REINDEX, etc. Who This Book Is For Intermediate and advanced PostgreSQL users, including PostgreSQL administrators, architects, developers, analysts, disaster recovery system engineers, high availability engineers, and migration engineers

The Definitive Guide to OpenSearch

2025-09-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Soujanya Konka , Prashant Agrawal (AWS) , Jon Handler (AWS)

AWS Big Data data data-engineering search

Learn how to harness the power of OpenSearch effectively with 'The Definitive Guide to OpenSearch'. This book explores installation, configuration, query building, and visualization, guiding readers through practical use cases and real-world implementations. Whether you're building search experiences or analyzing data patterns, this guide equips you thoroughly. What this Book will help me do Understand core OpenSearch principles, architecture, and the mechanics of its search and analytics capabilities. Learn how to perform data ingestion, execute advanced queries, and produce insightful visualizations on OpenSearch Dashboards. Implement scaling strategies and optimum configurations for high-performance OpenSearch clusters. Explore real-world case studies that demonstrate OpenSearch applications in diverse industries. Gain hands-on experience through practical exercises and tutorials for mastering OpenSearch functionality. Author(s) Jon Handler, Soujanya Konka, and Prashant Agrawal, celebrated experts in search technologies and big data analysis, bring their years of experience at AWS and other domains to this book. Their collective expertise ensures that readers receive both core theoretical knowledge and practical applications to implement directly. Who is it for? This book is aimed at developers, data professionals, engineers, and systems operators who work with search systems or analytics platforms. It is especially suitable for individuals in roles handling large-scale data, who want to improve their skills or deploy OpenSearch in production environments. Early learners and seasoned experts alike will find valuable insights.

Fundamentals of Metadata Management

2025-08-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ole Olesen-Bagneux (Actian)

AI/ML Data Analytics Data Management Cyber Security data data-engineering metadata

Whether it's to adhere to regulations, access markets by meeting specific standards, or devise data analytics and AI strategies, companies today are busy implementing metadata repositories—metadata tools about the IT, data, information, and knowledge in your company. Until now, most of these repositories have been implemented in isolation from one another, but that practice lies at the core of problems with data management in many companies today. Author Ole Olesen-Bagneux, chief evangelist at Actian, shows you how to masterfully manage your metadata repositories by properly coordinating them. That requires a data discovery team to increase insights for all key players in enterprise data management, from the CIO and CDO to enterprise and data architects. Coordinating these repositories will help you and your organization democratize data and excel at data management. This book shows you how. Learn what metadata repositories are and what they do Explore which data to represent in these repositories Set up a data discovery team to make data searchable Learn how to manage and coordinate repositories in a meta grid Increase innovation by setting up a functional data marketplace Make information security and data protection more robust Gain a deeper understanding of your company IT landscape Activate real enterprise architecture based on evidence

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2025-08-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Donna Strok , Dmitry Foshin , Dmitry Anoshin

BI Cloud Computing Data Analytics Databricks DWH ETL/ELT Iceberg Matillion Cyber Security Snowflake Tableau data +1 more

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework—the data warehouse—is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities. What You Will Learn Master key functionalities of Snowflake Set up security and access with cluster Bulk load data into Snowflake using the COPY command Migrate from a legacy data warehouse to Snowflake Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools Manage large datasets with Apache Iceberg Tables Implement continuous data loading with Snowpipe and Dynamic Tables Who This Book Is For Data professionals, business analysts, IT administrators, and existing or potential Snowflake users

Getting Started with BN4L and GTT Integrations for SAP: Freight Collaboration with SAP Business Network for Logistics and Global Track and Trace

2025-07-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Prince Tyagi , Anevershika -

SAP data data-engineering

Understand the fundamentals of SAP supply chain management and intelligence with the new Business Network for Logistics (BN4L) & Global Track and Trace (GTT) systems for SAP customers. This book highlights how SAP Business Network enhances collaboration between suppliers, manufacturers, and logistics providers by leveraging shared business objects and real-time data. These integrations help businesses achieve greater efficiency, transparency, and collaboration across their supply chain operations by leveraging intelligent insights and real-time data so they can better meet the demands of their customers. Getting Started with BN4L and GTT Integrations for SAP will not only provide you with the key concepts and definitions but also system configurations and real-world case studies, giving you the skills and knowledge to start using SAP BN4L & GTT with confidence. You Will: Gain insights into the BN4L & GTT Intelligence Network and how it is used in the SAP applications. Learn the network for freight collaboration & shipment status. Learn the basics of systems administration, master data for the network, and integration scenarios with SAP S/4HANA. Understand end-to-end business processes at a high level, see how it all works together, and review analytics to get a better understanding of shared business objects. Who This Book Is For: SAP Supply Chain Consultants, SAP Supply Chain Architects, SAP Technical Consultants, and SAP Customers

Narrative SQL: Crafting Data Analysis Queries That Tell Stories

2025-07-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hamed Tabrizchi

BI Data Analytics SQL data data-engineering

This book addresses an important gap in data analytics education: the interplay between complex query-making and storytelling. While many resources cover the fundamentals of SQL queries and the technical skills required to manipulate data, few also explore moving beyond the numbers and figures to tell stories that drive strategic business decisions. By weaving together both SQL and narrative mechanics, author Hamed Tabrizchi has assembled a powerful tool for data analysts, aspiring database professionals, and business intelligence specialists. A strong foundation is laid in the first part of the book, which examines the technical skills necessary to access and manipulate data. You’ll explore foundational SQL commands, advanced querying techniques, data manipulation, data integrity, and optimization of queries for performance. The second half moves from the "how" of SQL to the "why," examining the meaning-making practices we can apply to data, and the stories data can tell. You'll learn how SQL queries can be interpreted, how to prepare data for visualization, and most importantly, how to convey the findings in a way that engages and informs the audience. In each chapter, practical exercises reinforce the techniques learned and help you apply them in real-world situations. In addition to strengthening technical skills, these exercises encourage readers to take a critical view of the data they are studying, considering the larger story it represents. Upon completing this book, you will not only be proficient in SQL, but also possess the key skill of converting data into narratives that can influence strategic direction and operational decisions in the modern workplace. 
What You Will Learn Advanced SQL Techniques: Master data manipulation and retrieval skills using advanced SQL queries Data Analysis Proficiency: Develop analytical skills to uncover key insights and understand significant data patterns Storytelling with Data: Learn to translate data analytics into compelling narratives for effective stakeholder communication Complex Querying Skills: Understand advanced SQL concepts such as common table expressions (CTEs), subqueries, and window functions Query Optimization: Optimize query execution time, resource usage, and scalability by mastering Indexes and Views Practical Application of Techniques: Gain hands-on experience with practical examples of advanced SQL techniques in real-world data analysis scenarios Effective Data Presentation: Discover strategies for visually presenting data stories to enhance engagement and understanding among diverse audiences Who This Book Is For Data analysts and business analysts, SQL developers, data-driven managers and executives and academics and students looking to enhance advanced querying and narrative building skills to better interpret and convey data.

Apache Kafka in Action

2025-05-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alexander Kropp , Anatoly Zelenin (DataFlow Academy)

Cloud Computing Kafka Kubernetes Microsoft Data Streaming data data-engineering streaming-messaging

Apache Kafka, start to finish. Apache Kafka in Action: From basics to production guides you through the concepts and skills you’ll need to deploy and administer Kafka for data pipelines, event-driven applications, and other systems that process data streams from multiple sources. Authors Anatoly Zelenin and Alexander Kropp have spent years using Kafka in real-world production environments. In this guide, they reveal their hard-won expert insights to help you avoid common Kafka pitfalls and challenges.
Inside Apache Kafka in Action you’ll discover:
Apache Kafka from the ground up
Achieving reliability and performance
Troubleshooting Kafka systems
Operations, governance, and monitoring
Kafka use cases, patterns, and anti-patterns
Clear, concise, and practical, Apache Kafka in Action is written for IT operators, software engineers, and IT architects working with Kafka every day. Chapter by chapter, it guides you through the skills you need to deliver and maintain reliable and fault-tolerant data-driven applications.
About the Technology
Apache Kafka is the gold standard streaming data platform for real-time analytics, event sourcing, and stream processing. Acting as a central hub for distributed data, it enables seamless flow between producers and consumers via a publish-subscribe model. Kafka easily handles millions of events per second, and its rock-solid design ensures high fault tolerance and smooth scalability.
About the Book
Apache Kafka in Action is a practical guide for IT professionals who are integrating Kafka into data-intensive applications and infrastructures. The book covers everything from Kafka fundamentals to advanced operations, with interesting visuals and real-world examples. Readers will learn to set up Kafka clusters, produce and consume messages, handle real-time streaming, and integrate Kafka into enterprise systems. This easy-to-follow book emphasizes building reliable Kafka applications and taking advantage of its distributed architecture for scalability and resilience.
What's Inside
Master Kafka’s distributed streaming capabilities
Implement real-time data solutions
Integrate Kafka into enterprise environments
Build and manage Kafka applications
Achieve fault tolerance and scalability
About the Reader
For IT operators, software architects and developers. No experience with Kafka required.
About the Authors
Anatoly Zelenin is a Kafka expert known for workshops across Europe, especially in banking and manufacturing. Alexander Kropp specializes in Kafka and Kubernetes, contributing to cloud platform design and monitoring.
Quotes
A great introduction. Even experienced users will go back to it again and again. - Jakub Scholz, Red Hat
Approachable, practical, well-illustrated, and easy to follow. A must-read. - Olena Kutsenko, Confluent
A zero to hero journey to understanding and using Kafka! - Anthony Nandaa, Microsoft
Thoughtfully explores a wide range of topics. A wealth of valuable information seamlessly presented and easily accessible. - Olena Babenko, Aiven Oy

Amazon Redshift Cookbook - Second Edition

2025-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anusha Challa (AWS), Harshida Patel (AWS), Shruti Worlikar (AWS Analytics)

AI/ML AWS Cloud Computing Data Analytics DWH ETL/ELT GenAI Redshift Cyber Security amazon-redshift data data-engineering

Amazon Redshift Cookbook provides practical techniques for utilizing AWS's managed data warehousing service effectively. With this book, you'll learn to create scalable and secure data analytics solutions, tackle data integration challenges, and leverage Redshift's advanced features like data sharing and generative AI capabilities.
What this Book will help me do
Create end-to-end data analytics solutions from ingestion to reporting using Amazon Redshift.
Optimize the performance and security of Redshift implementations to meet enterprise standards.
Leverage Amazon Redshift for zero-ETL ingestion and advanced concurrency scaling.
Integrate Redshift with data lakes for enhanced data processing versatility.
Implement generative AI and machine learning solutions directly within Redshift environments.
Author(s)
Shruti Worlikar, Harshida Patel, and Anusha Challa are seasoned data experts who bring together years of experience with Amazon Web Services and data analytics. Their combined expertise enables them to offer actionable insights, hands-on recipes, and proven strategies for implementing and optimizing Amazon Redshift-based solutions.
Who is it for?
This book is best suited for data analysts, data engineers, and architects who are keen on mastering modern data warehouse solutions using Redshift. Readers should have some knowledge of data warehousing and familiarity with cloud concepts. Ideal for professionals looking to migrate on-premises systems or build cloud-native analytics pipelines leveraging Redshift.

SnowPro Core Certification Study Guide

2025-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jatin Verma

BI Cloud Computing Data Analytics Snowflake SQL data data-engineering

The "SnowPro Core Certification Study Guide" provides a comprehensive resource for mastering Snowflake data cloud concepts and passing the SnowPro Core exam. Through detailed explanations and practical exercises, you will gain the knowledge and skills necessary to successfully implement and manage Snowflake's powerful features and integrate data solutions effectively.
What this Book will help me do
Efficiently load and manage data in Snowflake for modern data processing.
Optimize queries and configure Snowflake's performance features for data analytics.
Securely implement access control and user roles to ensure data privacy.
Apply Snowflake's sharing features to collaborate within and between organizations.
Prepare effectively for the SnowPro Core exam with mock tests and review tools.
Author(s)
Jatin Verma is a renowned expert in Snowflake technologies and a certified SnowPro Core professional. With years of hands-on experience working with data solutions, Jatin excels at breaking down complex concepts into digestible lessons. His approachable writing style and dedication to education make this book a trusted resource for both aspiring and current professionals.
Who is it for?
This book is perfect for data engineers, analysts, database administrators, and business intelligence professionals who are looking to gain expertise in Snowflake and achieve SnowPro Core certification. It is particularly suited for those with foundational knowledge of databases, data warehouses, and SQL, seeking to advance their skills in Snowflake and become certified professionals. By leveraging this guide, readers can solidify their Snowflake knowledge and confidently approach the SnowPro Core certification exam.

Snowflake Data Engineering

2024-12-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Maja Ferle

AI/ML API CI/CD Cloud Computing Data Analytics Data Engineering Data Governance ELK Funnel GenAI Microsoft Python

A practical introduction to data engineering on the powerful Snowflake cloud data platform. Data engineers create the pipelines that ingest raw data, transform it, and funnel it to the analysts and professionals who need it. The Snowflake cloud data platform provides a suite of productivity-focused tools and features that simplify building and maintaining data pipelines. In Snowflake Data Engineering, Snowflake Data Superhero Maja Ferle shows you how to get started.
In Snowflake Data Engineering you will learn how to:
Ingest data into Snowflake from both cloud and local file systems
Transform data using functions, stored procedures, and SQL
Orchestrate data pipelines with streams and tasks, and monitor their execution
Use Snowpark to run Python code in your pipelines
Deploy Snowflake objects and code using continuous integration principles
Optimize performance and costs when ingesting data into Snowflake
Snowflake Data Engineering reveals how Snowflake makes it easy to work with unstructured data, set up continuous ingestion with Snowpipe, and keep your data safe and secure with best-in-class data governance features. Along the way, you’ll practice the most important data engineering tasks as you work through relevant hands-on examples. Throughout, author Maja Ferle shares design tips drawn from her years of experience to ensure your pipeline follows the best practices of software engineering, security, and data governance.
About the Technology
Pipelines that ingest and transform raw data are the lifeblood of business analytics, and data engineers rely on Snowflake to help them deliver those pipelines efficiently. Snowflake is a full-service cloud-based platform that handles everything from near-infinite storage and fast elastic compute services to inbuilt AI/ML capabilities like vector search, text-to-SQL, code generation, and more. This book gives you what you need to create effective data pipelines on the Snowflake platform.
About the Book
Snowflake Data Engineering guides you skill-by-skill through accomplishing on-the-job data engineering tasks using Snowflake. You’ll start by building your first simple pipeline and then expand it by adding increasingly powerful features, including data governance and security, adding CI/CD into your pipelines, and even augmenting data with generative AI. You’ll be amazed how far you can go in just a few short chapters!
What's Inside
Ingest data from the cloud, APIs, or Snowflake Marketplace
Orchestrate data pipelines with streams and tasks
Optimize performance and cost
About the Reader
For software developers and data analysts. Readers should know the basics of SQL and the Cloud.
About the Author
Maja Ferle is a Snowflake Subject Matter Expert and a Snowflake Data Superhero who holds the SnowPro Advanced Data Engineer and the SnowPro Advanced Data Analyst certifications.
Quotes
An incredible guide for going from zero to production with Snowflake. - Doyle Turner, Microsoft
A must-have if you’re looking to excel in the field of data engineering. - Isabella Renzetti, Data Analytics Consultant & Trainer
Masterful! Unlocks the true potential of Snowflake for modern data engineers. - Shankar Narayanan, Microsoft
Valuable insights will enhance your data engineering skills and lead to cost-effective solutions. A must read! - Frédéric L’Anglais, Maxa
Comprehensive, up-to-date and packed with real-life code examples. - Albert Nogués, Danone

AI Engineering

2024-12-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Chip Huyen (Independent)

AI/ML Data Analytics RAG ai-ml artificial-intelligence-ai artificial intelligence (ai) data

Recent breakthroughs in AI have not only increased demand for AI products, they've also lowered the barriers to entry for those who want to build AI products. The model-as-a-service approach has transformed AI from an esoteric discipline into a powerful development tool that anyone can use. Everyone, including those with minimal or no prior AI experience, can now leverage AI models to build applications. In this book, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models. The book starts with an overview of AI engineering, explaining how it differs from traditional ML engineering and discussing the new AI stack. The more AI is used, the more opportunities there are for catastrophic failures, and therefore, the more important evaluation becomes. This book discusses different approaches to evaluating open-ended models, including the rapidly growing AI-as-a-judge approach. AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of use cases and application patterns. You'll learn a framework for developing an AI application, starting with simple techniques and progressing toward more sophisticated methods, and discover how to efficiently deploy these applications.
Understand what AI engineering is and how it differs from traditional machine learning engineering
Learn the process for developing an AI application, the challenges at each step, and approaches to address them
Explore various model adaptation techniques, including prompt engineering, RAG, fine-tuning, agents, and dataset engineering, and understand how and why they work
Examine the bottlenecks for latency and cost when serving foundation models and learn how to overcome them
Choose the right model, dataset, evaluation benchmarks, and metrics for your needs
Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup, and taught Machine Learning Systems Design at Stanford. She's the author of the book Designing Machine Learning Systems, an Amazon bestseller in AI. AI Engineering builds upon and is complementary to Designing Machine Learning Systems (O'Reilly).

