talk-data.com

Topic: data (3406 tagged)

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books

Practical Data Engineering with Apache Projects: Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using Open Source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more. Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely-used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios. At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering. In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering. 
You Will Learn:
- The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
- How to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
- 10 hands-on data engineering projects
- How to troubleshoot common challenges in data engineering projects

Who is this book for: Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.

SAP ABAP 7.5 Optimization for HANA: AMDP, CDS and Native SQL for Peak Performance

In the evolving landscape of SAP development, performance is no longer just a nice-to-have—it's a necessity. With the power of SAP HANA and the enhancements introduced in ABAP 7.5, developers are now equipped to rethink how applications are built, executed, and optimized. This book is your guide to that transformation. We begin by understanding the core shift: moving data-intensive operations directly into the HANA database. When implemented correctly, this "code pushdown" philosophy dramatically reduces data transfer and processing overhead. AMDP (ABAP Managed Database Procedures), our in-database processing engine, enables us to write complex logic directly in SQLScript, harnessing HANA’s parallel processing capabilities. We focus on crafting efficient AMDP procedures by adopting set-based operations and minimizing unnecessary data movement. Next, we explore Core Data Services (CDS) Views, our go-to data modeling tool. CDS Views are not just simple database views; they act as semantic layers that define how our applications interact with data. We learn to create optimized CDS Views by leveraging associations, annotations, and table functions, enabling us to build reusable, high-performance data models. These views simplify complex queries, improve data consistency, and enhance application flexibility. We then turn to Native SQL, our direct line to the HANA database. While AMDP and CDS Views provide powerful abstractions, Native SQL offers ultimate control for specialized tasks. We embed Native SQL within AMDP procedures to access database-specific features and fine-tune performance for critical operations. Along the way, we apply best practices for writing efficient queries, with a strong focus on indexing, join strategies, and precise data filtering. Throughout this journey, we emphasize the importance of rigorous testing and proactive monitoring. 
Just like a race car undergoes extensive testing before hitting the track, our ABAP applications require careful validation to ensure accuracy and optimal performance. We explore techniques for unit testing AMDP procedures, validating CDS Views, and monitoring query performance. We also look at strategies for detecting and addressing potential bottlenecks before they affect end users. SAP ABAP 7.5 Optimization for HANA is not just about writing faster code; it is about fundamentally rethinking how we develop applications. By embracing code pushdown, leveraging AMDP, CDS Views, and Native SQL, and implementing robust testing and monitoring strategies, we build ABAP applications that are not only faster, but also more scalable, maintainable, and adaptable to the ever-evolving demands of modern business.

You Will:
- Learn how to implement the "code pushdown" philosophy, moving data-intensive operations directly into the HANA database to reduce data transfer and processing overhead.
- Understand how to create optimized CDS Views, leveraging associations, annotations, and table functions to build reusable, high-performance data models that simplify complex queries and improve data consistency.
- Explore how to write complex logic directly in SQLScript using AMDP, harnessing HANA's parallel processing capabilities, and how to use Native SQL for specialized tasks, accessing database-specific features to optimize performance.

This Book is For: ABAP developers, SAP consultants and architects, IT managers, and technical leads.
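The "code pushdown" idea in the blurb above can be illustrated with a minimal sketch. It is not ABAP, AMDP, or HANA code: Python's built-in `sqlite3` module stands in for the database, and the `orders` table, its columns, and the function names are hypothetical. The sketch only contrasts aggregating rows in application code with letting the database aggregate and return the summary.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    rows = [
        ("acme", 100),
        ("acme", 250),
        ("globex", 75),
        ("globex", 25),
        ("initech", 40),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def totals_in_application(conn: sqlite3.Connection) -> dict:
    """No pushdown: every row is transferred, then summed in Python."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    """Pushdown: the database aggregates; only one row per customer comes back."""
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    return dict(conn.execute(query))


conn = build_demo_db()
app_totals = totals_in_application(conn)
db_totals = totals_pushed_down(conn)
print(app_totals == db_totals)
```

Both functions produce the same totals; the difference is how many rows leave the database. With five rows the gap is trivial, but it grows with table size, which is the overhead the pushdown approach aims to reduce.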

Engineering Lakehouses with Open Table Formats

Engineering Lakehouses with Open Table Formats introduces the architecture and capabilities of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. The book guides you through the design, implementation, and optimization of lakehouses that can handle modern data processing requirements effectively with real-world practical insights.

What this Book will help me do:
- Understand the fundamentals of open table formats and their benefits in lakehouse architecture.
- Learn how to implement performant data processing using tools like Apache Spark and Flink.
- Master advanced topics like indexing, partitioning, and interoperability between data formats.
- Explore data lifecycle management and integration with frameworks like Apache Airflow and dbt.
- Build secure lakehouses with regulatory compliance using best practices detailed in the book.

Author(s): Dipankar Mazumdar and Vinoth Govindarajan are seasoned professionals with extensive experience in big data processing and software architecture. They bring their expertise from working with data lakehouses and are known for their ability to explain complex technical concepts clearly. Their collaborative approach brings valuable insights into the latest trends in data management.

Who is it for? This book is ideal for data engineers, architects, and software professionals aiming to master modern lakehouse architectures. If you are familiar with data lakes or warehouses and wish to transition to an open data architectural design, this book is suited for you. Readers should have basic knowledge of databases, Python, and Apache Spark for the best experience.

Building a Data and AI Platform with PostgreSQL

In a world where data sovereignty, scalability, and AI innovation are at the forefront of enterprise strategy, PostgreSQL is emerging as the key to unlocking transformative business value. This new guide serves as your beacon for navigating the convergence of AI, open source technologies, and intelligent data platforms. Authors Tom Taulli, Benjamin Anderson, and Jozef de Vries offer a strategic and practical approach to building AI and data platforms that balance innovation with governance, empowering organizations to take control of their data future. Whether you're designing frameworks for advanced AI applications, modernizing legacy infrastructures, or solving data challenges at scale, you can use this guide to bridge the gap between technical complexity and actionable strategy. Written for IT executives, data leaders, and practitioners alike, it will equip you with the tools and insights to harness PostgreSQL's unique capabilities (extensibility, unstructured data management, and hybrid workloads) for long-term success in an AI-driven world.

- Learn how to build an AI and data platform using PostgreSQL
- Overcome data challenges like modernization, integration, and governance
- Optimize AI performance with model fine-tuning and retrieval-augmented generation (RAG) best practices
- Discover use cases that align data strategy with business goals
- Take charge of your data and AI future with this comprehensive and accessible roadmap

Just Use Postgres!

You probably don’t need a collection of specialty databases. Just use Postgres instead! Written for application developers and database pros, Just Use Postgres! shows you how to get the most out of the powerful Postgres database.

In Just Use Postgres! you’ll learn how to:
- Use Postgres as an RDBMS for transactional workloads
- Develop generative AI, geospatial, and time-series applications
- Take advantage of modern SQL including window functions and CTEs
- Perform full-text search and process JSON documents
- Use Postgres as a message queue
- Optimize performance with various index types including B-trees, GIN, GiST, HNSW, and more

Over the decades, PostgreSQL, aka Postgres, has grown into the most powerful general-purpose database and has become the de facto standard for developers worldwide. Just Use Postgres! takes a modern look at Postgres, exploring the database’s most up-to-date features for AI, time-series, full-text search, geospatial, and other application workloads.

About the Technology: You know that PostgreSQL is a fast, reliable, SQL compliant RDBMS. You may not know that it’s also great for geospatial systems, time series, full-text search, JSON documents, AI vector embeddings, and many other specialty database functions. For almost any data task you can imagine, you can use Postgres.

About the Book: Just Use Postgres! covers recipes for using Postgres in dozens of applications normally reserved for single-purpose databases. Written for busy application developers, each chapter explores a different use case illuminating the breadth and depth of Postgres’s capabilities. Along the way, you’ll also meet an incredible ecosystem of Postgres extensions like pgvector, PostGIS, pgmq, and TimescaleDB. You’ll be amazed at everything you can accomplish with Postgres!

What's Inside:
- Generative AI, geospatial, and time-series applications
- Modern SQL including window functions and CTEs
- Full-text search and JSON
- B-trees, GIN, GiST, HNSW, and more

About the Reader: For application developers, software engineers, and architects who know the basics of SQL.

About the Author: Denis Magda is a recognized Postgres expert and software engineer who worked on Java at Sun Microsystems and Oracle before focusing on databases and large-scale distributed systems.

Quotes:
- "I was pleasantly surprised to learn many new things from this book." (From the Afterword by Vlad Mihalcea)
- "An excellent guide covering everything from basics to cutting-edge features." (Dave Cramer, PostgreSQL JDBC Maintainer)
- "Pleasant, easy to read with tonnes of great code." (Mike McQuillan, McQTech Ltd)
- "Well-organized and easy to search." (Edward Pollack, Microsoft Data Platform MVP)
- "The missing guide to understanding and using Postgres." (Mehboob Alam, POSTGRESNX, Inc.)

Pro Oracle GoldenGate 23ai for the DBA: Powering the Foundation of Data Integration and AI

Transform your data replication strategy into a competitive advantage with Oracle GoldenGate 23ai. This comprehensive guide delivers the practical knowledge DBAs and architects need to implement, optimize, and scale Oracle GoldenGate 23ai in production environments. Written by Oracle ACE Director Bobby Curtis, it blends deep technical expertise with real-world business insights from hundreds of implementations across manufacturing, financial services, and technology sectors. Beyond traditional replication, this book explores the groundbreaking capabilities that make GoldenGate 23ai essential for modern AI initiatives. Learn how to implement real-time vector replication for RAG systems, integrate with cloud platforms like GCP and Snowflake, and automate deployments using REST APIs and Python. Each chapter offers proven strategies to deliver measurable ROI while reducing operational risk. Whether you're upgrading from Classic GoldenGate, deploying your first cloud data pipeline, or building AI-ready data architectures, this book provides the strategic guidance and technical depth to succeed. With Bobby's signature direct approach, you'll avoid common pitfalls and implement best practices that scale with your business.

What You Will Learn:
- Master the microservices architecture and new capabilities of Oracle GoldenGate 23ai
- Implement secure, high-performance data replication across Oracle, PostgreSQL, and cloud databases
- Configure vector replication for AI and machine learning workloads, including RAG systems
- Design and build multi-master replication models with automatic conflict resolution
- Automate deployments and management using RESTful APIs and Python
- Optimize performance for sub-second replication lag in production environments
- Secure your replication environment with enterprise-grade features and compliance
- Upgrade from Classic to Microservices architecture with zero downtime
- Integrate with cloud platforms including OCI, GCP, AWS, and Azure
- Implement real-time data pipelines to BigQuery, Snowflake, and other cloud targets
- Navigate Oracle licensing models and optimize costs

Who This Book Is For: Database administrators, architects, and IT leaders working with Oracle GoldenGate, whether deploying for the first time, migrating from Classic architecture, or enabling AI-driven replication, will find actionable guidance on implementation, performance tuning, automation, and cloud integration. Covers unidirectional and multi-master replication and is packed with real-world use cases.

AI Systems Performance Engineering

Elevate your AI system performance capabilities with this definitive guide to maximizing efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering provides engineers, researchers, and developers with a hands-on set of actionable optimization strategies. Learn to co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems that excel in both training and inference. Authored by Chris Fregly, a performance-focused engineering and product leader, this resource transforms complex AI systems into streamlined, high-impact AI solutions. Inside, you'll discover step-by-step methodologies for fine-tuning GPU CUDA kernels, PyTorch-based algorithms, and multinode training and inference systems. You'll also master the art of scaling GPU clusters for high performance, distributed model training jobs, and inference servers. The book ends with a 175+-item checklist of proven, ready-to-use optimizations.

- Codesign and optimize hardware, software, and algorithms to achieve maximum throughput and cost savings
- Implement cutting-edge inference strategies that reduce latency and boost throughput in real-world settings
- Utilize industry-leading scalability tools and frameworks
- Profile, diagnose, and eliminate performance bottlenecks across complex AI pipelines
- Integrate full stack optimization techniques for robust, reliable AI system performance

Keep Safe Using Mobile Tech, 2nd Edition

Leverage your smartphone and smartwatch for improved personal safety! Version 2.0, updated November 12, 2025 The digital and “real” worlds can both be scary places. The smartphone (and often smartwatch) you already carry with you can help reduce risks, deter theft, and mitigate violence. This book teaches you to secure your hardware, block abuse, automatically call emergency services, connect with others to ensure you arrive where and when you intended, detect stalking by compact trackers, and keep your ecosystem accounts from Apple, Google, and Microsoft secure. You don’t have to be reminded of the virtual and physical risks you face every day. Some of us are targeted more than others. Modern digital features built into mobile operating systems (and some computer operating systems) can help reduce our anxiety by putting more power in our hands to deter, deflect, block, and respond to abuse, threats, and emergencies. Keep Safe Using Mobile Tech looks at both digital threats, like online abuse and account hijacking, and ones in the physical world, like being stalked through Bluetooth trackers, facing domestic violence, or being in a car crash. The book principally covers the iPhone, Apple Watch, Android devices, and Wear OS watches. It also covers more limited but useful features available on the iPad and on computers running macOS or Windows. This second edition incorporates the massive number of new safety features Google added since October 2024 to the Android operating system, some particular to Google Pixel phones and smartwatches, and improved blocking, filtering, and screening added to Apple’s iOS 26 and related operating system updates in fall 2025. This book explores many techniques to help:

- Learn how to harden your Apple Account, Google Account, and Microsoft Account beyond just a password or a text-message token.
- Discover filtering and blocking tools from Apple and Google that can prevent abusive, fraudulent, and phishing messages and calls from reaching you.
- Block seeing unwanted sensitive images on your iPhone, iPad, Mac, Apple Watch, or Android phone, and help your kids receive advice on how not to send them.
- Turn on tracking on your Apple, Google, and Microsoft devices, and use it to recover or erase stolen hardware.
- Keep your cloud-archived messages from leaking to attackers.
- Screen calls with an automated assistant so that you know who wants you before picking up and without sending to voicemail.
- Lock down your devices to keep thieves and other personal invaders from accessing them.
- Prepare for emergencies by setting up medical information on your mobile devices.
- Let a supported smartphone or smartwatch recognize when you’re in a car crash or have taken a hard fall and call emergency services for you (and text your emergency contacts) if you can’t respond.
- Keep track of heart anomalies through smartwatch alerts and tests on your Apple Watch and many Android Wear smartwatches.
- Tell others where or when you expect to check in with them again, and let your Apple iPhone or Android phone alert them if you don’t.
- Deter stalking from tiny Bluetooth trackers.
- Protect your devices and accounts against access from domestic assailants.
- Block thieves who steal your phone, potentially threatening you or attacking you in person, from gaining access to the rest of your digital life.

Data Engineering for Beginners

A hands-on technical and industry roadmap for aspiring data engineers

In Data Engineering for Beginners, big data expert Chisom Nwokwu delivers a beginner-friendly handbook for everyone interested in the fundamentals of data engineering. Whether you're interested in starting a rewarding, new career as a data analyst, data engineer, or data scientist, or seeking to expand your skillset in an existing engineering role, Nwokwu offers the technical and industry knowledge you need to succeed.

The book explains:
- Database fundamentals, including relational and NoSQL databases
- Data warehouses and data lakes
- Data pipelines, including info about batch and stream processing
- Data quality dimensions
- Data security principles, including data encryption
- Data governance principles and data framework
- Big data and distributed systems concepts
- Data engineering on the cloud
- Essential skills and tools for data engineering interviews and jobs

Data Engineering for Beginners offers an easy-to-read roadmap on a seemingly complicated and intimidating subject. It addresses the topics most likely to cause a beginning data engineer to stumble, clearly explaining key concepts in an accessible way.

You'll also find:
- A comprehensive glossary of data engineering terms
- Common and practical career paths in the data engineering industry
- An introduction to key cloud technologies and services you may encounter early in your data engineering career

Perfect for practicing and aspiring data analysts, data scientists, and data engineers, Data Engineering for Beginners is an effective and reliable starting point for learning an in-demand skill. It's a powerful resource for everyone hoping to expand their data engineering skillset and upskill in the big data era.

Mastering Snowflake DataOps with DataOps.live: An End-to-End Guide to Modern Data Management

This practical, in-depth guide shows you how to build modern, sophisticated data processes using the Snowflake platform and DataOps.live, the only platform that enables seamless DataOps integration with Snowflake. Designed for data engineers, architects, and technical leaders, it bridges the gap between DataOps theory and real-world implementation, helping you take control of your data pipelines to deliver more efficient, automated solutions. You’ll explore the core principles of DataOps and how they differ from traditional DevOps, while gaining a solid foundation in the tools and technologies that power modern data management, including Git, dbt, and Snowflake. Through hands-on examples and detailed walkthroughs, you’ll learn how to implement your own DataOps strategy within Snowflake and maximize the power of DataOps.live to scale and refine your DataOps processes. Whether you're just starting with DataOps or looking to refine and scale your existing strategies, this book, complete with practical code examples and starter projects, provides the knowledge and tools you need to streamline data operations, integrate DataOps into your Snowflake infrastructure, and stay ahead of the curve in the rapidly evolving world of data management.

What You Will Learn:
- Explore the fundamentals of DataOps, its differences from DevOps, and its significance in modern data management
- Understand Git’s role in DataOps and how to use it effectively
- Know why dbt is preferred for DataOps and how to apply it
- Set up and manage DataOps.live within the Snowflake ecosystem
- Apply advanced techniques to scale and evolve your DataOps strategy

Who This Book Is For: Snowflake practitioners, including data engineers, platform architects, and technical managers, who are ready to implement DataOps principles and streamline complex data workflows using DataOps.live.

Building Data Integration Solutions

Are you struggling to manage and make sense of the vast streams of data flowing into your organization? In today's data-driven world, the ability to effectively unify and organize disparate data sources is not just an advantage; it's a necessity. The challenge lies in navigating the complexities of data diversity, volume, and regulatory demands, which can overwhelm even the most seasoned data professionals. In this essential book, Jay Borthen offers a comprehensive guide to understanding the art of data integration. This book dives deep into the processes and strategies necessary for creating effective data pipelines that ensure consistency, accuracy, and accessibility of your data. Whether you're a novice looking to understand the basics or an experienced professional aiming to refine your skills, Borthen's insights and practical advice, grounded in real-world case studies, will empower you to transform your organization's data handling capabilities.

- Understand various data integration solutions and how different technologies can be employed
- Gain insights into the relationship between data integration and the overall data life cycle
- Learn to effectively design, set up, and manage data integration components within pipelines
- Acquire the knowledge to configure pipelines, perform data migrations, transformations, and more

Apache Hudi: The Definitive Guide

Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi. With this practical guide, data engineers, data architects, and software architects will discover how to seamlessly build an interoperable lakehouse from disparate data sources and deliver faster insights using your query engine of choice. Authors Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro provide practical examples and insights to help you unlock the full potential of data lakehouses for different levels of analytics, from batch to interactive to streaming. You'll also learn how to evaluate storage choices and leverage built-in automated table optimizations to build, maintain, and operate production data applications.

- Understand the need for transactional data lakehouses and the challenges associated with building them
- Explore data ecosystem support provided by Apache Hudi for popular data sources and query engines
- Perform different write and read operations on Apache Hudi tables and effectively use them for various use cases, including batch and stream applications
- Apply different storage techniques and considerations such as indexing and clustering to maximize your lakehouse performance
- Build end-to-end incremental data pipelines using Apache Hudi for faster ingestion and fresher analytics

FinOps for Snowflake: A Guide to Cloud Financial Optimization

Unlock the full financial potential of your Snowflake environment. Learn how to cut costs, boost performance, and take control of your cloud data spend with FinOps for Snowflake, your essential guide to implementing a smart, automated, and Snowflake-optimized FinOps strategy. In today’s data-driven world, financial optimization on platforms like Snowflake is more critical than ever. Whether you're just beginning your FinOps journey or refining mature practices, this book provides a practical roadmap to align Snowflake usage with business goals, reduce costs, and improve performance without compromising agility. Grounded in real-world case studies and packed with actionable strategies, FinOps for Snowflake shows how leading organizations are transforming their environments through automation, governance, and cost intelligence. You'll learn how to apply proven techniques for architecture tuning, workload and storage efficiency, and performance optimization, empowering you to make smarter, data-driven decisions.

What You Will Learn:
- Master FinOps principles tailored for Snowflake’s architecture and pricing model
- Enable collaboration across finance, engineering, and business teams
- Deliver real-time cost insights for smarter decision-making
- Optimize compute, storage, and Snowflake AI and ML services for efficiency
- Leverage Snowflake Cortex AI and Adaptive Warehouse/Compute for intelligent cost governance
- Apply proven strategies to achieve operational excellence and measurable savings

Who this Book is For: Data professionals, cloud engineers, FinOps practitioners, and finance teams seeking to improve cost visibility, operational efficiency, and financial accountability in Snowflake environments.

The SAP Fiori Handbook: A Step-By-Step Guide to SAP Fiori Essentials

The SAP Fiori Handbook is your one-stop shop for turbocharging your UX skills so that your enterprise applications are more user-friendly and accessible. The handbook is broadly divided into four sections and provides an in-depth exploration of the SAP Fiori system. Its chapters offer theoretical context as well as detailed, step-by-step explanations of the key concepts, giving you a systematic approach to deepening your understanding of the SAP Fiori environment. The book covers everything from introductory concepts and installation to the key elements of the SAP Fiori system, including the Fiori App, Launchpad Content Manager, and SAP Fiori UI. It also covers important topics such as app support and troubleshooting, and dives into SAP Fiori reports.

You Will:
- Explore the entire SAP Fiori ecosystem
- Learn to configure and manage SAP Fiori Launchpad content
- See how to create custom apps and technical catalogs
- Explore how to implement Spaces and Pages
- Understand how to use App Support Functionality for troubleshooting
- Explore how to configure and manage Catalogs and Groups using SAP Fiori Launchpad Designer
- Understand how to convert existing Groups to Pages
- Get to grips with the Fiori Apps recommendation report as well as the SAP Fiori Upgrade Impact Analysis Report

Who is this Book for: SAP Fiori administrators, consultants, and business analysts, as well as anyone responsible for configuring and maintaining the SAP Fiori launchpad experience for their company.

Advanced Snowflake

As Snowflake's capabilities expand, staying updated with its latest features and functionalities can be overwhelming. The platform's rapid development gave rise to advanced tools like Snowpark and the Native App Framework, which are crucial for optimizing data operations but may seem complex to navigate. In this essential book, author Muhammad Fasih Ullah offers a detailed guide to understanding these sophisticated tools, ensuring you can leverage the full potential of Snowflake for data processing, application development, and deploying machine learning models at scale. You'll gain actionable insights and structured examples to transform your understanding and skills in handling advanced data scenarios within Snowflake.

By the end of this book, you will:
- Grasp advanced features such as Snowpark, Snowflake Native App Framework, and Iceberg tables
- Enhance your projects with geospatial functions for comprehensive geospatial analytics
- Interact with Snowflake using a variety of programming languages through Snowpark
- Implement and manage machine learning models effectively using Snowpark ML
- Develop and deploy applications within the Snowflake environment

Mastering PostgreSQL Administration: Internals, Operations, Monitoring, and Oracle Migration Strategies

This book is your one-stop resource on PostgreSQL system architecture, installation, management, maintenance, and migration. It will help you address the critical needs driving successful database management today: reliability and availability, performance and scalability, security and compliance, cost-effectiveness and flexibility, disaster recovery, and real-time analytics, all in one volume. Each topic in the book is thoroughly explained by industry experts and includes step-by-step instructions for configuring the features, a discussion of common issues and their solutions, and an exploration of real-world scenarios and case studies that illustrate how concepts work in practice. You won't find the book's comprehensive coverage of advanced topics, including migration from Oracle to PostgreSQL, heterogeneous replication, and backup & recovery, in one place, online or anywhere else.

What You Will Learn:
- Install PostgreSQL using source code and yum installation
- Back up and recover
- Migrate from Oracle database to PostgreSQL using the ora2pg utility
- Replicate from PostgreSQL to Oracle database and vice versa using Oracle GoldenGate
- Monitor using Grafana, pgAdmin, and command line tools
- Maintain with VACUUM, REINDEX, etc.

Who This Book Is For: Intermediate and advanced PostgreSQL users, including PostgreSQL administrators, architects, developers, analysts, disaster recovery system engineers, high availability engineers, and migration engineers.

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn
Understand dbt and its role in the modern data stack
Set up projects using both the cloud-hosted dbt Platform and the open source project
Connect dbt projects to cloud data warehouses
Build scalable models in SQL and Python
Configure development, testing, and production environments
Capture reusable logic with Jinja macros
Incorporate version control with your data transformation code
Seamlessly connect your projects using dbt Mesh
Build and manage a semantic layer using dbt
Deploy dbt using CI/CD best practices
Who This Book Is For
Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

Modernizing SAP with AWS: A Comprehensive Journey to Cloud Migration, Architecture, and Innovation Strategies

Follow the cloud journey of the fictional company Nimbus Airlines and the process it goes through to modernize its SAP systems. This book provides a detailed guide for those looking to transition their SAP systems to the cloud using Amazon Web Services (AWS). Told through the lens of various characters, the book is structured in three parts: an introduction to SAP and AWS fundamentals, technical architecture insights, and migration strategies with case studies. You’ll review the partnership between SAP and AWS, highlighted by their long-standing collaboration and shared innovations. You’ll then design an AWS architecture tailored for SAP workloads, including high availability, disaster recovery, and operations automation. The book concludes with a tour of the migration process, offering various strategies, tools, and frameworks reinforced with real-world customer case studies that showcase successful SAP migrations to AWS. Modernizing SAP with AWS equips business leaders and technical architects with the knowledge to leverage AWS for their SAP systems, ensuring a smooth transition and unlocking new opportunities for innovation.
What You Will Learn
Understand the fundamentals of AWS and its key components, including computing, storage, networking, and microservices, for SAP systems.
Explore the technical partnership between SAP and AWS, learning how their collaboration drives innovation and delivers business value.
Design an optimized AWS architecture for SAP workloads, focusing on high availability, disaster recovery, and operations automation.
Discover innovative ways to enhance and extend SAP functionality using AWS tools for better system performance and automation.
Who This Book Is For
SAP professionals and consultants interested in learning how AWS can enhance SAP performance, security, and automation.
Cloud engineers and developers involved in SAP migration projects, looking for best practices and real-world case studies for successful implementation.
Enterprise architects seeking to design optimized, scalable, and secure SAP infrastructure on AWS.
CIOs, CTOs, and IT managers aiming to modernize SAP systems and unlock innovation through cloud technology.

Understanding ETL (Updated Edition)

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Constant shifts in the data landscape—including the implementations of lakehouse architectures and the importance of high-scale real-time data—mean that today's data practitioners must approach ETL a bit differently. This updated technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You'll come away equipped to make informed decisions when implementing ETL and confident about choosing the technology stack that will help you succeed. Discover what ETL looks like in the new world of data lakehouses Learn how to deal with real-time data Explore low-code ETL tools Understand how to best achieve scale, performance, and observability

Apache Polaris: The Definitive Guide

Revolutionize your understanding of modern data management with Apache Polaris (incubating), the open source catalog designed for Apache Iceberg, the industry-standard table format for data lakehouses. This comprehensive guide takes you on a journey through the intricacies of Apache Iceberg data lakehouses, highlighting the pivotal role of Iceberg catalogs. Authors Alex Merced, Andrew Madson, and Tomer Shiran explore Apache Polaris's architecture and features in detail, equipping you with the knowledge needed to leverage its full potential. Data engineers, data architects, data scientists, and data analysts will learn how to seamlessly integrate Apache Polaris with popular data tools like Apache Spark, Snowflake, and Dremio to enhance data management capabilities, optimize workflows, and secure datasets.
Get a comprehensive introduction to Iceberg data lakehouses
Understand how catalogs facilitate efficient data management and querying in Iceberg
Explore Apache Polaris's unique architecture and its powerful features
Deploy Apache Polaris locally, and deploy managed Apache Polaris from Snowflake and Dremio
Perform basic table operations on Apache Spark, Snowflake, and Dremio