DWH

Google Cloud Certified Professional Data Engineer Certification Guide

2026-02-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ankur Roy , Sireesha Pulipati

Cloud Computing Data Engineering GCP Cyber Security SQL cloud-computing cloud-platforms gcp-certifications gcp-certifications-professional-tier google-cloud google-cloud-certified-professional-data-engineer google cloud certified - professional data engineer

A guide to passing the GCP Professional Data Engineer exam on your first attempt and upgrading your data engineering skills on GCP.

Key Features

• Fully understand the certification exam content and exam objectives
• Consolidate your knowledge of all essential exam topics and key concepts
• Get realistic experience of answering exam-style questions
• Develop practical skills for everyday use
• Purchase of this book unlocks access to web-based exam prep resources including mock exams, flashcards, and exam tips

Book Description

The GCP Professional Data Engineer certification validates the fundamental knowledge required to perform data engineering tasks and use GCP services to enhance data engineering processes and further your career in the data engineering/architecting field. This book is a best-in-class study guide that fully covers the GCP Professional Data Engineer exam objectives and helps you pass the exam the first time. Complete with clear explanations, chapter review questions, realistic mock exams, and pragmatic solutions, this guide will help you master the core exam concepts and build the understanding you need to go into the exam with the skills and confidence to get the best result you can. With the help of relevant examples, you'll learn fundamental data engineering concepts such as data warehousing and data security. As you progress, you'll delve into the important domains of the exam, including data pipelining, data migration, and data processing. Unlike other study guides, this book explains the logical reasoning behind the choice of correct answers based on scenarios, provides you with excellent tips regarding the optimal use of each service, and gives you everything you need to pass the exam and enhance your prospects in the data engineering field.

What you will learn

• Create data solutions and pipelines in GCP
• Analyze and transform data into useful information
• Apply data engineering concepts to real scenarios
• Create secure, cost-effective, valuable GCP workloads
• Work in the GCP environment with industry best practices

Who this book is for

This book is for data engineers who want a reliable source for the key concepts and terms present in the most prestigious and highly sought-after cloud-based data engineering certification. This book will help you improve your GCP data engineering skills to give you a better chance at earning the GCP Professional Data Engineer Certification. You should already be familiar with the Google Cloud Platform, having explored it (professionally or personally) for at least a year. You should also have some familiarity with basic data concepts (such as types of data and basic SQL knowledge).

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

2025-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dustin Dorsey (Onix) , Cameron Cyr

CI/CD Cloud Computing dbt Git Modern Data Stack Python SQL data data-engineering data-warehouse storage-repositories

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn

• Understand dbt and its role in the modern data stack
• Set up projects using both the cloud-hosted dbt Platform and the open source project
• Connect dbt projects to cloud data warehouses
• Build scalable models in SQL and Python
• Configure development, testing, and production environments
• Capture reusable logic with Jinja macros
• Incorporate version control with your data transformation code
• Seamlessly connect your projects using dbt Mesh
• Build and manage a semantic layer using dbt
• Deploy dbt using CI/CD best practices

Who This Book Is For

Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2025-08-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Donna Strok , Dmitry Foshin , Dmitry Anoshin

Analytics BI Cloud Computing Data Analytics Databricks ETL/ELT Iceberg Matillion Cyber Security Snowflake Tableau data

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework, the data warehouse, is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities.

What You Will Learn

• Master key functionalities of Snowflake
• Set up security and access with cluster
• Bulk load data into Snowflake using the COPY command
• Migrate from a legacy data warehouse to Snowflake
• Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools
• Manage large datasets with Apache Iceberg Tables
• Implement continuous data loading with Snowpipe and Dynamic Tables

Who This Book Is For

Data professionals, business analysts, IT administrators, and existing or potential Snowflake users

Microsoft Fabric Analytics Engineer Associate Certification Companion: Preparation for DP-600 Microsoft Certification

2025-07-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Dr. Gomathi S

AI/ML Analytics Data Engineering Microsoft Fabric Cyber Security analytics-platforms data data-science dp-600-microsoft-certified-fabric-analytics-engineer-associate dp-600: microsoft certified fabric analytics engineer associate microsoft-fabric

As organizations increasingly leverage Microsoft Fabric to unify their data engineering, analytics, and governance strategies, the role of the Fabric Analytics Engineer has become more crucial than ever. This book equips readers with the knowledge and hands-on skills required to excel in this domain and pass the DP-600 certification exam confidently. This book covers the entire certification syllabus with clarity and depth, beginning with an overview of Microsoft Fabric. You will gain an understanding of the platform’s architecture and how it integrates with data and AI workloads to provide a unified analytics solution. You will then delve into implementing a data warehouse in Microsoft Fabric, exploring techniques to ingest, transform, and store data efficiently. Next, you will learn how to work with semantic models in Microsoft Fabric, enabling you to create intuitive, meaningful data representations for visualization and reporting. Then, you will focus on administration and governance in Microsoft Fabric, emphasizing best practices for security, compliance, and efficient management of analytics solutions. Lastly, you will find detailed practice tests and exam strategies along with supplementary materials to reinforce key concepts. After reading the book, you will have the background and capability to learn the skills and concepts necessary both to pass the DP-600 exam and to become a confident Fabric Analytics Engineer.

What You Will Learn

• A complete understanding of all DP-600 certification exam objectives and requirements
• Key concepts and terminology related to Microsoft Fabric Analytics
• Step-by-step preparation for successfully passing the DP-600 certification exam
• Insights into exam structure, question patterns, and strategies for tackling challenging sections
• Confidence in demonstrating skills validated by the Microsoft Certified: Fabric Analytics Engineer Associate credential

Who This Book Is For

Data engineers, analysts, and professionals with some experience in data engineering or analytics, seeking to expand their knowledge of Microsoft Fabric

Amazon Redshift Cookbook - Second Edition

2025-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anusha Challa (AWS) , Harshida Patel (AWS) , Shruti Worlikar (AWS Analytics)

AI/ML Analytics AWS Cloud Computing Data Analytics ETL/ELT GenAI Redshift Cyber Security amazon-redshift data data-engineering

Amazon Redshift Cookbook provides practical techniques for utilizing AWS's managed data warehousing service effectively. With this book, you'll learn to create scalable and secure data analytics solutions, tackle data integration challenges, and leverage Redshift's advanced features like data sharing and generative AI capabilities. What this Book will help me do Create end-to-end data analytics solutions from ingestion to reporting using Amazon Redshift. Optimize the performance and security of Redshift implementations to meet enterprise standards. Leverage Amazon Redshift for zero-ETL ingestion and advanced concurrency scaling. Integrate Redshift with data lakes for enhanced data processing versatility. Implement generative AI and machine learning solutions directly within Redshift environments. Author(s) Shruti Worlikar, Harshida Patel, and Anusha Challa are seasoned data experts who bring together years of experience with Amazon Web Services and data analytics. Their combined expertise enables them to offer actionable insights, hands-on recipes, and proven strategies for implementing and optimizing Amazon Redshift-based solutions. Who is it for? This book is best suited for data analysts, data engineers, and architects who are keen on mastering modern data warehouse solutions using Redshift. Readers should have some knowledge of data warehousing and familiarity with cloud concepts. Ideal for professionals looking to migrate on-premises systems or build cloud-native analytics pipelines leveraging Redshift.

Accelerating Data Pipeline Development

2025-03-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Josh Hall (Coalesce)

Data Engineering data data-engineering

Today's data engineering teams are overwhelmed, juggling fire drills and endless requests while relying on manual, repetitive processes for building data pipelines. This much-needed tech guide from author Josh Hall introduces a practical approach to streamlining pipeline development, empowering teams to work smarter, not harder. Using Coalesce, a modern development platform, you'll learn to standardize workflows, apply reusable design patterns, and build faster, more efficient pipelines, all without piling on tech debt. Ideal for data engineers, architects, and analysts of all experience levels, the book offers clear explanations of Coalesce's core functionality including configuring environments, defining nodes, and connecting to data warehouses. Packed with workflows and useful takeaways, it's your guide to delivering high-quality, actionable data while reducing pipeline development time.

• Set up Coalesce and integrate with a data warehouse
• Use reusable nodes and design patterns for faster development
• Accelerate pipeline delivery with reduced manual effort
• Leverage Coalesce Marketplace for advanced functionality

DuckDB in Action

2024-08-21 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Simons , Mark Needham , Michael Hunger

Analytics API Big Data Cloud Computing CSV Data Analytics DuckDB Java JSON Motherduck Neo4j Pandas

Dive into DuckDB and start processing gigabytes of data with ease, all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB, all through hands-on examples.

Open up DuckDB in Action and learn how to:

• Read and process data from CSV, JSON and Parquet sources, both locally and remotely
• Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables
• Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames
• Prepare, ingest and query large datasets
• Build cloud data pipelines
• Extend DuckDB with custom functionality

Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation; you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines.

About the Technology

DuckDB makes data analytics fast and fun! You don’t need to set up Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres.

About the Book

DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action.

What's Inside

• Prepare, ingest and query large datasets
• Build cloud data pipelines
• Extend DuckDB with custom functionality
• Fast-paced SQL recap: From simple queries to advanced analytics

About the Reader

For data pros comfortable with Python and CLI tools.

About the Authors

Mark Needham is a blogger and video creator at @LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j.

Quotes

I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck

An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks

Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee

Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable

Data Modeling with Microsoft Power BI

2024-06-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Markus Ehrenmueller-Jensen

BI Data Modelling DAX ETL/ELT dimensional modeling Microsoft Power BI SQL business-intelligence data data-science microsoft-power-platform

Data modeling is the single most overlooked feature in Power BI Desktop, yet it's what sets Power BI apart from other tools on the market. This practical book serves as your fast-forward button for data modeling with Power BI, Analysis Services tabular, and SQL databases. It serves as a starting point for data modeling, as well as a handy refresher. Author Markus Ehrenmueller-Jensen, founder of Savory Data, shows you the basic concepts of Power BI's semantic model with hands-on examples in DAX, Power Query, and T-SQL. If you're looking to build a data warehouse layer, chapters with T-SQL examples will get you started. You'll begin with simple steps and gradually solve more complex problems.

This book shows you how to:

• Normalize and denormalize with DAX, Power Query, and T-SQL
• Apply best practices for calculations, flags and indicators, time and date, role-playing dimensions and slowly changing dimensions
• Solve challenges such as binning, budget, localized models, composite models, and key value with DAX, Power Query, and T-SQL
• Discover and tackle performance issues by applying solutions in DAX, Power Query, and T-SQL
• Work with tables, relations, set operations, normal forms, dimensional modeling, and ETL

Database Management Systems by Pearson

2024-05-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rohit Khurana

Cyber Security SQL data data-engineering relational-databases

Express Learning is a series of books designed as quick reference guides to important undergraduate computer courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Features –

• Designed as a student-friendly self-learning guide. The book is written in a clear, concise, and lucid manner.
• Easy-to-understand question-and-answer format.
• Includes previously asked as well as new questions organized in chapters.
• All types of questions including MCQs, short and long questions are covered.
• Solutions to numerical questions asked at examinations are provided.
• All ideas and concepts are presented with clear examples.
• Text is well structured and well supported with suitable diagrams.
• Inter-chapter dependencies are kept to a minimum.

Book Contents –

1: Database System
2: Conceptual Modelling
3: Relational Model
4: Relational Algebra and Calculus
5: Structured Query Language
6: Relational Database Design
7: Data Storage and Indexing
8: Query Processing and Optimization
9: Introduction to Transaction Processing
10: Concurrency Control Techniques
11: Database Recovery System
12: Database Security
13: Database System Architecture
14: Data Warehousing, OLAP, and Data Mining
15: Information Retrieval
16: Miscellaneous Questions

Express Learning - Data Warehousing and Data Mining, 1st Edition by Pearson

2024-05-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by ITL Education

data data-engineering data-warehouse storage-repositories

Express Learning is a series of books designed as quick reference guides to important undergraduate courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Book Contents –

Chapter 1: Introduction to Data Warehouse
Chapter 2: Building a Data Warehouse
Chapter 3: Data Warehouse: Architecture
Chapter 4: OLAP Technology
Chapter 5: Introduction to Data Mining
Chapter 6: Data Preprocessing
Chapter 7: Mining Association Rules
Chapter 8: Classification and Prediction
Chapter 9: Cluster Analysis
Chapter 10: Advanced Techniques of Data Mining and Its Applications
Index

Azure Data Factory by Example: Practical Implementation for Data Engineers

2024-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Swinbank

Analytics Azure ADF Cloud Computing ETL/ELT Microsoft SQL Synapse data data-engineering data-lake storage-repositories

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. This edition, updated for 2024, includes the latest developments to the Azure Data Factory service: Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics. Improvements to flow control provided by activity deactivation and the Fail activity. The introduction of reusable data flow components such as user-defined functions and flowlets. Extensions to integration runtime capabilities including Managed VNet support. The ability to trigger pipelines in response to custom events. Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying. 
What You Will Learn

• Create pipelines, activities, datasets, and linked services
• Build reusable components using variables, parameters, and expressions
• Move data into and around Azure services automatically
• Transform data natively using ADF data flows and Power Query data wrangling
• Master flow-of-control and triggers for tightly orchestrated pipeline execution
• Publish and monitor pipelines easily and with confidence

Who This Book Is For

Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Azure Data Factory Cookbook - Second Edition

2024-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

Analytics Azure ADF Cloud Computing Data Engineering Data Lake Databricks Delta ETL/ELT Microsoft Fabric Synapse

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects. What this Book will help me do Master building and managing data pipelines using Azure Data Factory's latest versions and features. Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows. Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables. Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently. Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines. Author(s) Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences. Who is it for? This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

Mastering Microsoft Fabric: SAASification of Analytics

2024-02-21 · O'Reilly Data Science Books O'Reilly Amazon

book

by Debananda Ghosh

AI/ML Analytics AWS Azure ADF BI Cloud Computing Data Engineering Data Lakehouse Data Management Data Science LLM

Learn and explore the capabilities of Microsoft Fabric, the latest evolution in cloud analytics suites. This book will help you understand how users can leverage a Microsoft Office-equivalent experience for performing data management and advanced analytics activities. The book starts with an overview of the analytics evolution from on premises to cloud infrastructure as a service (IaaS), platform as a service (PaaS), and now software as a service (SaaS), and provides an introduction to Microsoft Fabric. You will learn how to provision Microsoft Fabric in your tenant along with the key capabilities of SaaS analytics products and the advantage of using Fabric in the enterprise analytics platform. OneLake and Lakehouse for data engineering are discussed, as well as OneLake for data science. Author Ghosh teaches you about data warehouse offerings inside Microsoft Fabric and the new data integration experience, which brings Azure Data Factory and the Power Query Editor of Power BI together in a single platform. Also demonstrated is Real-Time Analytics in Fabric, including capabilities such as Kusto query and database. You will understand how the new event stream feature integrates with OneLake and other computations. You will also know how to configure the real-time alert capability in a zero-code manner and go through the Power BI experience in the Fabric workspace. Fabric pricing and licensing are also covered. After reading this book, you will understand the capabilities of Microsoft Fabric and its integration with current and upcoming Azure OpenAI capabilities.

What You Will Learn

• Build OneLake for all data like OneDrive for Microsoft Office
• Leverage shortcuts for cross-cloud data virtualization in Azure and AWS
• Understand upcoming OpenAI integration
• Discover new event streaming and Kusto query inside Fabric real-time analytics
• Utilize seamless tooling for machine learning and data science

Who This Book Is For

Citizen users and experts in the data engineering and data science fields, along with chief AI officers

Deciphering Data Architectures

2024-02-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by James Serra (Microsoft)

Big Data Data Lake Data Lakehouse Microsoft Fabric data data-engineering data-lake storage-repositories

Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of these architectures to help data professionals understand the pros and cons of each. James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, as well as how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs.

With this book, you'll:

• Gain a working understanding of several data architectures
• Learn the strengths and weaknesses of each approach
• Distinguish data architecture theory from reality
• Pick the best architecture for your use case
• Understand the differences between data warehouses and data lakes
• Learn common data architecture concepts to help you build better solutions
• Explore the historical evolution and characteristics of data architectures
• Learn essentials of running an architecture design session, team organization, and project success factors

Free from product discussions, this book will serve as a timeless resource for years to come.

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

2023-12-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Abhishek Mishra , Anjani Kumar , Sanjeev Kumar (Tesa SE)

AWS Azure BI Big Data Cloud Computing Data Governance Data Lake Data Lakehouse Delta Pandas Cyber Security Data Streaming

Design and architect new-generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts, and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of standard data warehousing practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. They also demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines.

What You Will Learn

• Understand the core concepts underlying modern data warehouses
• Design and build cloud-native data warehouses
• Gain a practical approach to architecting and building data warehouses on Azure and AWS
• Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse
• Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall
• Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications

Who This Book Is For

Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS

Analytics Engineering with SQL and dbt

2023-12-08 · O'Reilly SQL Books O'Reilly Amazon

book

by Rui Pedro Machado , Helder Russa

Analytics Analytics Engineering BI Data Engineering dbt SQL

With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. dbt (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL. Authors Rui Machado from Monstarlab and Hélder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.

With this book, you'll learn:

• What dbt is and how a dbt project is structured
• How dbt fits into the data engineering and analytics worlds
• How to collaborate on building data models
• The main tools and architectures for building useful, functional data models
• How to fit dbt into data warehousing and laking architecture
• How to build tests for data transformations

Data Exploration and Preparation with BigQuery

2023-11-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mike Kahn

Big Data BigQuery Cloud Computing GCP SQL data data-engineering google-bigquery

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights. What this Book will help me do Master the process of exploring and assessing the quality of datasets. Learn SQL for performing efficient and advanced data transformations in BigQuery. Optimize the performance of BigQuery queries for speed and cost-effectiveness. Discover best practices for setting up and managing BigQuery resources. Apply real-world case studies to analyze data and derive meaningful insights. Author(s) Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed. Who is it for? This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Amazon Redshift: The Definitive Guide

2023-10-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rajesh Francis , Rajiv Gupta , Milind Oke

AI/ML Analytics AWS Cloud Computing Redshift Cyber Security amazon-redshift data data-engineering relational-databases

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you: Build a cloud data strategy around Amazon Redshift as a foundational data warehouse Get started with Amazon Redshift with simple-to-use data models and design best practices Understand how and when to use Redshift Serverless and Redshift provisioned clusters Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing Learn best practices for security, monitoring, resilience, and disaster recovery Leverage Amazon Redshift integration with other AWS services to unlock additional value

Learning and Operating Presto

2023-09-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tim Meehan , Ying Su , Angelica Lo Duca , Vivek Bharathan

BI Cloud Computing Hadoop IBM Presto Cyber Security SQL data data-engineering

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to use Presto operations at your organization to derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production. With this book, you will: Learn how to install and configure Presto Use Presto with business intelligence tools Understand how to connect Presto to a variety of data sources Extend Presto for real-time business insight Learn how to apply best practices and tuning Get troubleshooting tips for logs, error messages, and more Explore Presto's architectural concepts and usage patterns Understand Presto security and administration

Pro Power BI Architecture: Development, Deployment, Sharing, and Security for Microsoft Power BI Solutions

2023-07-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Reza Rad (RADACAD)

BI Cloud Computing Dataflow Microsoft Power BI Cyber Security business-intelligence data data-science microsoft-power-platform power-bi

This book provides detailed guidance around architecting and deploying Power BI reporting solutions, including help and best practices for sharing and security. You’ll find chapters on dataflows, shared datasets, composite model and DirectQuery connections to Power BI datasets, deployment pipelines, XMLA endpoints, and many other important features related to the overall Power BI architecture that are new since the first edition. You will gain an understanding of what functionality each of the Power BI components provides (such as Dataflow, Shared Dataset, Datamart, thin reports, and paginated reports), so that you can make an informed decision about what components to use in your solution. You will get to know the pros and cons of each component, and how they all work together within the larger Power BI architecture. Commonly encountered problems you will learn to handle include content unexpectedly changing while users are in the process of creating reports and building analyses, methods of sharing analyses that don’t cover all the requirements of your business or organization, and inconsistent security models. Detailed examples help you to understand and choose from among the different methods available for sharing and securing Power BI content so that only intended recipients can see it. The knowledge provided in this book will allow you to choose an architecture and deployment model that suits the needs of your organization. It will also help ensure that you do not spend your time maintaining your solution, but on using it for its intended purpose: gaining business value from mining and analyzing your organization’s data.
What You Will Learn Architect Power BI solutions that are reliable and easy to maintain Create development templates and structures in support of reusability Set up and configure the Power BI gateway as a bridge between on-premises data sources and the Power BI cloud service Select a suitable connection type—Live Connection, DirectQuery, Scheduled Refresh, or Composite Model—for your use case Choose the right sharing method for how you are using Power BI in your organization Create and manage environments for development, testing, and production Secure your data using row-level and object-level security Save money by choosing the right licensing plan Who This Book Is For Data analysts and developers who are building reporting solutions around Power BI, as well as architects and managers who are responsible for the big picture of how Power BI meshes with an organization’s other systems, including database and data warehouse systems.
