ETL/ELT

The Data Engineer's Guide to Microsoft Fabric

2027-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Henrik Reich (twoday Data & AI)

Data Engineering Data Lakehouse Databricks Microsoft Fabric Python Spark SQL Data Streaming analytics-platforms data data-science

Modern data engineering is evolving, and with Microsoft Fabric, the entire data platform experience is being redefined. This essential book offers a fresh, hands-on approach to navigating this shift. Rather than being an introduction to features, this guide explains how Fabric's key components (Lakehouse, Warehouse, and Real-Time Intelligence) work under the hood and how to put them to use in realistic workflows. Written by Christian Henrik Reich, a data engineering expert with experience that extends from Databricks to Fabric, this book is a blend of foundational theory and practical implementation of lakehouse solutions in Fabric. You'll explore how engines like Apache Spark and Fabric Warehouse collaborate with Fabric's Real-Time Intelligence solution in an integrated platform, and how to build ETL/ELT pipelines that deliver on speed, accuracy, and scale. Ideal for both new and practicing data engineers, this is your entry point into the fabric of the modern data platform.
Acquire a working knowledge of lakehouses, warehouses, and streaming in Fabric
Build resilient data pipelines across real-time and batch workloads
Apply Python, Spark SQL, T-SQL, and KQL within a unified platform
Gain insight into architectural decisions that scale with data needs
Learn actionable best practices for engineering clean, efficient, governed solutions

Data Engineering for Multimodal AI

2026-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vasundra Srinivasan

AI/ML Cloud Computing Data Engineering Data Governance MLOps Cyber Security data data-engineering

A shift is underway in how organizations approach data infrastructure for AI-driven transformation. As multimodal AI systems and applications become increasingly sophisticated and data hungry, data systems must evolve to meet these complex demands. Data Engineering for Multimodal AI is one of the first practical guides for data engineers, machine learning engineers, and MLOps specialists looking to rapidly master the skills needed to build robust, scalable data infrastructures for multimodal AI systems and applications. You'll follow the entire lifecycle of AI-driven data engineering, from conceptualizing data architectures to implementing data pipelines optimized for multimodal learning in both cloud native and on-premises environments. Each chapter includes step-by-step guides and best practices for implementing key concepts.
Design and implement cloud native data architectures optimized for multimodal AI workloads
Build efficient and scalable ETL processes for preparing diverse AI training data
Implement real-time data processing pipelines for multimodal AI inference
Develop and manage feature stores that support multiple data modalities
Apply data governance and security practices specific to multimodal AI projects
Optimize data storage and retrieval for various types of multimodal ML models
Integrate data versioning and lineage tracking in multimodal AI workflows
Implement data-quality frameworks to ensure reliable outcomes across data types
Design data pipelines that support responsible AI practices in a multimodal context

Understanding ETL (Updated Edition)

2025-09-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matt Palmer

AI/ML BI Data Lakehouse data data-engineering etl

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Constant shifts in the data landscape—including the implementations of lakehouse architectures and the importance of high-scale real-time data—mean that today's data practitioners must approach ETL a bit differently. This updated technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You'll come away equipped to make informed decisions when implementing ETL and confident about choosing the technology stack that will help you succeed. Discover what ETL looks like in the new world of data lakehouses Learn how to deal with real-time data Explore low-code ETL tools Understand how to best achieve scale, performance, and observability

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2025-08-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Donna Strok, Dmitry Foshin, Dmitry Anoshin

Analytics BI Cloud Computing Data Analytics Databricks DWH Iceberg Matillion Cyber Security Snowflake Tableau data

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework, the data warehouse, is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities.
What You Will Learn
Master key functionalities of Snowflake
Set up security and access with cluster
Bulk load data into Snowflake using the COPY command
Migrate from a legacy data warehouse to Snowflake
Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools
Manage large datasets with Apache Iceberg Tables
Implement continuous data loading with Snowpipe and Dynamic Tables
Who This Book Is For
Data professionals, business analysts, IT administrators, and existing or potential Snowflake users

Amazon Redshift Cookbook - Second Edition

2025-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anusha Challa (AWS), Harshida Patel (AWS), Shruti Worlikar (AWS Analytics)

AI/ML Analytics AWS Cloud Computing Data Analytics DWH GenAI Redshift Cyber Security amazon-redshift data data-engineering

Amazon Redshift Cookbook provides practical techniques for utilizing AWS's managed data warehousing service effectively. With this book, you'll learn to create scalable and secure data analytics solutions, tackle data integration challenges, and leverage Redshift's advanced features like data sharing and generative AI capabilities.
What this Book will help me do
Create end-to-end data analytics solutions from ingestion to reporting using Amazon Redshift.
Optimize the performance and security of Redshift implementations to meet enterprise standards.
Leverage Amazon Redshift for zero-ETL ingestion and advanced concurrency scaling.
Integrate Redshift with data lakes for enhanced data processing versatility.
Implement generative AI and machine learning solutions directly within Redshift environments.
Author(s)
Shruti Worlikar, Harshida Patel, and Anusha Challa are seasoned data experts who bring together years of experience with Amazon Web Services and data analytics. Their combined expertise enables them to offer actionable insights, hands-on recipes, and proven strategies for implementing and optimizing Amazon Redshift-based solutions.
Who is it for?
This book is best suited for data analysts, data engineers, and architects who are keen on mastering modern data warehouse solutions using Redshift. Readers should have some knowledge of data warehousing and familiarity with cloud concepts. Ideal for professionals looking to migrate on-premises systems or build cloud-native analytics pipelines leveraging Redshift.

Databricks Certified Data Engineer Associate Study Guide

2025-02-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Derar Alhussein (Acadford)

Data Engineering Data Governance Data Lakehouse Databricks Delta Spark SQL Data Streaming data data-engineering databricks-data-engineer-associate

Data engineers proficient in Databricks are currently in high demand. As organizations gather more data than ever before, skilled data engineers on platforms like Databricks become critical to business success. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the Databricks platform and its capabilities, as well as the essential skills to effectively execute various data engineering tasks on the platform. In this comprehensive study guide, you will build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse and its tools and benefits. You'll also learn to develop ETL pipelines in both batch and streaming modes. Moreover, you'll discover how to orchestrate data workflows and design dashboards while maintaining data governance. Finally, you'll dive into the finer points of exactly what's on the exam and learn to prepare for it with mock tests. Author Derar Alhussein teaches you not only the fundamental concepts but also provides hands-on exercises to reinforce your understanding. From setting up your Databricks workspace to deploying production pipelines, each chapter is carefully crafted to equip you with the skills needed to master the Databricks Platform. By the end of this book, you'll know everything you need to ace the Databricks Data Engineer Associate certification exam with flying colors, and start your career as a certified data engineer from Databricks!
You'll learn how to:
Use the Databricks Platform and Delta Lake effectively
Perform advanced ETL tasks using Apache Spark SQL
Design multi-hop architecture to process data incrementally
Build production pipelines using Delta Live Tables and Databricks Jobs
Implement data governance using Databricks SQL and Unity Catalog
Derar Alhussein is a senior data engineer with a master's degree in data mining. He has over a decade of hands-on experience in software and data projects, including large-scale projects on Databricks. He currently holds eight certifications from Databricks, showcasing his proficiency in the field. Derar is also an experienced instructor, with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.

Essential Data Analytics, Data Science, and AI: A Practical Guide for a Data-Driven World

2024-12-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Maxine Attobrah

AI/ML Analytics Cloud Computing Data Analytics Data Science Python Tableau data data-science

In today’s world, understanding data analytics, data science, and artificial intelligence is not just an advantage but a necessity. This book is your thorough guide to learning these innovative fields, designed to make the learning practical and engaging. The book starts by introducing data analytics, data science, and artificial intelligence. It illustrates real-world applications and addresses the ethical considerations tied to AI. It also explores ways to gain data for practice and real-world scenarios, including the concept of synthetic data. Next, it uncovers Extract, Transform, Load (ETL) processes and explains how to implement them using Python. Further, it covers artificial intelligence and the pivotal role played by machine learning models. It explains feature engineering, the distinction between algorithms and models, and how to harness their power to make predictions. Moving forward, it discusses how to assess machine learning models after their creation, with insights into various evaluation techniques. It emphasizes the crucial aspects of model deployment, including the pros and cons of on-device versus cloud-based solutions. It concludes with real-world examples and encourages embracing AI while dispelling fears and fostering an appreciation for the transformative potential of these technologies. Whether you’re a beginner or an experienced professional, this book offers valuable insights that will expand your horizons in the world of data and AI.
What you will learn:
What synthetic data and telemetry data are
How to analyze data using tools like Python and Tableau
What feature engineering is
What the practical implications of artificial intelligence are
Who this book is for:
Data analysts, scientists, and engineers seeking to enhance their skills, explore advanced concepts, and stay up-to-date with ethics.
Business leaders and decision-makers across industries interested in understanding the transformative potential and ethical implications of data analytics and AI in their organizations.

Data Engineering with AWS Cookbook

2024-11-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Viquar Khan, Trâm Ngọc Phạm, Gonzalo Herreros González, Huda Nofal

Analytics Athena AWS Amazon EMR AWS Glue Big Data Cloud Computing Data Engineering Data Lake QuickSight Redshift data

Data Engineering with AWS Cookbook serves as a comprehensive practical guide for building scalable and efficient data engineering solutions using AWS. With this book, you will master implementing data lakes, orchestrating data pipelines, and creating serving layers using AWS's robust services, such as Glue, EMR, Redshift, and Athena. With hands-on exercises and practical recipes, you will enhance your AWS-based data engineering projects.
What this Book will help me do
Gain the skills to design centralized data lake solutions and manage them securely at scale.
Develop expertise in crafting data pipelines with AWS's ETL technologies like Glue and EMR.
Learn to implement and automate governance, orchestration, and monitoring for data platforms.
Build high-performance data serving layers using AWS analytics tools like Redshift and QuickSight.
Effectively plan and execute data migrations to AWS from on-premises infrastructure.
Author(s)
Trâm Ngọc Phạm, Gonzalo Herreros González, Viquar Khan, and Huda Nofal bring together years of collective experience in data engineering and AWS cloud solutions. Each author's deep knowledge and passion for cloud technology have shaped this book into a valuable resource, geared towards practical learning and real-world application. Their approach ensures readers are not just learning but building tangible, impactful solutions.
Who is it for?
This book is geared towards data engineers and big data professionals engaged in or transitioning to cloud-based environments, specifically on AWS. Ideal readers are those looking to optimize workflows and master AWS tools to create scalable, efficient solutions. The content assumes a basic familiarity with AWS concepts like IAM roles and a command-line interface, ensuring all examples are accessible yet meaningful for those seeking advancement in AWS data engineering.

Data Engineering Best Practices

2024-10-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard J. Schiller, David Larochelle

Agile/Scrum AI/ML Analytics Big Data Cloud Computing Data Engineering data data-engineering

Unlock the secrets to building scalable and efficient data architectures with 'Data Engineering Best Practices.' This book provides in-depth guidance on designing, implementing, and optimizing cloud-based data pipelines. You will gain valuable insights into best practices, agile workflows, and future-proof designs.
What this Book will help me do
Effectively plan and architect scalable data solutions leveraging cloud-first strategies.
Master agile processes tailored to data engineering for improved project outcomes.
Implement secure, efficient, and reliable data pipelines optimized for analytics and AI.
Apply real-world design patterns and avoid common pitfalls in data flow and processing.
Create future-ready data engineering solutions following industry-proven frameworks.
Author(s)
Richard J. Schiller and David Larochelle are seasoned data engineering experts with decades of experience crafting efficient and secure cloud-based infrastructures. Their collaborative writing distills years of real-world expertise into practical advice aimed at helping engineers succeed in a rapidly evolving field.
Who is it for?
This book is ideal for data engineers, ETL specialists, and big data professionals seeking to enhance their knowledge in cloud-based solutions. Some familiarity with data engineering, ETL pipelines, and big data technologies is helpful. It suits those keen on mastering advanced practices, improving agility, and developing efficient data pipelines. Perfect for anyone looking to future-proof their skills in data engineering.

Data Modeling with Microsoft Power BI

2024-06-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Markus Ehrenmueller-Jensen

BI Data Modelling DAX DWH dimensional modeling Microsoft Power BI SQL business-intelligence data data-science microsoft-power-platform

Data modeling is the single most overlooked feature in Power BI Desktop, yet it's what sets Power BI apart from other tools on the market. This practical book serves as your fast-forward button for data modeling with Power BI, Analysis Services tabular, and SQL databases. It serves as a starting point for data modeling, as well as a handy refresher. Author Markus Ehrenmueller-Jensen, founder of Savory Data, shows you the basic concepts of Power BI's semantic model with hands-on examples in DAX, Power Query, and T-SQL. If you're looking to build a data warehouse layer, chapters with T-SQL examples will get you started. You'll begin with simple steps and gradually solve more complex problems.
This book shows you how to:
Normalize and denormalize with DAX, Power Query, and T-SQL
Apply best practices for calculations, flags and indicators, time and date, role-playing dimensions and slowly changing dimensions
Solve challenges such as binning, budget, localized models, composite models, and key value with DAX, Power Query, and T-SQL
Discover and tackle performance issues by applying solutions in DAX, Power Query, and T-SQL
Work with tables, relations, set operations, normal forms, dimensional modeling, and ETL

Apache Iceberg: The Definitive Guide

2024-05-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Merced (Dremio), Tomer Shiran (Dremio), Jason Hughes (Dremio)

AI/ML Analytics Flink Data Lakehouse Dremio Iceberg Spark Data Streaming apache-iceberg data data-engineering data-lake

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.
With this book, you'll learn:
The architecture of Apache Iceberg tables
What happens under the hood when you perform operations on Iceberg tables
How to further optimize Iceberg tables for maximum performance
How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.

Azure Data Factory by Example: Practical Implementation for Data Engineers

2024-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Swinbank

Analytics Azure ADF Cloud Computing DWH Microsoft SQL Synapse data data-engineering data-lake storage-repositories

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components.
This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics.
Improvements to flow control provided by activity deactivation and the Fail activity.
The introduction of reusable data flow components such as user-defined functions and flowlets.
Extensions to integration runtime capabilities including Managed VNet support.
The ability to trigger pipelines in response to custom events.
Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying.
What You Will Learn
Create pipelines, activities, datasets, and linked services
Build reusable components using variables, parameters, and expressions
Move data into and around Azure services automatically
Transform data natively using ADF data flows and Power Query data wrangling
Master flow-of-control and triggers for tightly orchestrated pipeline execution
Publish and monitor pipelines easily and with confidence
Who This Book Is For
Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Azure Data Factory Cookbook - Second Edition

2024-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xenia Ireton, Tonya Chernyshova, Dmitry Foshin, Dmitry Anoshin

Analytics Azure ADF Cloud Computing Data Engineering Data Lake Databricks Delta DWH Microsoft Fabric Synapse

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects.
What this Book will help me do
Master building and managing data pipelines using Azure Data Factory's latest versions and features.
Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows.
Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables.
Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently.
Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines.
Author(s)
Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences.
Who is it for?
This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

Cracking the Data Engineering Interview

2023-11-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kedeisha Bryan, Taamir Ransome

CI/CD Cloud Computing Data Engineering Data Modelling Python Cyber Security SQL data data-engineering

"Cracking the Data Engineering Interview" is your essential guide to mastering the data engineering interview process. This book offers practical insights and techniques to build your resume, refine your skills in Python, SQL, data modeling, and ETL, and confidently tackle over 100 mock interview questions. Gain the knowledge and confidence to land your dream role in data engineering. What this Book will help me do Craft a compelling data engineering portfolio to stand out to employers. Refresh and deepen understanding of essential topics like Python, SQL, and ETL. Master over 100 interview questions that cover both technical and behavioral aspects. Understand data engineering concepts such as data modeling, security, and CI/CD. Develop negotiation, networking, and personal branding skills crucial for job applications. Author(s) None Bryan and None Ransome are seasoned authors with a wealth of experience in data engineering and professional development. Drawing from their extensive industry backgrounds, they provide actionable strategies for aspiring data engineers. Their approachable writing style and real-world insights make complex topics accessible to readers. Who is it for? This book is ideal for aspiring data engineers looking to navigate the job application process effectively. Readers should be familiar with data engineering fundamentals, including Python, SQL, cloud data platforms, and ETL processes. It's tailored for professionals aiming to enhance their portfolios, tackle challenging interviews, and boost their chances of landing a data engineering role.

Fuzzy Data Matching with SQL

2023-10-02 · O'Reilly SQL Books O'Reilly Amazon

book

by Jim Lehmer

Data Quality JSON SQL XML

If you were handed two different but related sets of data, what tools would you use to find the matches? What if all you had was SQL SELECT access to a database? In this practical book, author Jim Lehmer provides best practices, techniques, and tricks to help you import, clean, match, score, and think about heterogeneous data using SQL. DBAs, programmers, business analysts, and data scientists will learn how to identify and remove duplicates, parse strings, extract data from XML and JSON, generate SQL using SQL, regularize data and prepare datasets, and apply data quality and ETL approaches for finding the similarities and differences between various expressions of the same data. Full of real-world techniques, the examples in the book contain working code.
You'll learn how to:
Identify and remove duplicates in two different datasets using SQL
Regularize data and achieve data quality using SQL
Extract data from XML and JSON
Generate SQL using SQL to increase your productivity
Prepare datasets for import, merging, and better analysis using SQL
Report results using SQL
Apply data quality and ETL approaches to finding similarities and differences between various expressions of the same data

Mastering Tableau 2023 - Fourth Edition

2023-08-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Marleen Meier

AI/ML Analytics BI Data Governance Data Quality DataViz Python Tableau data data-science data-science-tasks data-visualization

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau.
What this Book will help me do
Master the latest Tableau 2023 features and use cases to tackle analytics challenges.
Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation.
Integrate Tableau with programming languages such as Python and R to enhance analytics.
Create engaging, visually impactful dashboards for effective data storytelling.
Understand and apply data governance to ensure data quality and compliance.
Author(s)
Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing.
Who is it for?
This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

Building a Fast Universal Data Access Platform

2023-08-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christopher Gardner

IoT Cyber Security Tableau data data-engineering

Your company relies on data to succeed: data that traditionally comes from a business's transactional processes, pulled from the transaction systems through an extract-transform-load (ETL) process into a warehouse for reporting purposes. But this data flow is no longer sufficient given the growth of the internet of things (IoT), web commerce, and cybersecurity. How can your company keep up with today's increasing magnitude of data and insights? Organizations that can no longer rely on data generated by business processes are looking outside their workflow for information on customer behavior, retail patterns, and industry trends. In this report, author Christopher Gardner examines the challenges of building a framework that provides universal access to data.
You will:
Learn the advantages and challenges of universal data access, including data diversity, data volume, and the speed of analytic operations
Discover how to build a framework for data diversity and universal access
Learn common methods for improving database and performance SLAs
Examine the organizational requirements that a fast universal data access platform must meet
Explore a case study that demonstrates how components work together to form a multiaccess, high-volume, high-performance interface
About the author: Christopher Gardner is the campus Tableau application administrator at the University of Michigan, controlling security, updates, and performance maintenance.

Data Engineering with dbt

2023-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Roberto Zagni

Analytics Cloud Computing Data Engineering dbt Snowflake SQL data data-engineering

Data Engineering with dbt provides a comprehensive guide to building modern, reliable data platforms using dbt and SQL. You'll gain hands-on experience building automated ELT pipelines, using dbt Cloud with Snowflake, and embracing patterns for scalable and maintainable data solutions.
What this Book will help me do
Set up and manage a dbt Cloud environment and create reliable ELT pipelines.
Integrate Snowflake with dbt to implement robust data engineering workflows.
Transform raw data into analytics-ready data using dbt's features and SQL.
Apply advanced dbt functionality such as macros and Jinja for efficient coding.
Ensure data accuracy and platform reliability with built-in testing and monitoring.
Author(s)
Roberto Zagni is a seasoned data engineering professional with a wealth of experience in designing scalable data platforms. Through practical insights and real-world applications, Zagni demystifies complex data engineering practices. Their approachable teaching style makes technical concepts accessible and actionable.
Who is it for?
This book is perfect for data engineers, analysts, and analytics engineers looking to leverage dbt for data platform development. If you're a manager or decision maker interested in fostering efficient data workflows or a professional with basic SQL knowledge aiming to deepen your expertise, this resource will be invaluable.

Transitioning to Microsoft Power Platform: An Excel User Guide to Building Integrated Cloud Applications in Power BI, Power Apps, and Power Automate

2023-05-02 · O'Reilly Data Science Books O'Reilly Amazon

book

by David Ding

Analytics BI Cloud Computing CRM Dashboard DataViz Microsoft Power BI SQL business-intelligence data data-science +2 more

Welcome to this step-by-step guide for Excel users, data analysts, and finance specialists. It is designed to take you through practical report and development scenarios, including both the approach and the technical challenges. This book will equip you with an understanding of the overall Power Platform use case for addressing common business challenges. While Power BI continues to be an excellent tool of choice in the BI space, Power Platform is the real game changer. Using an integrated architecture, a small team of citizen developers can build solutions for all kinds of business problems. For small businesses, Power Platform can be used to build bespoke CRM, Finance, and Warehouse management tools. For large businesses, it can be used to build an integration point for existing systems to simplify reporting, operation, and approval processes. The author has drawn on his 15 years of hands-on analytics experience to help you pivot from the traditional Excel-based reporting environment. By using different business scenarios, this book provides you with clear reasons why a skill is important before you start to dive into the scenarios. You will use a fast prototyping approach to continue to build exciting reporting, automation, and application solutions and improve them while you acquire new skill sets. The book helps you get started quickly with Power BI. It covers data visualization, collaboration, and governance practices. You will learn about the most practical SQL challenges. And you will learn how to build applications in Power Apps and Power Automate. The book ends with an integrated solution framework that can be adapted to solve a wide range of complex business problems.
What You Will Learn Develop reporting solutions and business applications Understand the Power Platform licensing and development environment Apply Data ETL and modeling in Power BI Use Data Storytelling and dashboard design to better visualize data Carry out data operations with SQL and SharePoint lists Develop useful applications using Power Apps Develop automated workflows using Power Automate Integrate solutions with Power BI, Power Apps, and Power Automate to build enterprise solutions Who This Book Is For Next-generation data specialists, including Excel-based users who want to learn Power BI and build internal apps; finance specialists who want to take a different approach to traditional accounting reports; and anyone who wants to enhance their skill set for the future job market.

Serverless ETL and Analytics with AWS Glue

2022-08-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Albert Quiroga, Subramanya Vajiraya, Vishal Pathak, Ishan Gaur, Noritaka Sekiyama (Amazon Web Services (AWS)), Tomohiro Tanaka

AI/ML Analytics AWS AWS Glue Cloud Computing Data Analytics Data Engineering Data Lake Data Management Amazon SageMaker Cyber Security data +2 more

Discover how to harness AWS Glue for your ETL and data analysis workflows with "Serverless ETL and Analytics with AWS Glue." This comprehensive guide introduces readers to the capabilities of AWS Glue, from building data lakes to performing advanced ETL tasks, allowing you to create efficient, secure, and scalable data pipelines with serverless technology. What this Book will help me do Understand and utilize various AWS Glue features for data lake and ETL pipeline creation. Leverage AWS Glue Studio and DataBrew for intuitive data preparation workflows. Implement effective storage optimization techniques for enhanced data analytics. Apply robust data security measures, including encryption and access control, to protect data. Integrate AWS Glue with machine learning tools like SageMaker to build intelligent models. Author(s) The authors of this book include experts across the fields of data engineering and AWS technologies. With backgrounds in data analytics, software development, and cloud architecture, they bring a depth of practical experience. Their approach combines hands-on tutorials with conceptual clarity, ensuring a blend of foundational knowledge and actionable insights. Who is it for? This book is designed for ETL developers, data engineers, and data analysts who are familiar with data management concepts and want to extend their skills into serverless cloud solutions. If you're looking to master AWS Glue for building scalable and efficient ETL pipelines or are transitioning existing systems to the cloud, this book is ideal for you.
