talk-data.com

Topic: Data Governance
Tags: data_management, compliance, data_quality
58 tagged activities

Activity Trend: peak of 90 activities per quarter, 2020-Q1 to 2026-Q1

Activities: 58 activities · Newest first

Snowflake: The Definitive Guide, 2nd Edition

Snowflake is reshaping data management by integrating AI, analytics, and enterprise workloads into a single cloud platform. Snowflake: The Definitive Guide is a comprehensive resource for data architects, engineers, and business professionals looking to harness Snowflake's evolving capabilities, including Cortex AI, Snowpark, and Polaris Catalog for Apache Iceberg. This updated edition provides real-world strategies and hands-on activities for optimizing performance, securing data, and building AI-driven applications. With hands-on SQL examples and best practices, this book helps readers process structured and unstructured data, implement scalable architectures, and integrate Snowflake's AI tools seamlessly. Whether you're setting up accounts, managing access controls, or leveraging generative AI, this guide equips you with the expertise to maximize Snowflake's potential.

- Implement AI-powered workloads with Snowflake Cortex
- Explore Snowsight and Streamlit for no-code development
- Ensure security with access control and data governance
- Optimize storage, queries, and computing costs
- Design scalable data architectures for analytics and machine learning
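To illustrate the access-control side of Snowflake governance that the guide covers, here is a minimal sketch (not taken from the book) that provisions a read-only analyst role through the snowflake-connector-python client; the account, database, schema, and user names are hypothetical placeholders.

```python
import os
import snowflake.connector  # pip install snowflake-connector-python

# Connect with a role that is allowed to manage grants (hypothetical account and user).
conn = snowflake.connector.connect(
    account="my_account",
    user="governance_admin",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SECURITYADMIN",
)

cur = conn.cursor()
# Create a read-only role and grant it SELECT on an example schema.
cur.execute("CREATE ROLE IF NOT EXISTS analyst_ro")
cur.execute("GRANT USAGE ON DATABASE sales TO ROLE analyst_ro")
cur.execute("GRANT USAGE ON SCHEMA sales.public TO ROLE analyst_ro")
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA sales.public TO ROLE analyst_ro")
cur.execute("GRANT ROLE analyst_ro TO USER some_analyst")
conn.close()
```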

Data Engineering for Multimodal AI

A shift is underway in how organizations approach data infrastructure for AI-driven transformation. As multimodal AI systems and applications become increasingly sophisticated and data-hungry, data systems must evolve to meet these complex demands. Data Engineering for Multimodal AI is one of the first practical guides for data engineers, machine learning engineers, and MLOps specialists looking to rapidly master the skills needed to build robust, scalable data infrastructures for multimodal AI systems and applications. You'll follow the entire lifecycle of AI-driven data engineering, from conceptualizing data architectures to implementing data pipelines optimized for multimodal learning in both cloud-native and on-premises environments. Each chapter includes step-by-step guides and best practices for implementing key concepts.

- Design and implement cloud-native data architectures optimized for multimodal AI workloads
- Build efficient and scalable ETL processes for preparing diverse AI training data
- Implement real-time data processing pipelines for multimodal AI inference
- Develop and manage feature stores that support multiple data modalities
- Apply data governance and security practices specific to multimodal AI projects
- Optimize data storage and retrieval for various types of multimodal ML models
- Integrate data versioning and lineage tracking in multimodal AI workflows
- Implement data-quality frameworks to ensure reliable outcomes across data types
- Design data pipelines that support responsible AI practices in a multimodal context
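A common early step in multimodal ETL is pairing assets from different modalities into a single training manifest. The sketch below is an illustration under an assumed directory layout (not the book's code): it pairs image files with same-named caption files and writes a JSONL manifest using only the Python standard library.

```python
import json
from pathlib import Path

IMAGES = Path("data/images")      # hypothetical: one .jpg per sample
CAPTIONS = Path("data/captions")  # hypothetical: matching <name>.txt caption files

def build_manifest(out_path: str = "train_manifest.jsonl") -> int:
    """Write one JSON line per (image, caption) pair; return the pair count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for image in sorted(IMAGES.glob("*.jpg")):
            caption_file = CAPTIONS / f"{image.stem}.txt"
            if not caption_file.exists():
                continue  # skip samples with a missing modality
            record = {
                "image_path": str(image),
                "caption": caption_file.read_text(encoding="utf-8").strip(),
            }
            out.write(json.dumps(record) + "\n")
            count += 1
    return count

if __name__ == "__main__":
    print(f"paired samples: {build_manifest()}")
```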

Data Engineering with Azure Databricks

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools.

Key Features:
- Build scalable data pipelines using Apache Spark and Delta Lake
- Automate workflows and manage data governance with Unity Catalog
- Learn real-time processing and structured streaming with practical use cases
- Implement CI/CD, DevOps, and security for production-ready data solutions
- Explore Databricks-native ML, AutoML, and Generative AI integration

Book Description:
"Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

What you will learn:
- Set up a full-featured Azure Databricks environment
- Implement batch and streaming ingestion using Auto Loader
- Optimize Spark jobs with partitioning and caching
- Build real-time pipelines with structured streaming and DLT
- Manage data governance using Unity Catalog
- Orchestrate production workflows with jobs and ADF
- Apply CI/CD best practices with Azure DevOps and Git
- Secure data with RBAC, encryption, and compliance standards
- Use MLflow and Feature Store for ML pipelines
- Build generative AI applications in Databricks

Who this book is for:
This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.
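As a taste of the Auto Loader ingestion pattern the book walks through, here is a minimal sketch assumed to run in an Azure Databricks notebook (where `spark` is predefined); the storage paths and table name are hypothetical.

```python
# Incrementally ingest JSON files landing in a raw container into a bronze Delta table.
bronze_stream = (
    spark.readStream.format("cloudFiles")                        # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/events")  # schema inference/evolution state
    .load("/mnt/raw/events")
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
    .trigger(availableNow=True)   # process the current backlog, then stop
    .toTable("bronze.events")
)
```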

Data Contracts in Practice

In 'Data Contracts in Practice', Ryan Collingwood provides a detailed guide to managing and formalizing data responsibilities within organizations. Through practical examples and real-world use cases, you'll learn how to systematically address data quality, governance, and integration challenges using data contracts.

What this Book will help me do:
- Learn to identify and formalize expectations in data interactions, improving clarity among teams.
- Master implementation techniques to ensure data consistency and quality across critical business processes.
- Understand how to effectively document and deploy data contracts to bolster data governance.
- Explore solutions for proactively addressing and managing data changes and requirements.
- Gain real-world skills through practical examples using technologies like Python, SQL, JSON, and YAML.

Author(s):
Ryan Collingwood is a seasoned expert with over 20 years of experience in product management, data analysis, and software development. His holistic techno-social approach, designed to address both technical and organizational challenges, brings a unique perspective to improving data processes. Ryan's writing is informed by his extensive hands-on experience and commitment to enabling robust data ecosystems.

Who is it for?
This book is ideal for data engineers, software developers, and business analysts working to enhance organizational data integration. Professionals familiar with system design, JSON, and YAML will find it particularly beneficial. Enterprise architects and leaders looking to understand data contract implementation and its business impact will also benefit greatly. A basic understanding of Python and SQL is recommended to maximize learning.
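To show the general shape of a contract check of the kind the book builds with Python and YAML, here is a minimal, hypothetical sketch (not the author's implementation) that validates a record against a YAML-defined contract.

```python
import yaml  # pip install pyyaml

# A tiny, hypothetical contract: field names, types, and whether they are required.
CONTRACT = yaml.safe_load("""
dataset: customer_orders
fields:
  - {name: order_id, type: int, required: true}
  - {name: amount, type: float, required: true}
  - {name: email, type: str, required: false}
""")

TYPES = {"int": int, "float": float, "str": str}

def validate(record: dict) -> list:
    """Return a list of contract violations for one record (empty list means valid)."""
    errors = []
    for field in CONTRACT["fields"]:
        name = field["name"]
        if name not in record:
            if field.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], TYPES[field["type"]]):
            errors.append(f"wrong type for {name}: expected {field['type']}")
    return errors

print(validate({"order_id": 42, "amount": "free"}))  # ['wrong type for amount: expected float']
```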

The Definitive Guide to Microsoft Fabric

Master Microsoft Fabric from basics to advanced architectures with expert guidance to unify, secure, and scale analytics on real-world data platforms.

Key Features:
- Build a complete data analytics platform with Microsoft Fabric
- Apply proven architectures, governance, and security strategies
- Gain real-world insights from five seasoned data experts
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description:
Microsoft Fabric is reshaping how organizations manage, analyze, and act on data by unifying ingestion, storage, transformation, analytics, AI, and visualization in a single platform. The Definitive Guide to Microsoft Fabric takes you from your very first workspace to building a secure, scalable, and future-proof analytics environment. You’ll learn how to unify data in OneLake, design data meshes, transform and model data, implement real-time analytics, and integrate AI capabilities. The book also covers advanced topics, such as governance, security, cost optimization, and team collaboration using DevOps and DataOps principles. Drawing on the real-world expertise of five seasoned professionals who have built and advised on platforms for startups, SMEs, and Europe’s largest enterprises, this book blends strategic insight with practical guidance. By the end of this book, you’ll have gained the knowledge and skills to design, deploy, and operate a Microsoft Fabric platform that delivers sustainable business value.

What you will learn:
- Understand Microsoft Fabric architecture and concepts
- Unify data storage and data governance with OneLake
- Ingest and transform data using multiple Fabric tools
- Implement real-time analytics and event processing
- Design effective semantic models and reports
- Integrate AI and machine learning into data workflows
- Apply governance, security, and compliance controls
- Optimize performance and costs at scale

Who this book is for:
This book is for data engineers, analytics engineers, architects, and data analysts moving into platform design roles. It’s also valuable for technical leaders seeking to unify analytics in their organizations. You’ll need only a basic grasp of databases, SQL, and Python.

Data Engineering for Beginners

A hands-on technical and industry roadmap for aspiring data engineers. In Data Engineering for Beginners, big data expert Chisom Nwokwu delivers a beginner-friendly handbook for everyone interested in the fundamentals of data engineering. Whether you're interested in starting a rewarding new career as a data analyst, data engineer, or data scientist, or seeking to expand your skill set in an existing engineering role, Nwokwu offers the technical and industry knowledge you need to succeed.

The book explains:
- Database fundamentals, including relational and NoSQL databases
- Data warehouses and data lakes
- Data pipelines, including batch and stream processing
- Data quality dimensions
- Data security principles, including data encryption
- Data governance principles and frameworks
- Big data and distributed systems concepts
- Data engineering on the cloud
- Essential skills and tools for data engineering interviews and jobs

Data Engineering for Beginners offers an easy-to-read roadmap through a seemingly complicated and intimidating subject. It addresses the topics most likely to cause a beginning data engineer to stumble, clearly explaining key concepts in an accessible way. You'll also find:
- A comprehensive glossary of data engineering terms
- Common and practical career paths in the data engineering industry
- An introduction to key cloud technologies and services you may encounter early in your data engineering career

Perfect for practicing and aspiring data analysts, data scientists, and data engineers, Data Engineering for Beginners is an effective and reliable starting point for learning an in-demand skill. It's a powerful resource for everyone hoping to expand their data engineering skill set and upskill in the big data era.
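For readers new to the batch-processing idea mentioned above, here is a deliberately tiny, hypothetical batch step in plain Python: read a raw CSV, keep only valid rows, and write a cleaned file. The file names and the quality rule are made up for illustration.

```python
import csv

def clean_orders(src: str = "raw_orders.csv", dst: str = "clean_orders.csv") -> int:
    """Copy rows with a positive amount from src to dst; return how many were kept."""
    kept = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            try:
                if float(row["amount"]) > 0:   # basic data-quality rule
                    writer.writerow(row)
                    kept += 1
            except (KeyError, ValueError):
                continue                       # skip malformed rows
    return kept

if __name__ == "__main__":
    print(f"rows kept: {clean_orders()}")
```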

CompTIA Data+ Study Guide, 2nd Edition

Prepare for the CompTIA Data+ exam, as well as a new career in data science, with this effective study guide. In the newly revised second edition of CompTIA Data+ Study Guide: Exam DA0-002, veteran IT professionals Mike Chapple and Sharif Nijim provide a powerful, one-stop resource for anyone planning to pursue the CompTIA Data+ certification and go on to an exciting new career in data science. The authors walk you through the info you need to succeed on the exam and in your first day at a data science-focused job. Complete with two online practice tests, this book comprehensively covers every objective tested by the updated DA0-002 exam, including databases and data acquisition, data quality, data analysis and statistics, data visualization, and data governance. You'll also find:
- Efficient and comprehensive content, helping you get up to speed as quickly as possible
- Bite-size chapters that break down essential topics into manageable and accessible lessons
- Complimentary access to Sybex's famous online learning environment, with practice questions, a complete glossary of common industry terminology, hundreds of flashcards, and more

A practical and hands-on pathway to the CompTIA Data+ certification, as well as a new career in data science, the CompTIA Data+ Study Guide, Second Edition, offers the foundational knowledge, skills, and abilities you need to get started in an exciting and rewarding new career.

AWS Certified Data Engineer Associate Study Guide

There's no better time to become a data engineer, and acing the AWS Certified Data Engineer Associate (DEA-C01) exam will help you tackle the demands of modern data engineering and secure your place in the technology-driven future. Authors Sakti Mishra, Dylan Qu, and Anusha Challa equip you with the knowledge and sought-after skills necessary to effectively manage data and excel in your career. Whether you're a data engineer, data analyst, or machine learning engineer, you'll discover the in-depth guidance, practical exercises, sample questions, and expert advice you need to leverage AWS services effectively and achieve certification.

By reading, you'll learn how to:
- Ingest, transform, and orchestrate data pipelines effectively
- Select the ideal data store, design efficient data models, and manage data lifecycles
- Analyze data rigorously and maintain high data quality standards
- Implement robust authentication, authorization, and data governance protocols
- Prepare thoroughly for the DEA-C01 exam with targeted strategies and practices
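As one concrete example of the ingestion-and-orchestration tasks the exam targets, here is a minimal boto3 sketch that lands a file in S3 and then starts an AWS Glue job; the bucket, key, and job name are hypothetical, and credentials are assumed to come from the environment.

```python
import boto3  # pip install boto3

# Land a raw file in an S3 bucket (hypothetical bucket and key).
s3 = boto3.client("s3")
s3.upload_file("orders_2024.csv", "raw-landing-bucket", "orders/orders_2024.csv")

# Kick off a Glue ETL job to transform what just landed (hypothetical job name).
glue = boto3.client("glue")
run = glue.start_job_run(
    JobName="transform_orders",
    Arguments={"--source_key": "orders/orders_2024.csv"},
)
print("started Glue run:", run["JobRunId"])
```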

Building Effective Privacy Programs

Presents a structured approach to privacy management, an indispensable resource for safeguarding data in an ever-evolving digital landscape. In today’s data-driven world, protecting personal information has become a critical priority for organizations of all sizes. Building Effective Privacy Programs: Cybersecurity from Principles to Practice equips professionals with the tools and knowledge to design, implement, and sustain robust privacy programs. Seamlessly integrating foundational principles, advanced privacy concepts, and actionable strategies, this practical guide serves as a detailed roadmap for navigating the complex landscape of data privacy. Bridging the gap between theoretical concepts and practical implementation, Building Effective Privacy Programs combines in-depth analysis with practical insights, offering step-by-step instructions on building privacy-by-design frameworks, conducting privacy impact assessments, and managing compliance with global regulations. In-depth chapters feature real-world case studies and examples that illustrate the application of privacy practices in a variety of scenarios, complemented by discussions of emerging trends such as artificial intelligence, blockchain, IoT, and more.

Providing timely and comprehensive coverage of privacy principles, regulatory compliance, and actionable strategies, Building Effective Privacy Programs:
- Addresses all essential areas of cyberprivacy, from foundational principles to advanced topics
- Presents detailed analysis of major laws, such as GDPR, CCPA, and HIPAA, and their practical implications
- Offers strategies to integrate privacy principles into business processes and IT systems
- Covers industry-specific applications for healthcare, finance, and technology sectors
- Highlights successful privacy program implementations and lessons learned from enforcement actions
- Includes glossaries, comparison charts, sample policies, and additional resources for quick reference

Written by seasoned professionals with deep expertise in privacy law, cybersecurity, and data protection, Building Effective Privacy Programs: Cybersecurity from Principles to Practice is a vital reference for privacy officers, legal advisors, IT professionals, and business executives responsible for data governance and regulatory compliance. It is also an excellent textbook for advanced courses in cybersecurity, information systems, business law, and business management.

Data Usability in the Enterprise: How Usability Leads to Optimal Digital Experiences

Ensuring data usability is paramount to unlocking a company’s full potential and driving informed decision-making. Part of author Saurav Bhattacharya’s trilogy that covers the essential pillars of digital ecosystems—security, reliability, and usability—this book offers a comprehensive exploration of the fundamental concepts, principles, and practices essential for enhancing data accessibility and effectiveness. You’ll study the core aspects of data design, standardization, and interoperability, gaining the knowledge needed to create and maintain high-quality data environments. By examining the tools and technologies that improve data usability, along with best practices for data visualization and user-centric strategies, this book serves as an invaluable resource for professionals seeking to leverage data more effectively. The book also addresses crucial governance issues, ensuring data quality, integrity, and security are maintained. Through a detailed analysis of data governance frameworks and privacy concerns, you’ll see how to manage data responsibly. Additionally, the book includes compelling case studies that highlight successful data usability implementations, future trends, and the challenges faced in achieving optimal data usability. By fostering a culture of data literacy and usability, this book will help you and your organization navigate the evolving data landscape and harness the power of data for innovation and growth.

What You Will Learn:
- Understand the fundamental concepts and importance of data usability, including effective data design, enhancing data accessibility, and ensuring data standardization and interoperability
- Review the latest tools and technologies that enhance data usability, best practices for data visualization, and strategies for implementing user-centric data approaches
- Ensure data quality and integrity while navigating data privacy and security concerns
- Implement robust data governance frameworks to manage data responsibly and effectively

Who This Book Is For:
Cybersecurity and IT professionals

Databricks Certified Data Engineer Associate Study Guide

Data engineers proficient in Databricks are currently in high demand. As organizations gather more data than ever before, skilled data engineers on platforms like Databricks become critical to business success. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the Databricks platform and its capabilities, as well as the essential skills to effectively execute various data engineering tasks on the platform. In this comprehensive study guide, you will build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse and its tools and benefits. You'll also learn to develop ETL pipelines in both batch and streaming modes. Moreover, you'll discover how to orchestrate data workflows and design dashboards while maintaining data governance. Finally, you'll dive into the finer points of exactly what's on the exam and learn to prepare for it with mock tests. Author Derar Alhussein not only teaches you the fundamental concepts but also provides hands-on exercises to reinforce your understanding. From setting up your Databricks workspace to deploying production pipelines, each chapter is carefully crafted to equip you with the skills needed to master the Databricks platform. By the end of this book, you'll know everything you need to ace the Databricks Data Engineer Associate certification exam with flying colors and start your career as a Databricks-certified data engineer!

You'll learn how to:
- Use the Databricks Platform and Delta Lake effectively
- Perform advanced ETL tasks using Apache Spark SQL
- Design multi-hop architecture to process data incrementally
- Build production pipelines using Delta Live Tables and Databricks Jobs
- Implement data governance using Databricks SQL and Unity Catalog

Derar Alhussein is a senior data engineer with a master's degree in data mining. He has over a decade of hands-on experience in software and data projects, including large-scale projects on Databricks. He currently holds eight certifications from Databricks, showcasing his proficiency in the field. Derar is also an experienced instructor with a proven track record of success in training thousands of data engineers, helping them develop their skills and obtain professional certifications.
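The multi-hop (medallion) pattern mentioned above can be sketched with Delta Live Tables. The snippet below is a hypothetical illustration, assumed to run inside a DLT pipeline on Databricks where `spark` and the `dlt` module are available; the paths and table names are made up.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested as-is (bronze hop)")
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader source
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")                # hypothetical landing path
    )

@dlt.table(comment="Validated, enriched events (silver hop)")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")  # data-quality expectation
def events_silver():
    return dlt.read_stream("events_bronze").withColumn(
        "ingested_at", F.current_timestamp()
    )
```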

Snowflake Recipes: A Problem-Solution Approach to Implementing Modern Data Pipelines

Explore Snowflake’s core concepts and the unique features that differentiate it from industry competitors such as Azure Synapse and Google BigQuery. This book provides recipes for architecting and developing modern data pipelines on the Snowflake data platform by employing progressive techniques, agile practices, and repeatable strategies. You’ll walk through step-by-step instructions on ready-to-use recipes covering a wide range of the latest development topics. Then you'll build scalable development pipelines and solve specific scenarios common to all modern data platforms, such as data masking, object tagging, data monetization, and security best practices. Throughout the book you’ll work with code samples for Amazon Web Services, Microsoft Azure, and Google Cloud Platform. There’s also a chapter devoted to solving machine learning problems with Snowflake. Authors Dillon Dayton and John Eipe are both Snowflake SnowPro Core certified, specialize in data and digital services, and understand the challenges of finding the right solution to complex problems. The recipes in this book are based on real-world use cases and examples designed to help you provide quality, performant, and secure data to solve business initiatives.

What You’ll Learn:
- Handle structured and unstructured data in Snowflake
- Apply best practices and different options for data transformation
- Understand data application development
- Implement data sharing, data governance, and security

Who This Book Is For:
Data engineers, scientists, and analysts moving into Snowflake who are looking to build data apps. This book expects basic knowledge of cloud platforms (AWS, Azure, or GCP), SQL, and Python.
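Data masking and object tagging, two of the scenarios listed above, boil down to a few Snowflake SQL statements. The sketch below is an illustration (not a recipe from the book) that issues them through the snowflake-connector-python client; the connection details, table, and role names are hypothetical.

```python
import os
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account",                       # hypothetical account
    user="data_engineer",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
)
cur = conn.cursor()

# Dynamic data masking: hide emails from everyone except a privileged role.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END
""")
cur.execute("ALTER TABLE crm.public.customers MODIFY COLUMN email SET MASKING POLICY email_mask")

# Object tagging: label the table for governance reporting.
cur.execute("CREATE TAG IF NOT EXISTS data_domain")
cur.execute("ALTER TABLE crm.public.customers SET TAG data_domain = 'customer'")
conn.close()
```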

Snowflake Data Engineering

A practical introduction to data engineering on the powerful Snowflake cloud data platform. Data engineers create the pipelines that ingest raw data, transform it, and funnel it to the analysts and professionals who need it. The Snowflake cloud data platform provides a suite of productivity-focused tools and features that simplify building and maintaining data pipelines. In Snowflake Data Engineering, Snowflake Data Superhero Maja Ferle shows you how to get started.

In Snowflake Data Engineering you will learn how to:
- Ingest data into Snowflake from both cloud and local file systems
- Transform data using functions, stored procedures, and SQL
- Orchestrate data pipelines with streams and tasks, and monitor their execution
- Use Snowpark to run Python code in your pipelines
- Deploy Snowflake objects and code using continuous integration principles
- Optimize performance and costs when ingesting data into Snowflake

Snowflake Data Engineering reveals how Snowflake makes it easy to work with unstructured data, set up continuous ingestion with Snowpipe, and keep your data safe and secure with best-in-class data governance features. Along the way, you’ll practice the most important data engineering tasks as you work through relevant hands-on examples. Throughout, author Maja Ferle shares design tips drawn from her years of experience to ensure your pipeline follows the best practices of software engineering, security, and data governance.

About the Technology:
Pipelines that ingest and transform raw data are the lifeblood of business analytics, and data engineers rely on Snowflake to help them deliver those pipelines efficiently. Snowflake is a full-service cloud-based platform that handles everything from near-infinite storage and fast elastic compute to inbuilt AI/ML capabilities like vector search, text-to-SQL, and code generation. This book gives you what you need to create effective data pipelines on the Snowflake platform.

About the Book:
Snowflake Data Engineering guides you skill-by-skill through accomplishing on-the-job data engineering tasks using Snowflake. You’ll start by building your first simple pipeline and then expand it with increasingly powerful features, including data governance and security, CI/CD for your pipelines, and even augmenting data with generative AI. You’ll be amazed how far you can go in just a few short chapters!

What's Inside:
- Ingest data from the cloud, APIs, or Snowflake Marketplace
- Orchestrate data pipelines with streams and tasks
- Optimize performance and cost

About the Reader:
For software developers and data analysts. Readers should know the basics of SQL and the cloud.

About the Author:
Maja Ferle is a Snowflake Subject Matter Expert and a Snowflake Data Superhero who holds the SnowPro Advanced Data Engineer and SnowPro Advanced Data Analyst certifications.

Quotes:
An incredible guide for going from zero to production with Snowflake. - Doyle Turner, Microsoft
A must-have if you’re looking to excel in the field of data engineering. - Isabella Renzetti, Data Analytics Consultant & Trainer
Masterful! Unlocks the true potential of Snowflake for modern data engineers. - Shankar Narayanan, Microsoft
Valuable insights will enhance your data engineering skills and lead to cost-effective solutions. A must read! - Frédéric L’Anglais, Maxa
Comprehensive, up-to-date, and packed with real-life code examples. - Albert Nogués, Danone
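To hint at what running Python in your pipelines with Snowpark looks like, here is a minimal, hypothetical sketch (not from the book) that reads a raw table, aggregates it, and saves the result back to Snowflake; the connection parameters and table names are placeholders.

```python
import os
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Hypothetical connection parameters; in practice these come from a config or secrets store.
session = Session.builder.configs({
    "account": "my_account",
    "user": "data_engineer",
    "password": os.environ["SNOWFLAKE_PASSWORD"],
    "warehouse": "transform_wh",
    "database": "raw",
    "schema": "public",
}).create()

orders = session.table("raw.public.orders")
daily_totals = (
    orders.filter(col("amount") > 0)   # drop refunds and bad rows
    .group_by("order_date")
    .sum("amount")
)
# Persist the aggregate for downstream consumers.
daily_totals.write.save_as_table("analytics.public.daily_order_totals", mode="overwrite")
session.close()
```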

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

This book is your comprehensive guide to building robust generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform offering unified analytics and AI capabilities within a single governance framework, enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization. Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises. Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or even an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best-practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams.

What You Will Learn:
- Foundational principles of Lakehouse architecture
- Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables
- The Databricks Data Intelligence Platform and its key functionalities
- Building and deploying GenAI applications, from data ingestion to model serving
- Databricks pricing, platform security, DBRX, and many more topics

Who This Book Is For:
Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy GenAI solutions with the Data Intelligence Platform. This is also a handbook for senior executives who need to communicate the value of Databricks to customers. People who are new to the Databricks platform and want comprehensive insights will find the book accessible.
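MLflow experiment tracking, mentioned above as part of the ML and GenAI workflow, looks roughly like the following sketch. The experiment path, parameter, and metric values are hypothetical placeholders; on Databricks the tracking server is preconfigured.

```python
import mlflow  # pip install mlflow (preinstalled on Databricks ML runtimes)

mlflow.set_experiment("/Shared/churn-model")  # hypothetical experiment path

with mlflow.start_run(run_name="baseline"):
    # Record the configuration and evaluation results of one training run.
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("auc", 0.91)              # placeholder metric value
    # mlflow.sklearn.log_model(model, "model")  # would also log the fitted model artifact
```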

Financial Data Engineering

Today, investment in financial technology and digital transformation is reshaping the financial landscape and generating many opportunities. Too often, however, engineers and professionals in financial institutions lack a practical and comprehensive understanding of the concepts, problems, techniques, and technologies necessary to build a modern, reliable, and scalable financial data infrastructure. This is where financial data engineering is needed. A data engineer developing a data infrastructure for a financial product needs not only technical data engineering skills but also a solid understanding of financial domain-specific challenges, methodologies, data ecosystems, providers, formats, technological constraints, identifiers, entities, standards, regulatory requirements, and governance. This book offers a comprehensive, practical, domain-driven approach to financial data engineering, featuring real-world use cases, industry practices, and hands-on projects.

You'll learn:
- The data engineering landscape in the financial sector
- Specific problems encountered in financial data engineering
- The structure, players, and particularities of the financial data domain
- Approaches to designing financial data identification and entity systems
- Financial data governance frameworks, concepts, and best practices
- The financial data engineering lifecycle from ingestion to production
- The varieties and main characteristics of financial data workflows
- How to build financial data pipelines using open source tools and APIs

Tamer Khraisha, PhD, is a senior data engineer and scientific author with more than a decade of experience in the financial sector.
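Financial identifier systems of the kind discussed above typically embed a check digit. As a small worked example (an illustration, not the book's code), the function below validates an ISIN using the standard Luhn-style checksum from ISO 6166.

```python
def isin_is_valid(isin: str) -> bool:
    """Validate a 12-character ISIN via its Luhn check digit (ISO 6166)."""
    isin = isin.strip().upper()
    if len(isin) != 12 or not isin.isalnum() or not isin[-1].isdigit():
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits pass through unchanged.
    expanded = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(expanded)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(isin_is_valid("US0378331005"))  # True: a well-known valid ISIN
print(isin_is_valid("US0378331006"))  # False: corrupted check digit
```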

Practical Lakehouse Architecture

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.

Practical Lakehouse Architecture shows you how to:
- Understand key lakehouse concepts and features like transaction support, time travel, and schema evolution
- Understand the differences between traditional and lakehouse data architectures
- Differentiate between various file formats and table formats
- Design lakehouse architecture layers for storage, compute, metadata management, and data consumption
- Implement data governance and data security within the platform
- Evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case
- Make critical design decisions and address practical challenges to build a future-ready data platform
- Start your lakehouse implementation journey and migrate data from existing systems to the lakehouse
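Time travel, one of the lakehouse table-format features listed above, can be tried in a few lines with Delta Lake. The sketch below is a hypothetical example, assumed to run in an existing Spark session with the delta-spark package configured; the table path is made up.

```python
# Assumes an existing SparkSession named `spark` with Delta Lake enabled.
path = "/mnt/lake/customers"   # hypothetical Delta table location

# Inspect the table's commit history (versions, timestamps, operations).
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(truncate=False)

# Read the table as of an earlier version to audit or reproduce past results.
customers_v0 = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load(path)
)
print(customers_v0.count())
```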

Data Engineering with Databricks Cookbook

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows. What this Book will help me do Master Apache Spark for data ingestion, transformation, and analysis. Learn to optimize data processing and improve query performance with Delta Lake. Manage streaming data processing with Spark Structured Streaming capabilities. Implement DataOps and DevOps workflows tailored for Databricks. Enforce data governance policies using Unity Catalog for scalable solutions. Author(s) Pulkit Chadha, the author of this book, is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge. Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge in managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well-suited for professionals working with Databricks or similar cloud-based data platforms.

Data Engineering with Google Cloud Platform - Second Edition

Data Engineering with Google Cloud Platform is your ultimate guide to building scalable data platforms using Google Cloud technologies. In this book, you will learn how to leverage products such as BigQuery, Cloud Composer, and Dataplex for efficient data engineering. Expand your expertise and gain practical knowledge to excel in managing data pipelines within the Google Cloud ecosystem.

What this Book will help me do:
- Understand foundational data engineering concepts using Google Cloud Platform.
- Learn to build and manage scalable data pipelines with tools such as Dataform and Dataflow.
- Explore advanced topics like data governance and secure data handling in Google Cloud.
- Boost readiness for Google Cloud data engineering certification with real-world exam guidance.
- Master cost-effective strategies and CI/CD practices for data engineering on Google Cloud.

Author(s):
Adi Wijaya, the author of this book, is a Strategic Cloud Data Engineer at Google with extensive experience in data engineering and the Google Cloud ecosystem. With his hands-on expertise, he emphasizes practical solutions and in-depth knowledge sharing, guiding readers through the intricacies of Google Cloud for data engineering success.

Who is it for?
This book is ideal for data analysts, IT practitioners, software engineers, and data enthusiasts aiming to excel in data engineering. Whether you're a beginner tackling fundamental concepts or an experienced professional exploring Google Cloud's advanced capabilities, this book is designed for you. It bridges your current skills with modern data engineering practices on Google Cloud, making it a valuable resource at any stage of your career.
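A typical BigQuery interaction from Python, of the kind such a pipeline would embed, looks roughly like this sketch; the project, dataset, and table names are hypothetical, and credentials are assumed to come from the environment.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project ID

query = """
    SELECT country, COUNT(*) AS order_count
    FROM `my-analytics-project.sales.orders`
    GROUP BY country
    ORDER BY order_count DESC
    LIMIT 10
"""

# Run the query and iterate over the result rows.
for row in client.query(query).result():
    print(row["country"], row["order_count"])
```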

Engineering Data Mesh in Azure Cloud

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms.

What this Book will help me do:
- Learn to decentralize data governance and integrate data domains effectively.
- Master strategies for building and implementing data contracts suited to your organization's needs.
- Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework.
- Understand how to apply key architecture patterns for analytics, including AI and machine learning.
- Gain the knowledge to scale analytics frameworks using modern cloud-based platforms.

Author(s):
Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, he brings real-world customer experiences into practical guidance. This book reflects his dedication to helping organizations achieve their data goals with clarity and effectiveness.

Who is it for?
This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. Especially useful for professionals aiming to implement cloud-based data mesh solutions, it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

Design and architect new-generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of standard data warehousing practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. They also demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines.

What You Will Learn:
- Understand the core concepts underlying modern data warehouses
- Design and build cloud-native data warehouses
- Gain a practical approach to architecting and building data warehouses on Azure and AWS
- Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse
- Process data through pandas and evaluate your model's performance using metrics such as F1-score, precision, and recall
- Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications

Who This Book Is For:
Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS