talk-data.com

Topic: Analytics
Tags: data_analysis, insights, metrics
395 tagged activities
Activity trend: peak of 398 activities per quarter (2020-Q1 to 2026-Q1)

Activities

Showing filtered results. Filtering by: O'Reilly Data Engineering Books
Data Engineering with AWS Cookbook

Data Engineering with AWS Cookbook serves as a comprehensive practical guide for building scalable and efficient data engineering solutions using AWS. With this book, you will master implementing data lakes, orchestrating data pipelines, and creating serving layers using AWS's robust services, such as Glue, EMR, Redshift, and Athena. With hands-on exercises and practical recipes, you will enhance your AWS-based data engineering projects.

What this Book will help me do:
- Gain the skills to design centralized data lake solutions and manage them securely at scale.
- Develop expertise in crafting data pipelines with AWS's ETL technologies like Glue and EMR.
- Learn to implement and automate governance, orchestration, and monitoring for data platforms.
- Build high-performance data serving layers using AWS analytics tools like Redshift and QuickSight.
- Effectively plan and execute data migrations to AWS from on-premises infrastructure.

Author(s): Trâm Ngọc Phạm, Gonzalo Herreros González, Viquar Khan, and Huda Nofal bring together years of collective experience in data engineering and AWS cloud solutions. Each author's deep knowledge and passion for cloud technology have shaped this book into a valuable resource, geared towards practical learning and real-world application. Their approach ensures readers are not just learning but building tangible, impactful solutions.

Who is it for? This book is geared towards data engineers and big data professionals engaged in or transitioning to cloud-based environments, specifically on AWS. Ideal readers are those looking to optimize workflows and master AWS tools to create scalable, efficient solutions. The content assumes basic familiarity with AWS concepts like IAM roles and the command-line interface, ensuring all examples are accessible yet meaningful for those seeking advancement in AWS data engineering.

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

This book is your comprehensive guide to building robust Generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform offering unified analytics and AI capabilities within a single governance framework, enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization.

Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises.

Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or even an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams.

What You Will Learn:
- Foundational principles of Lakehouse architecture
- Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables
- Databricks Intelligence Platform and key functionalities
- Building and deploying GenAI applications from data ingestion to model serving
- Databricks pricing, platform security, DBRX, and many more topics

Who This Book Is For: Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy their GenAI solutions with the Data Intelligence Platform. This is also a handbook for senior execs who need to communicate the value of Databricks to customers. People who are new to the Databricks Platform and want comprehensive insights will find the book accessible.
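The Retrieval-Augmented Generation workflow mentioned in this blurb follows one simple pattern: vectorize documents, retrieve the ones most similar to a question, and hand them to the LLM as context. The sketch below illustrates only that retrieval step, using toy bag-of-words vectors and invented document strings; it is stdlib-only and not Databricks or book code (a real RAG system would use learned embeddings and a vector index):

```python
import math
from collections import Counter

def vectorize(text):
    """Toy bag-of-words vector: lowercase term -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    query_vec = vectorize(query)
    return sorted(docs, key=lambda d: cosine(query_vec, vectorize(d)), reverse=True)[:k]

# Invented toy corpus for illustration only.
docs = [
    "Delta Lake stores tables as versioned Parquet files",
    "Unity Catalog centralizes governance for data and AI assets",
    "MLflow tracks experiments and deploys models",
]
question = "how do I govern data assets"
context = retrieve(question, docs)[0]
prompt = f"Answer using this context: {context}\n\nQuestion: {question}"
```

The retrieved document is prepended to the prompt so the model answers from your data rather than from its training distribution; that is the whole trick behind RAG.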

Data Engineering Best Practices

Unlock the secrets to building scalable and efficient data architectures with 'Data Engineering Best Practices.' This book provides in-depth guidance on designing, implementing, and optimizing cloud-based data pipelines. You will gain valuable insights into best practices, agile workflows, and future-proof designs.

What this Book will help me do:
- Effectively plan and architect scalable data solutions leveraging cloud-first strategies.
- Master agile processes tailored to data engineering for improved project outcomes.
- Implement secure, efficient, and reliable data pipelines optimized for analytics and AI.
- Apply real-world design patterns and avoid common pitfalls in data flow and processing.
- Create future-ready data engineering solutions following industry-proven frameworks.

Author(s): Richard J. Schiller and David Larochelle are seasoned data engineering experts with decades of experience crafting efficient and secure cloud-based infrastructures. Their collaborative writing distills years of real-world expertise into practical advice aimed at helping engineers succeed in a rapidly evolving field.

Who is it for? This book is ideal for data engineers, ETL specialists, and big data professionals seeking to enhance their knowledge in cloud-based solutions. Some familiarity with data engineering, ETL pipelines, and big data technologies is helpful. It suits those keen on mastering advanced practices, improving agility, and developing efficient data pipelines. Perfect for anyone looking to future-proof their skills in data engineering.

In-Memory Analytics with Apache Arrow - Second Edition

Dive into efficient data handling with 'In-Memory Analytics with Apache Arrow.' This book explores Apache Arrow, a powerful open-source project that revolutionizes how tabular and hierarchical data are processed. You'll learn to streamline data pipelines, accelerate analysis, and utilize high-performance tools for data exchange.

What this Book will help me do:
- Understand and utilize the Apache Arrow in-memory data format for your data analysis needs.
- Implement efficient and high-speed data pipelines using Arrow subprojects like Flight SQL and Acero.
- Enhance integration and performance in analysis workflows by using tools like Parquet and Snowflake with Arrow.
- Master chaining and reusing computations across languages and environments with Arrow's cross-language support.
- Apply in real-world scenarios by integrating Apache Arrow with analytics systems like Dremio and DuckDB.

Author(s): Matthew Topol, the author of this book, brings 15 years of technical expertise in the realm of data processing and analysis. Having worked across various environments and languages, Matthew offers insights into optimizing workflows using Apache Arrow. His approachable writing style ensures that complex topics are comprehensible.

Who is it for? This book is tailored for developers, data engineers, and data scientists eager to enhance their analytic toolset. Whether you're a beginner or have experience in data analysis, you'll find the concepts actionable and transformative. If you are curious about improving the performance and capabilities of your analytic pipelines or tools, this book is for you.
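Arrow's central idea, a columnar in-memory layout, can be illustrated without Arrow itself. The stdlib-only sketch below (invented sample data; the real tool would be the pyarrow package) contrasts row-oriented records with one contiguous typed buffer per column, which is what lets analytic scans touch only the data they need:

```python
from array import array

# Row-oriented: one Python object per record (invented sample data).
rows = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": 4.50},
    {"id": 3, "price": 12.00},
]

# Column-oriented, Arrow-style: one contiguous typed buffer per field.
columns = {
    "id": array("q", [r["id"] for r in rows]),        # 64-bit signed ints
    "price": array("d", [r["price"] for r in rows]),  # 64-bit floats
}

# An aggregation reads only the buffer it needs; it never touches "id".
# This locality, plus a standardized layout shared across languages with
# zero-copy exchange, is the performance story the book develops with Arrow.
total_price = sum(columns["price"])
```

The same struct-of-arrays idea is what makes Arrow a lingua franca between engines like Dremio, DuckDB, and Spark: they can hand each other buffers instead of serializing rows.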

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code.

The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and cuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows.

What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool; it is a career catalyst and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world.

What You Will Learn:
- Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and cuDF at unprecedented speeds
- Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines, and master the art of workflow orchestration to streamline your engineering projects
- Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure

Who This Book Is For: Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists
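The data validation tools named above (Great Expectations, Pandera) share one core idea: declare expectations about each column, check every batch against them, and report violations instead of silently ingesting bad rows. Below is a library-free sketch of that idea only, with invented rules and records; it mimics the spirit of those tools, not their actual APIs:

```python
def validate(records, rules):
    """Check each record against column -> predicate rules.
    Returns (row_index, column, offending_value) for every violation."""
    violations = []
    for i, record in enumerate(records):
        for column, predicate in rules.items():
            value = record.get(column)
            if not predicate(value):
                violations.append((i, column, value))
    return violations

# Invented expectations, in the spirit of a Pandera schema or a
# Great Expectations suite.
rules = {
    "age": lambda v: v is not None and 0 <= v <= 120,
    "email": lambda v: v is not None and "@" in v,
}

records = [
    {"age": 34, "email": "a@example.com"},
    {"age": -5, "email": "b@example.com"},   # fails the age rule
    {"age": 28, "email": "not-an-email"},    # fails the email rule
]

bad = validate(records, rules)
```

A pipeline would typically run such checks at ingestion boundaries and either quarantine the offending rows or fail the run, which is exactly the placement question these tools help you answer at scale.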

Streaming Databases

Real-time applications are becoming the norm today. But building a model that works properly requires real-time data from the source, in-flight stream processing, and low-latency serving of its analytics. With this practical book, data engineers, data architects, and data analysts will learn how to use streaming databases to build real-time solutions.

Authors Hubert Dulay and Ralph M. Debusmann take you through streaming database fundamentals, including how these databases reduce infrastructure for real-time solutions. You'll learn the difference between streaming databases, stream processing, and real-time online analytical processing (OLAP) databases. And you'll discover when to use push queries versus pull queries, and how to serve synchronous and asynchronous data emanating from streaming databases.

This guide helps you:
- Explore stream processing and streaming databases
- Learn how to build a real-time solution with a streaming database
- Understand how to construct materialized views from any number of streams
- Learn how to serve synchronous and asynchronous data
- Get started building low-complexity streaming solutions with minimal setup
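A materialized view over a stream, as described above, is just an aggregate kept incrementally up to date as events arrive: a pull query reads the current state on demand, while a push query would emit each change. A minimal stdlib sketch of that distinction, with invented events and no particular streaming database's API:

```python
from collections import defaultdict

class RunningCountView:
    """Toy materialized view: per-user event counts, maintained incrementally."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, event):
        # Incremental maintenance: each event updates state in O(1)
        # instead of rescanning the whole stream at query time.
        self.counts[event["user"]] += 1

    def pull(self, user):
        # Pull query: read the current state on demand. A push query
        # would instead emit every change downstream as it happens.
        return self.counts[user]

view = RunningCountView()
stream = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]  # invented events
for event in stream:
    view.apply(event)
```

Streaming databases generalize this pattern: you declare the view in SQL, and the engine maintains the state and serves both query styles for you.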

Elastic Stack 8.x Cookbook

Unlock the potential of the Elastic Stack with the "Elastic Stack 8.x Cookbook." This book provides over 80 hands-on recipes, guiding you through ingesting, processing, and visualizing data using Elasticsearch, Logstash, Kibana, and more. You'll also explore advanced features like machine learning and observability to create data-driven applications with ease.

What this Book will help me do:
- Implement a robust workflow for ingesting, transforming, and visualizing diverse datasets.
- Utilize Kibana to create insightful dashboards and visual analytics.
- Leverage Elastic Stack's AI capabilities, such as natural language processing and machine learning.
- Develop search solutions and integrate advanced features like vector search.
- Monitor and optimize your Elastic Stack deployments for performance and security.

Author(s): Huage Chen and Yazid Akadiri are experienced professionals in the field of Elastic Stack. They bring years of practical experience in data engineering, observability, and software development. Huage and Yazid aim to provide a clear, practical pathway for both beginners and experienced users to get the most out of the Elastic Stack's capabilities.

Who is it for? This book is perfect for developers, data engineers, and observability practitioners looking to harness the power of Elastic Stack. It caters to both beginners and experts, providing clear instructions to help readers understand and implement powerful data solutions. If you're working with search applications, data analysis, or system observability, this book is an ideal resource.

Databricks Certified Associate Developer for Apache Spark Using Python

This book serves as the ultimate preparation for aspiring Databricks Certified Associate Developers specializing in Apache Spark. Deep dive into Spark's components, its applications, and exam techniques to achieve certification and expand your practical skills in big data processing and real-time analytics using Python.

What this Book will help me do:
- Deeply understand Apache Spark's core architecture for building big data applications.
- Write optimized SQL queries and leverage the Spark DataFrame API for efficient data manipulation.
- Apply advanced Spark functions, including UDFs, to solve complex data engineering tasks.
- Use Spark Streaming capabilities to implement real-time and near-real-time processing solutions.
- Get hands-on preparation for the certification exam with mock tests and practice questions.

Author(s): Saba Shah is a seasoned data engineer with extensive experience working at Databricks and leading data science teams. With her in-depth knowledge of big data applications and Spark, she delivers clear, actionable insights in this book. Her approach emphasizes practical learning and real-world applications.

Who is it for? This book is ideal for data professionals such as engineers and analysts aiming to achieve Databricks certification. It is particularly helpful for individuals with moderate Python proficiency who are keen to understand Spark from scratch. If you're transitioning into big data roles, this guide prepares you comprehensively.

IBM z14 (3906) Technical Guide

This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™. IBM z14 is the trusted enterprise platform for pervasive encryption, integrating data, transactions, and insights into the data. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure.

IBM z14 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 servers to deliver a record level of capacity over the prior IBM Z platforms. In its maximum configuration, z14 is powered by up to 170 client characterizable microprocessors (cores) running at 5.2 GHz. This configuration can run more than 146,000 million instructions per second (MIPS) and supports up to 32 TB of client memory. The IBM z14 Model M05 is estimated to provide up to 35% more total system capacity than the IBM z13® Model NE1.

This Redbooks publication provides information about IBM z14 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers' functions and plan for their usage. It is not intended as an introduction to mainframes; readers are expected to be generally familiar with existing IBM Z technology and terminology.

IBM z14 ZR1 Technical Guide

This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry standard footprint. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure.

IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, z14 ZR1 is powered by up to 30 client characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second and supports up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20.

This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers' functions and plan for their usage. It is not intended as an introduction to mainframes; readers are expected to be generally familiar with IBM Z technology and terminology.

IBM z15 (8561) Technical Guide

This IBM® Redbooks® publication describes the features and functions of the latest member of the IBM Z® platform, the IBM z15™ (machine type 8561). It includes information about the IBM z15 processor design, I/O innovations, security features, and supported operating systems. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity in an industry standard footprint.

This system excels at the following tasks:
- Making use of multicloud integration services
- Securing data with pervasive encryption
- Accelerating digital transformation with agile service delivery
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Revolutionizing business processes
- Blending open source and Z technologies

This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM z15 (8562) Technical Guide

This IBM® Redbooks® publication describes the features and functions of the latest member of the IBM Z® platform, the IBM z15™ Model T02 (machine type 8562). It includes information about the IBM z15 processor design, I/O innovations, security features, and supported operating systems. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity in an industry standard footprint.

This system excels at the following tasks:
- Making use of multicloud integration services
- Securing data with pervasive encryption
- Accelerating digital transformation with agile service delivery
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Revolutionizing business processes
- Blending open source and Z technologies

This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

Apache Iceberg: The Definitive Guide

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool—a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way.

Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.

With this book, you'll learn:
- The architecture of Apache Iceberg tables
- What happens under the hood when you perform operations on Iceberg tables
- How to further optimize Iceberg tables for maximum performance
- How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio

Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
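Much of what happens "under the hood" of an Iceberg table follows from one idea: the table is a chain of immutable snapshots, each listing the data files that make up the table at that point, so writers add new metadata rather than mutating old files. Below is a deliberately simplified stdlib sketch of that snapshot model, with invented file names; the real Iceberg spec adds manifest lists, schemas, and partition specs:

```python
class ToyTableMetadata:
    """Greatly simplified snapshot chain, for illustration only."""

    def __init__(self):
        self.snapshots = []  # (snapshot_id, tuple of data file names)

    def commit(self, files):
        # Writers never mutate an existing snapshot; each commit
        # appends new metadata pointing at immutable data files.
        self.snapshots.append((len(self.snapshots), tuple(files)))

    def current_files(self):
        return self.snapshots[-1][1]

    def files_at(self, snapshot_id):
        # Time travel: read the table as of any older snapshot.
        return self.snapshots[snapshot_id][1]

table = ToyTableMetadata()
table.commit(["a.parquet"])                # snapshot 0: initial load
table.commit(["a.parquet", "b.parquet"])   # snapshot 1: append
table.commit(["c.parquet"])                # snapshot 2: compaction rewrite
```

Because readers always resolve a single snapshot, they get a consistent view even while writers commit, which is how Iceberg provides ACID semantics and time travel on top of plain object storage.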

Natural Language and Search

When you look at operational analytics and business data analysis activities—such as log analytics, real-time application monitoring, website search, observability, and more—effective search functionality is key to identifying issues, improving customer experience, and increasing operational effectiveness. How can you support your business needs by leveraging ML-driven advancements in search relevance?

In this report, authors Jon Handler, Milind Shyani, and Karen Kilroy help executives and data scientists explore how ML can enable ecommerce firms to generate more pertinent search results to drive better sales. You'll learn how personalized search helps you quickly find relevant data within applications, websites, and data lake catalogs. You'll also discover how to locate the content available in CRM systems and document stores.

This report helps you:
- Address the challenges of traditional document search, including data preparation and ingestion
- Leverage ML techniques to improve search outcomes and the relevance of documents you retrieve
- Discover what makes a good search solution that's reliable, scalable, and can drive your business forward
- Learn how to choose a search solution to improve your decision-making process

With advancements in ML-driven search, businesses can realize even more benefits and improvements in their data and document search capabilities to better support their own business needs and the needs of their customers.

About the authors: Jon Handler is a senior principal solutions architect at Amazon Web Services. Milind Shyani is an applied scientist at Amazon Web Services working on large language models, information retrieval, and machine learning algorithms. Karen Kilroy, CEO of Kilroy Blockchain, is a lifelong technologist, full stack software engineer, speaker, and author living in Northwest Arkansas.

Engineering Data Mesh in Azure Cloud

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn the strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms.

What this Book will help me do:
- Learn to decentralize data governance and integrate data domains effectively.
- Master strategies for building and implementing data contracts suited to your organization's needs.
- Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework.
- Understand how to apply key architecture patterns for analytics, including AI and machine learning.
- Gain the knowledge to scale analytics frameworks using modern cloud-based platforms.

Author(s): Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, the author brings real-world customer experiences into practical guidance. This book reflects a dedication to helping organizations achieve their data goals with clarity and effectiveness.

Who is it for? This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. Especially useful for professionals aiming to implement cloud-based data mesh solutions, it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

Azure Data Factory by Example: Practical Implementation for Data Engineers

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components.

This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
- Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics
- Improvements to flow control provided by activity deactivation and the Fail activity
- The introduction of reusable data flow components such as user-defined functions and flowlets
- Extensions to integration runtime capabilities, including Managed VNet support
- The ability to trigger pipelines in response to custom events
- Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying

What You Will Learn:
- Create pipelines, activities, datasets, and linked services
- Build reusable components using variables, parameters, and expressions
- Move data into and around Azure services automatically
- Transform data natively using ADF data flows and Power Query data wrangling
- Master flow-of-control and triggers for tightly orchestrated pipeline execution
- Publish and monitor pipelines easily and with confidence

Who This Book Is For: Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Azure Data Factory Cookbook - Second Edition

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects.

What this Book will help me do:
- Master building and managing data pipelines using Azure Data Factory's latest versions and features.
- Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows.
- Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables.
- Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently.
- Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines.

Author(s): Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences.

Who is it for? This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

Data Observability for Data Engineering

"Data Observability for Data Engineering" introduces you to the foundational concepts of observing and validating data pipeline health. With real-world projects and Python code examples, you'll gain hands-on experience in improving data quality and minimizing risks, enabling you to implement strategies that ensure accuracy and reliability in your data systems.

What this Book will help me do:
- Master data observability techniques to monitor and validate data pipelines effectively.
- Learn to collect and analyze meaningful metrics to gauge and improve data quality.
- Develop skills in Python programming specific to applying data concepts such as observable data state.
- Address scalability challenges using state-of-the-art observability frameworks and practices.
- Enhance your ability to manage and optimize data workflows, ensuring seamless operation from start to end.

Author(s): Michele Pinto and Sammy El Khammal bring a wealth of experience in data engineering and observing scalable data systems. Pinto specializes in constructing robust analytics platforms, while El Khammal offers insights into integrating software observability into massive pipelines. Their collaborative writing style ensures readers find both practical advice and theoretical foundations.

Who is it for? This book is geared toward data engineers, architects, and scientists who seek to confidently handle pipeline challenges. Whether you're addressing specific issues or wish to introduce proactive measures in your team, this guide meets the needs of those ready to leverage observability as a key practice.
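The "meaningful metrics" at the heart of data observability are usually simple per-batch statistics, such as row counts, null rates, and freshness, emitted on every pipeline run and tracked over time so drift and gaps show up early. A stdlib-only sketch of that idea (field names and the batch below are invented, and this is not code from the book):

```python
from datetime import datetime, timezone

def batch_metrics(records, ts_field="updated_at"):
    """Compute simple observability metrics for one batch of records:
    row count, per-column null rate, and freshness (age of the newest row)."""
    n = len(records)
    null_rates = {
        column: sum(1 for r in records if r[column] is None) / n
        for column in records[0]
    }
    newest = max(r[ts_field] for r in records if r[ts_field] is not None)
    freshness = datetime.now(timezone.utc) - newest
    return {"row_count": n, "null_rates": null_rates, "freshness": freshness}

# Invented batch; in practice these metrics would be emitted per pipeline
# run and compared against historical baselines to raise alerts.
batch = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": None, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]
metrics = batch_metrics(batch)
```

The observability frameworks the book covers add the hard parts around this core: storing metric history, learning baselines, and alerting when a new batch deviates from them.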

Elasticsearch in Action, Second Edition

Build powerful, production-ready search applications using the incredible features of Elasticsearch. In Elasticsearch in Action, Second Edition you will discover:

- Architecture, concepts, and fundamentals of Elasticsearch
- Installing, configuring, and running Elasticsearch and Kibana
- Creating an index with custom settings
- Data types, mapping fundamentals, and templates
- Fundamentals of text analysis and working with text analyzers
- Indexing, deleting, and updating documents
- Indexing data in bulk, and reindexing and aliasing operations
- Search concepts, relevancy scores, and similarity algorithms

Elasticsearch in Action, Second Edition teaches you to build scalable search applications using Elasticsearch. This completely new edition explores Elasticsearch fundamentals from the ground up. You'll dive deep into design principles, search architectures, and Elasticsearch's essential APIs. Every chapter is clearly illustrated with diagrams and hands-on examples, and you'll explore real-world use cases for full-text search, data visualizations, and machine learning. Its comprehensive coverage also makes the book a handy long-term reference.

About the Technology
Create fully professional-grade search engines with Elasticsearch and Kibana! Rewritten for the latest version of Elasticsearch, this practical book explores Elasticsearch's high-level architecture, reveals infrastructure patterns, and walks through the search and analytics capabilities of numerous Elasticsearch APIs.

About the Book
Elasticsearch in Action, Second Edition teaches you how to add modern search features to websites and applications using Elasticsearch 8. In it, you'll progress quickly from the basics of installation and cluster configuration to indexing documents, advanced aggregations, and putting your servers into production. You'll especially appreciate the mix of technical detail with techniques for designing great search experiences.

What's Inside
- Understanding search architecture
- Full-text and term-level search queries
- Analytics and aggregations
- High-level visualizations in Kibana
- Configuring, scaling, and tuning clusters

About the Reader
For application developers comfortable with scripting and command-line applications.

About the Author
Madhusudhan Konda is a full-stack lead engineer, architect, mentor, and conference speaker. He delivers live online training on Elasticsearch and the Elastic Stack.

Quotes
"Madhu's passion comes across in the depth and breadth of this book, the enthusiastic tone, and the hands-on examples. I hope you will take what you have read and put it 'in action'." - From the Foreword by Shay Banon, Founder of Elasticsearch

"Practical and well-written. A great starting point for beginners and a comprehensive guide for more experienced professionals." - Simona Russo, Serendipity

"The author's excitement is evident from the first few paragraphs. Couple that with extensive experience and technical prowess, and you have an instant classic." - Herodotos Koukkides and Semi Koen, Global Japanese Financial Institution
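To illustrate the "full-text and term-level search queries" the book covers, here is a sketch of an Elasticsearch Query DSL body built as a plain Python dict: a bool query combining a scored full-text match clause with an unscored term-level filter. The index and field names ("books", "title", "in_print") are hypothetical examples, not drawn from the book.

```python
import json

# Sketch of an Elasticsearch Query DSL body. "must" clauses contribute to
# the relevancy score; "filter" clauses are exact yes/no conditions that
# are cached and not scored.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "search in action"}}  # full-text, analyzed
            ],
            "filter": [
                {"term": {"in_print": True}}  # term-level, exact match
            ],
        }
    },
    "size": 10,
}

# Against a running cluster, this body would be sent to the _search
# endpoint, e.g. roughly es.search(index="books", body=query) with the
# official Python client, or via POST /books/_search over REST.
print(json.dumps(query, indent=2))
```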

SAP S/4HANA Asset Management: Configure, Equip, and Manage your Enterprise

S/4HANA empowers enterprises to take big steps toward digitalization, innovation, and mobile-friendly operations. This book is a concise guide to SAP S/4HANA Asset Management and will help you begin leveraging the platform's capabilities quickly and efficiently.

SAP S/4HANA Asset Management begins with an overview of the platform and its structure. You will learn how it can help with data storage and analysis, business processes, and reporting and analytics. As the book progresses, you will gain insight into single, time-based, performance-based, and multiple-counter-based strategy plans. Because no project is complete without a budget, the book also explains how to use SAP S/4HANA to create and manage yours. Real-life examples of asset management from contemporary industries reinforce each concept, and coverage of newer technologies and offerings in S/4HANA Asset Management conveys the platform's immense potential. When you have finished this book, you will be ready to use SAP S/4HANA Asset Management to improve operational planning, maintenance, and scheduling activities in your own business.

What You Will Learn
- Position S/4HANA Asset Management within the overall Business Applications suite
- Explore essential functionalities for enterprise asset hierarchy mapping
- Efficiently map both unplanned and planned maintenance activities
- Seamlessly integrate asset management, finance, controlling, and budgeting
- Unleash reporting and analytics in Asset Management
- Configure Asset Management to meet your S/4HANA requirements

Who This Book Is For
Consultants, project managers, and SAP users who are looking for a complete reference guide to S/4HANA Asset Management.