talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked: 395

Collection of O'Reilly books on Data Engineering.

Filtering by: Analytics

Sessions & talks

Showing 51–75 of 395 · Newest first

Data Engineering with dbt

Data Engineering with dbt provides a comprehensive guide to building modern, reliable data platforms using dbt and SQL. You'll gain hands-on experience building automated ELT pipelines, using dbt Cloud with Snowflake, and embracing patterns for scalable and maintainable data solutions.

What this Book will help me do
- Set up and manage a dbt Cloud environment and create reliable ELT pipelines.
- Integrate Snowflake with dbt to implement robust data engineering workflows.
- Transform raw data into analytics-ready data using dbt's features and SQL.
- Apply advanced dbt functionality such as macros and Jinja for efficient coding (see the sketch after this listing).
- Ensure data accuracy and platform reliability with built-in testing and monitoring.

Author(s)
Roberto Zagni is a seasoned data engineering professional with a wealth of experience in designing scalable data platforms. Through practical insights and real-world applications, Zagni demystifies complex data engineering practices. His approachable teaching style makes technical concepts accessible and actionable.

Who is it for?
This book is perfect for data engineers, analysts, and analytics engineers looking to leverage dbt for data platform development. If you're a manager or decision maker interested in fostering efficient data workflows, or a professional with basic SQL knowledge aiming to deepen your expertise, this resource will be invaluable.
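
To give a concrete flavor of the macros-and-Jinja topic this book covers, here is a minimal sketch of how a dbt-style macro expands into SQL. It uses plain jinja2 to mimic what dbt does at compile time; the macro name, column names, and table are illustrative, not taken from the book.

```python
# Minimal sketch of dbt-style Jinja macro expansion, using plain jinja2
# to mimic dbt's compile step. Macro, columns, and table are illustrative.
from jinja2 import Environment

env = Environment()

template = env.from_string("""
{%- macro cents_to_dollars(column_name, precision=2) -%}
    round({{ column_name }} / 100.0, {{ precision }})
{%- endmacro -%}
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from raw_orders
""")

# Rendering replaces the macro call with the rounded expression,
# producing the plain SQL that would be shipped to the warehouse.
print(template.render())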

Geospatial Data Analytics on AWS

In "Geospatial Data Analytics on AWS," you will learn how to store, manage, and analyze geospatial data effectively using various AWS services. This book provides insight into building geospatial data lakes, leveraging AWS databases, and applying best practices to derive insights from spatial data in the cloud. What this Book will help me do Design and manage geospatial data lakes on AWS leveraging S3 and other storage solutions. Analyze geospatial data using AWS services such as Athena and Redshift. Utilize machine learning models for geospatial data processing and analytics using SageMaker. Visualize geospatial data through services like Amazon QuickSight and OpenStreetMap integration. Avoid common pitfalls when managing geospatial data in the cloud. Author(s) Scott Bateman, Janahan Gnanachandran, and Jeff DeMuth bring their extensive experience in cloud computing and geospatial analytics to this book. With backgrounds in cloud architecture, data science, and geospatial applications, they aim to make complex topics accessible. Their collaborative approach ensures readers can practically apply concepts to real-world challenges. Who is it for? This book is ideal for GIS and data professionals, including developers, analysts, and scientists. It suits readers with a basic understanding of geographical concepts but no prior AWS experience. If you're aiming to enhance your cloud-based geospatial data management and analytics skills, this is the guide for you.

Data for All

Do you know what happens to your personal data when you are browsing, buying, or using apps? Discover how your data is harvested and exploited, and what you can do to access, delete, and monetize it. Data for All empowers everyone—from tech experts to the general public—to control how third parties use personal data.

Read this eye-opening book to learn:
- The types of data you generate with every action, every day
- Where your data is stored, who controls it, and how much money they make from it
- How you can manage access and monetization of your own data
- Restricting data access to only companies and organizations you want to support
- The history of how we think about data, and why that is changing
- The new data ecosystem being built right now for your benefit

The data you generate every day is the lifeblood of many large companies—and they make billions of dollars using it. In Data for All, bestselling author John K. Thompson outlines how this one-sided data economy is about to undergo a dramatic change. Thompson pulls back the curtain to reveal the true nature of data ownership, and how you can turn your data from a revenue stream for companies into a financial asset for your benefit.

About the Technology
New global laws are turning the tide on companies who make billions from your clicks, searches, and likes. This eye-opening book provides an inspiring vision of how you can take back control of the data you generate every day.

About the Book
Data for All gives you a step-by-step plan to transform your relationship with data and start earning a "data dividend"—hundreds or thousands of dollars paid out simply for your online activities. You'll learn how to oversee who accesses your data, how much different types of data are worth, and how to keep private details private.

About the Reader
For anyone who is curious or concerned about how their data is used. No technical knowledge required.

About the Author
John K. Thompson is an international technology executive with over 37 years of experience in the fields of data, advanced analytics, and artificial intelligence.

Quotes
"An honest, direct, pull-no-punches source on one of the most important personal issues of our time.... I changed some of my own behaviors after reading the book, and I suggest you do so as well. You have more to lose than you may think." - From the Foreword by Thomas H. Davenport, author of Competing on Analytics and The AI Advantage

"A must-read for anyone interested in the future of data. It helped me understand the reasons behind the current data ecosystem and the laws that are shaping its future. A great resource for both professionals and individuals. I highly recommend it." - Ravit Jain, Founder & Host of The Ravit Show, Data Science Evangelist

IBM Power System AC922 Technical Overview and Introduction

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System AC922 server (8335-GTH and 8335-GTX models). The Power AC922 server is the next generation of the IBM POWER® processor-based systems, which are designed for deep learning (DL) and artificial intelligence (AI), high-performance analytics, and high-performance computing (HPC).

This paper introduces the major innovative Power AC922 server features and their relevant functions:
- Powerful IBM POWER9™ processors that offer up to 22 cores at up to 2.80 GHz (3.10 GHz turbo) performance, with up to 2 TB of memory
- IBM Coherent Accelerator Processor Interface (CAPI) 2.0, IBM OpenCAPI™, and second-generation NVIDIA NVLink 2.0 technology for exceptional processor-to-accelerator intercommunication
- Up to six dedicated NVIDIA Tesla V100 graphics processing units (GPUs)

This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products and is intended for the following audiences:
- Clients
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper expands the set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power AC922 server. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Automating Data Transformations

The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow, but it's also the most time-consuming, expensive, and error-prone part of the process. In this report, Satish Jayanthi and Armon Petrossian examine key concepts that will enable you to automate data transformation at scale. IT decision makers, CTOs, and data team leaders will explore ways to democratize data transformation by shifting from activity-oriented to outcome-oriented teams: from manufacturing-line assembly to an approach that lets even junior analysts implement data with only a brief code review.

With this insightful report, you will:
- Learn how successful data systems rely on simplicity, flexibility, user-friendliness, and a metadata-first approach
- Adopt a product-first mindset (data as a product, or DaaP) for developing data resources that focus on discoverability, understanding, trust, and exploration
- Build a transformation platform that delivers the most value, using a column-first approach
- Use data architecture as a service (DAaaS) to help teams build and maintain their own data infrastructure as they work collaboratively

About the authors:
Armon Petrossian is CEO and cofounder of Coalesce. Previously, he was part of the founding team at WhereScape in North America, where he served as national sales manager for almost a decade.
Satish Jayanthi is CTO and cofounder of Coalesce. Prior to that, he was senior solutions architect at WhereScape, where he met his cofounder Armon.

IBM FlashSystem 7300 Product Guide

This IBM® Redpaper Product Guide describes the IBM FlashSystem® 7300 solution, which is a next-generation IBM FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Spectrum® Virtualize.

To take advantage of artificial intelligence (AI)-enhanced applications, real-time big data analytics, and cloud architectures that require higher levels of system performance and storage capacity, enterprises around the globe are rapidly moving to modernize established IT infrastructures. However, for many organizations, staff resources and expertise are limited, and cost-efficiency is a top priority. These organizations have important investments in existing infrastructure that they want to maximize. They need enterprise-grade solutions that optimize cost-efficiency while simplifying the pathway to modernization. IBM FlashSystem 7300 is designed specifically for these requirements and use cases. It also delivers cyber resilience without compromising application performance.

IBM FlashSystem 7300 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples:
- Data reduction and deduplication
- Dynamic tiering
- Thin-provisioning
- Snapshots
- Cloning
- Replication and data copy services
- Cyber resilience
- Transparent Cloud Tiering (TCT)
- IBM HyperSwap®, including 3-site replication for high availability
- Scale-out and scale-up configurations that further enhance capacity and throughput for better availability

With the release of IBM Spectrum Virtualize V8.5, extra functions and features are available, including support for new third-generation IBM FlashCore Modules (NVMe-type drives within the control enclosure) and 100 Gbps Ethernet adapters that provide NVMe Remote Direct Memory Access (RDMA) options. New software features include GUI enhancements, security enhancements including multifactor authentication and single sign-on, and Fibre Channel (FC) portsets.

Snowflake SnowPro™ Advanced Architect Certification Companion: Hands-on Preparation and Practice

Master the intricacies of Snowflake and prepare for the SnowPro Advanced Architect Certification exam with this comprehensive study companion. This book provides robust and effective study tools to help you prepare for the exam and is also designed for those who are interested in learning the advanced features of Snowflake. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system. The best practices demonstrated in the book help you use Snowflake more powerfully and effectively as a data warehousing and analytics platform. Reading this book and reviewing the concepts will help you gain the knowledge you need to take the exam.

The book guides you through a study of the different domains covered on the exam: Accounts and Security, Snowflake Architecture, Data Engineering, and Performance Optimization. You'll also be well positioned to apply your newly acquired practical skills to real-world Snowflake solutions. You will have a deep understanding of Snowflake to help you take full advantage of Snowflake's architecture to deliver valuable analytics insight to your business.

What You Will Learn
- Gain the knowledge you need to prepare for the exam
- Review in-depth theory on Snowflake to help you build high-performance systems
- Broaden your skills as a data warehouse designer to cover the Snowflake ecosystem
- Optimize performance and costs associated with your use of the Snowflake data platform
- Share data securely both inside your organization and with external partners
- Apply your practical skills to real-world Snowflake solutions

Who This Book Is For
Anyone who is planning to take the SnowPro Advanced Architect Certification exam, those who want to move beyond traditional database technologies and build their skills to design and architect solutions using Snowflake services, and veteran database professionals seeking an on-the-job reference to understand one of the newest and fastest-growing technologies in data.

Principles of Data Fabric

In "Principles of Data Fabric," you will gain a comprehensive understanding of Data Fabric solutions and architectures. This book provides a clear picture of how to design, implement, and optimize Data Fabric solutions to tackle complex data challenges. By the end, you'll be equipped with the knowledge to unify and leverage your organizational data efficiently. What this Book will help me do Design and architect Data Fabric solutions tailored to specific organizational needs. Learn to integrate Data Fabric with DataOps and Data Mesh for holistic data management. Master the principles of Data Governance and Self-Service analytics within the Data Fabric. Implement best practices for distributed data management and regulatory compliance. Apply industry insights and frameworks to optimize Data Fabric deployment. Author(s) Sonia Mezzetta, the author of "Principles of Data Fabric," is an experienced data professional with a deep understanding of data management frameworks and architectures like Data Fabric, Data Mesh, and DataOps. With years of industry expertise, Sonia has helped organizations implement effective data strategies. Her writing combines technical know-how with an approachable style to enlighten and guide readers on their data journey. Who is it for? This book is ideal for data engineers, data architects, and business analysts who seek to understand and implement Data Fabric solutions. It will also appeal to senior data professionals like Chief Data Officers aiming to integrate Data Fabric into their enterprises. Novice to intermediate knowledge of data management would be beneficial for readers. The content provides clear pathways to achieve actionable results in data strategies.

Building Real-Time Analytics Applications

Every organization needs insight to succeed and excel, and the primary foundation for insights today is data—whether it's internal data from operational systems or external data from partners, vendors, and public sources. But how can you use this data to create and maintain analytics applications capable of gaining real insights in real time? In this report, Darin Briskman explains that leading organizations like Netflix, Walmart, and Confluent have found that while traditional analytics still have value, they're not enough. These companies and many others are now building real-time analytics that deliver insights continually, on demand, and at scale—complete with interactive drill-down data conversations, subsecond performance at scale, and always-on reliability.

Ideal for data engineers, data scientists, data architects, and software developers, this report helps you:
- Learn the elements of real-time analytics, including subsecond performance, high concurrency, and the combination of real-time and historical data
- Examine case studies that show how Netflix, Walmart, and Confluent have adopted real-time analytics
- Explore Apache Druid, the real-time database that powers real-time analytics applications
- Learn how to create real-time analytics applications through data design and interfaces
- Understand the importance of security, resilience, and managed services

Darin Briskman is director of technology at Imply Data, Inc., a software company committed to advancing open source technology and making it simple for developers to realize the power of Apache Druid.
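
Since the report centers on Apache Druid, here is a minimal, hedged sketch of issuing a SQL query to Druid's HTTP SQL endpoint from Python. The host, port, and the wikipedia datasource are assumptions for illustration (the datasource ships with Druid's tutorials).

```python
# A minimal sketch of querying Apache Druid's SQL-over-HTTP endpoint.
# Host, port, and datasource are assumptions for illustration.
import requests

DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"  # router's SQL endpoint

query = {
    "query": """
        SELECT channel, COUNT(*) AS edits
        FROM wikipedia
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY edits DESC
        LIMIT 10
    """
}

resp = requests.post(DRUID_SQL_URL, json=query, timeout=30)
resp.raise_for_status()
for row in resp.json():  # Druid returns a JSON array of row objects
    print(row["channel"], row["edits"])
```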

IBM Software Systems Integration: With IBM MQ Series for JMS, IBM FileNet Case Manager, and IBM Business Automation Workflow

Examine the working details for real-world Java programs used for system integration with IBM Software, applying various API libraries (as used by banking and insurance companies). This book includes the step-by-step procedure to use the IBM FileNet Case Manager 5.3.3 Case Builder solution and the similar IBM system, IBM Business Automation Workflow, to create an audit system. You'll learn how to implement the workflow with a client Java Message Service (JMS) Java method developed with Workflow Custom Operations System Step components. Using IBM Cognos Analytics Version 11.2, you'll be able to create new views for IBM Case Manager Analytics for custom time dimensions. The book also explains the SQL code and procedures required to create example Online Analytical Processing (OLAP) cubes with multi-level time dimensions for IBM Case Manager analytics. IBM Software Systems Integration features the most up-to-date systems software procedures using tested API calls.

What You Will Learn
- Review techniques for generating custom IBM JMS code
- Create a new custom view for a multi-level time dimension
- See how a Java program can provide the IBM FileNet document management API calls for content store folder and document replication
- Configure Java components for content engine events

Who This Book Is For
IT consultants, systems architects, and solution architects.

Data Modeling with Tableau

"Data Modeling with Tableau" provides a comprehensive guide to effectively utilizing Tableau Prep and Tableau Desktop for building elegant data models that drive organizational insights. You'll explore robust data modeling strategies and governance practices tailored to Tableau's diverse toolset, empowering you to make faster and more informed decisions based on data. What this Book will help me do Understand the fundamentals of data modeling in Tableau using Prep Builder and Desktop. Learn to optimize data sources for performance and better query capabilities. Implement secure and scalable governance strategies with Tableau Server and Cloud. Use advanced Tableau features like Ask Data and Explain Data to enable powerful analytics. Apply best practices for sharing and extending data models within your organization. Author(s) Kirk Munroe is an experienced data professional with a deep understanding of Tableau-driven analytics. With years of in-field expertise, Kirk now dedicates his career to helping businesses unlock their data's potential through effective Tableau solutions. His hands-on approach ensures this book is practical and approachable. Who is it for? This book is ideal for data analysts and business analysts aiming to enhance their skills in data modeling. It is also valuable for professionals such as data stewards, looking to implement secure and performant data strategies. If you seek to make enterprise data more accessible and actionable, this book is for you.

SAP S/4HANA Financial Accounting Configuration: Learn Configuration and Development on an S/4 System

Upgrade your knowledge to learn S/4HANA, the latest version of the SAP ERP system, with its built-in intelligent technologies, including AI, machine learning, and advanced analytics. Since the first edition of this book published as SAP ERP Financial and Controlling: Configuration and Use Management, the perspective has changed significantly: S/4HANA now comes with new features, such as FIORI (the new GUI), which focuses on flexible app-style development and interactivity with mobile phones. It also has a universal journal, which helps with data integration in a single location, such as centralized processing, and is faster than ECC. It merges FI and CO efficiently, which enables document posting in the Controlling area setup. General Ledger Accounts (FI) and Cost Elements (CO) are mapped together in a way that cost elements (both primary and secondary) are part of G/L accounts. And a mandatory setup of customer-vendor integration with business partners is included, versus the earlier ECC approach of creating separate vendor master and customer master records.

This updated edition presents new features in SAP S/4HANA, with in-depth coverage of the FI syllabus in SAP S/4HANA. A practical and hands-on approach includes scenarios with real-life examples and practical illustrations. There is no unnecessary jargon in this configuration and end-user manual.

What You Will Learn
- Configure SAP FI as a pro in S/4
- Master core aspects of Financial Accounting and Controlling
- Integrate SAP Financial with other SAP modules
- Gain thorough hands-on experience with the IMG (Implementation Guide)
- Understand and explain the functionalities of SAP FI

Who This Book Is For
FI consultants, trainers, developers, accountants, and SAP FI support organizations will find the book an excellent reference guide. Beginners without prior FI configuration experience will find the step-by-step illustrations to be practical and great hands-on experience.

The Cloud Data Lake

More organizations than ever understand the importance of data lake architectures for deriving value from their data. Building a robust, scalable, and performant data lake remains a complex proposition, however, with a buffet of tools and options that need to work together to provide a seamless end-to-end pipeline from data to insights. This book provides a concise yet comprehensive overview of the setup, management, and governance of a cloud data lake. Author Rukmani Gopalan, a product management leader and data enthusiast, guides data architects and engineers through the major aspects of working with a cloud data lake, from design considerations and best practices to data format optimizations, performance optimization, cost management, and governance.

- Learn the benefits of a cloud-based big data strategy for your organization
- Get guidance and best practices for designing performant and scalable data lakes
- Examine architecture and design choices, and data governance principles and strategies
- Build a data strategy that scales as your organizational and business needs increase
- Implement a scalable data lake in the cloud
- Use cloud-based advanced analytics to gain more value from your data
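
As one small, concrete example of the data format optimizations the book discusses, the sketch below writes partitioned Parquet with pyarrow so that query engines can prune files by partition. The local path and column names are illustrative; in a cloud data lake the root path would be an object-store URI (e.g. s3:// or abfs://).

```python
# A small sketch of one data-format optimization: partitioned Parquet,
# which lets engines skip partitions that a query's filter rules out.
# Path and column names are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

events = pa.table({
    "event_date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "user_id": [1, 2, 3],
    "amount": [9.99, 15.00, 7.50],
})

# Writes one subdirectory per distinct event_date under lake/events/.
pq.write_to_dataset(events, root_path="lake/events", partition_cols=["event_date"])
```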

Unlocking the Value of Real-Time Analytics

Storing data and making it accessible for real-time analysis is a huge challenge for organizations today. In 2020 alone, 64.2 billion GB of data was created or replicated, and the total continues to grow. With this report, data engineers, architects, and software engineers will learn how to do deep analysis and automate business decisions while keeping their analytical capabilities timely. Author Christopher Gardner takes you through current practices for extracting data for analysis and uncovers the opportunities and benefits of making that data extraction and analysis continuous. By the end of this report, you'll know how to use new and innovative tools against your data to make real-time decisions. And you'll understand how to examine the impact of real-time analytics on your business.

- Learn the four requirements of real-time analytics: latency, freshness, throughput, and concurrency
- Determine where delays between data collection and actionable analytics occur
- Understand the reasons for real-time analytics and identify the tools you need to reach a faster, more dynamic level
- Examine changes in data storage and software while learning methodologies for overcoming delays in existing database architecture
- Explore case studies that show how companies use columnar data, sharding, and bitmap indexing to store and analyze data

Fast and fresh data can make the difference between a successful transaction and a missed opportunity. The report shows you how.

SQL Server 2022 Revealed: A Hybrid Data Platform Powered by Security, Performance, and Availability

Know how to use the new capabilities and cloud integrations in SQL Server 2022. This book covers the many innovative integrations with the Azure Cloud that make SQL Server 2022 the most cloud-connected edition ever. The book covers cutting-edge features such as the blockchain-based Ledger for creating a tamper-evident record of changes to data over time that you can rely on to be correct and reliable. You'll learn about built-in Query Intelligence capabilities that help you upgrade with confidence that your applications will perform at least as fast after the upgrade as before. In fact, you'll probably see an increase in performance from the upgrade, with no code changes needed. Also covered are innovations such as contained availability groups and data virtualization with S3 object storage. New cloud integrations covered in this book include Microsoft Azure Purview and the use of Azure SQL for high availability and disaster recovery. The book covers Azure Synapse Link with its built-in capabilities to take changes and put them into Synapse automatically. Anyone building their career around SQL Server will want this book for the valuable information it provides on building SQL skills from the edge to the cloud.

What You Will Learn
- Know how to use all of the new capabilities and cloud integrations in SQL Server 2022
- Connect to Azure for disaster recovery, near real-time analytics, and security
- Leverage the Ledger to create a tamper-evident record of data changes over time
- Upgrade from prior releases and achieve faster and more consistent performance with no code changes
- Access data and storage in different and new formats, such as Parquet and S3, without moving the data, using your existing T-SQL skills
- Explore new application scenarios using innovations with T-SQL in areas such as JSON and time series

Who This Book Is For
SQL Server professionals who want to upgrade their skills to the latest edition of SQL Server; those wishing to take advantage of new integrations with Microsoft Azure Purview (governance), Azure Synapse (analytics), and Azure SQL (HA and DR); and those in need of the increased performance and security offered by Query Intelligence and the new Ledger.
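
As an illustration of the Ledger feature described above, here is a hedged sketch of creating an append-only ledger table from Python via pyodbc. The connection string, database, and table are placeholders; the LEDGER = ON (APPEND_ONLY = ON) clause is SQL Server 2022 syntax for append-only ledger tables.

```python
# A hedged sketch of creating an append-only ledger table in SQL Server 2022
# through pyodbc. Connection details and the table name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=demo;UID=sa;PWD=...;TrustServerCertificate=yes"  # placeholder credentials
)
cursor = conn.cursor()

# Append-only ledger tables reject UPDATE/DELETE and keep a cryptographically
# verifiable record of every inserted row.
cursor.execute("""
    CREATE TABLE AuditEvents (
        EventId INT NOT NULL,
        Detail NVARCHAR(200) NOT NULL
    )
    WITH (LEDGER = ON (APPEND_ONLY = ON));
""")
conn.commit()
```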

Trino: The Definitive Guide, 2nd Edition

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. In the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle. Analysts, software engineers, and production engineers learn how to manage, use, and even develop with Trino and make it a critical part of their data platform. Authors Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.

- Explore Trino's use cases, and learn about tools that help you connect to Trino for querying and processing huge amounts of data
- Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Deploy and secure Trino at scale, monitor workloads, tune queries, and connect more applications
- Learn how other organizations apply Trino successfully
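
To illustrate the federated querying the authors describe, here is a minimal sketch using the official trino Python client to join tables from two catalogs in a single statement. The host, catalogs, and table names are assumptions for illustration.

```python
# A minimal sketch of a federated Trino query from Python: one statement
# joining a PostgreSQL catalog with a Hive catalog. Host, catalog, and
# table names are assumptions; the client is the official `trino` package.
import trino

conn = trino.dbapi.connect(host="localhost", port=8080, user="analyst")
cur = conn.cursor()

cur.execute("""
    SELECT c.name, sum(o.total) AS lifetime_value
    FROM postgresql.public.customers AS c
    JOIN hive.sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY lifetime_value DESC
    LIMIT 10
""")
for name, value in cur.fetchall():
    print(name, value)
```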

Azure Data Engineering Cookbook - Second Edition

Azure Data Engineering Cookbook is your ultimate guide to mastering data engineering on Microsoft's Azure platform. Through an engaging collection of recipes, this book breaks down procedures to build sophisticated data pipelines, leveraging tools like Azure Data Factory, Data Lake, Databricks, and Synapse Analytics.

What this Book will help me do
- Efficiently process large datasets using Azure Synapse Analytics and Azure Databricks pipelines.
- Transform and shape data within systems by leveraging Azure Synapse data flows.
- Implement and manage relational databases in Azure with performance tuning and administration.
- Configure data pipeline solutions integrated with Power BI for insightful reporting.
- Monitor, optimize, and ensure lineage tracking for your data systems efficiently with Purview and Log Analytics.

Author(s)
Nagaraj Venkatesan is an experienced cloud architect specializing in Microsoft Azure, with years of hands-on data engineering expertise. Ahmad Osama is a seasoned data professional. The authors' shared emphasis is on practical learning and bridging concepts with actionable skills.

Who is it for?
This book is essential for data engineers seeking expertise in Azure's rich engineering capabilities. It's tailored for professionals with a foundational knowledge of cloud services, looking to achieve advanced proficiency in Azure data engineering pipelines.

Serverless ETL and Analytics with AWS Glue

Discover how to harness AWS Glue for your ETL and data analysis workflows with "Serverless ETL and Analytics with AWS Glue." This comprehensive guide introduces readers to the capabilities of AWS Glue, from building data lakes to performing advanced ETL tasks, allowing you to create efficient, secure, and scalable data pipelines with serverless technology.

What this Book will help me do
- Understand and utilize various AWS Glue features for data lake and ETL pipeline creation.
- Leverage AWS Glue Studio and DataBrew for intuitive data preparation workflows.
- Implement effective storage optimization techniques for enhanced data analytics.
- Apply robust data security measures, including encryption and access control, to protect data.
- Integrate AWS Glue with machine learning tools like SageMaker to build intelligent models.

Author(s)
The authors of this book include experts across the fields of data engineering and AWS technologies. With backgrounds in data analytics, software development, and cloud architecture, they bring a depth of practical experience. Their approach combines hands-on tutorials with conceptual clarity, ensuring a blend of foundational knowledge and actionable insights.

Who is it for?
This book is designed for ETL developers, data engineers, and data analysts who are familiar with data management concepts and want to extend their skills into serverless cloud solutions. If you're looking to master AWS Glue for building scalable and efficient ETL pipelines or are transitioning existing systems to the cloud, this book is ideal for you.
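
As a small taste of working with Glue programmatically, the sketch below starts a Glue job run and checks its state with boto3. The job name and argument are hypothetical; the job itself would be authored in Glue Studio or as a PySpark script, as the book describes.

```python
# A short sketch of driving a serverless Glue ETL job from Python with boto3.
# Job name, argument, and bucket are hypothetical placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

run = glue.start_job_run(
    JobName="orders-etl",  # hypothetical job
    Arguments={"--input_path": "s3://my-bucket/raw/orders/"},
)

status = glue.get_job_run(JobName="orders-etl", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])  # e.g. RUNNING, SUCCEEDED, FAILED
```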

Snowflake: The Definitive Guide

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily.

You'll be able to:
- Efficiently capture, store, and process large amounts of data at an amazing speed
- Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
- Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs (see the sketch after this listing)
- Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace
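
To make the Time Travel and zero-copy cloning point concrete, here is a hedged sketch using snowflake-connector-python. Credentials, warehouse, and table names are placeholders.

```python
# A hedged sketch of Snowflake Time Travel and zero-copy cloning via
# snowflake-connector-python. All connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount", user="analyst", password="...",
    warehouse="COMPUTE_WH", database="SALES", schema="PUBLIC",
)
cur = conn.cursor()

# Time Travel: query the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Zero-copy clone: an instant, storage-efficient copy of that historical
# state, useful for recovery or testing.
cur.execute("CREATE TABLE orders_restored CLONE orders AT(OFFSET => -3600)")
```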

Simplifying Data Engineering and Analytics with Delta

This book will guide you through mastering Delta, a robust and versatile protocol for data engineering and analytics. You'll discover how Delta simplifies data workflows, supports both batch and streaming data, and is optimized for analytics applications in various industries. By the end, you will know how to create high-performing, analytics-ready data pipelines.

What this Book will help me do
- Understand Delta's unique offering for unifying batch and streaming data processing.
- Learn approaches to address data governance, reliability, and scalability challenges.
- Gain technical expertise in building data pipelines optimized for analytics and machine learning use.
- Master core concepts like data modeling, distributed computing, and Delta's schema evolution features (sketched below).
- Develop and deploy production-grade data engineering solutions leveraging Delta for business intelligence.

Author(s)
Anindita Mahapatra is an experienced data engineer and author with years of expertise in working on Delta and data-driven solutions. Her hands-on approach to explaining complex data concepts makes this book an invaluable resource for professionals in data engineering and analytics.

Who is it for?
Ideal for data engineers, data analysts, and anyone involved in AI/BI workflows, this book suits learners with some basic knowledge of SQL and Python. Whether you're an experienced professional or looking to upgrade your skills with Delta, this book will provide practical insights and actionable knowledge.
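
As a concrete glimpse of the schema evolution feature mentioned above, here is a minimal PySpark sketch of an ACID append followed by a schema-evolving write to a Delta table. It assumes a Spark session configured with the delta-spark package; paths, columns, and values are illustrative.

```python
# A minimal PySpark sketch of two Delta behaviors: ACID appends and schema
# evolution. Assumes a Delta-enabled Spark session; paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "widget", 9.99)], ["order_id", "item", "amount"]
)
orders.write.format("delta").mode("append").save("/tmp/delta/orders")

# A later batch arrives with an extra column; mergeSchema evolves the
# table's schema instead of failing the write.
orders_v2 = spark.createDataFrame(
    [(2, "gadget", 15.0, "USD")], ["order_id", "item", "amount", "currency"]
)
(orders_v2.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/tmp/delta/orders"))
```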

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities in Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and to secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease.

The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones, to enable better business intelligence and advanced analytics on your data within the Azure Data Platform.

What You Will Learn
- Implement the Data Lakehouse Paradigm on Microsoft's Azure cloud platform
- Benefit from the new Delta Lake open-source storage layer for data lakehouses
- Take advantage of schema evolution, change feeds, live tables, and more
- Write functional PySpark code for data lakehouse ELT jobs (a streaming sketch follows this listing)
- Optimize Apache Spark performance through partitioning, indexing, and other tuning options
- Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake

Who This Book Is For
Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.
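
To hint at what the streaming ELT chapters involve, here is a compact, hedged sketch of a PySpark Structured Streaming job that reads JSON landing files and writes them to a Delta table with checkpointing. The storage paths and schema are illustrative, and a Delta-enabled Spark session is assumed.

```python
# A compact sketch of a streaming ELT job in PySpark: read a stream of JSON
# landing files and write them to a Delta table with checkpointing.
# Paths and schema are illustrative; a Delta-enabled session is assumed.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("lakehouse-elt").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (spark.readStream
    .schema(schema)  # streaming file sources require an explicit schema
    .json("abfss://landing@account.dfs.core.windows.net/orders/"))

query = (stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .start("/tmp/delta/orders"))

query.awaitTermination()  # block while the stream runs
```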

Data Engineering with Alteryx

Dive into 'Data Engineering with Alteryx' to master the principles of DataOps while learning to build robust data pipelines using Alteryx. This book guides you through key practices to enhance data pipeline reliability, efficiency, and accessibility, making it an essential resource for modern data professionals.

What this Book will help me do
- Understand and implement DataOps practices within Alteryx workflows.
- Design and develop data pipelines with Alteryx Designer for efficient data processing.
- Learn to manage and publish pipelines using Alteryx Server and Alteryx Connect.
- Gain advanced skills in Alteryx for handling spatial analytics and machine learning.
- Master techniques to monitor, secure, and optimize data workflows and access.

Author(s)
Paul Houghton is an experienced data engineer and author specializing in data engineering and DataOps. With extensive experience using Alteryx tools and workflows, Paul has a passion for teaching and sharing his knowledge through clear and practical guidance. His hands-on approach ensures readers successfully navigate and apply technical concepts to real-world projects.

Who is it for?
This book is ideal for data engineers, data scientists, and data analysts aiming to build reliable data pipelines with Alteryx. You do not need prior experience with Alteryx, but familiarity with data workflows will enhance your learning experience. If you're focused on aligning with DataOps methodologies, this book is tailored for you.

Ten Things to Know About ModelOps

The past few years have seen significant developments in data science, AI, machine learning, and advanced analytics. But the wider adoption of these technologies has also brought greater cost, risk, regulation, and demands on organizational processes, tasks, and teams. This report explains how ModelOps can provide both technical and operational solutions to these problems. Thomas Hill, Mark Palmer, and Larry Derany summarize important considerations, caveats, choices, and best practices to help you be successful with operationalizing AI/ML and analytics in general. Whether your organization is already working with teams on AI and ML, or just getting started, this report presents ten important dimensions of analytic practice and ModelOps that are not widely discussed, or perhaps even known.

In part, this report examines:
- Why ModelOps is the enterprise "operating system" for AI/ML algorithms
- How to build your organization's IP secret sauce through repeatable processing steps
- How to anticipate risks rather than react to damage done
- How ModelOps can help you deliver the many algorithms and model formats available
- How to plan for success and monitor for value, not just accuracy
- Why AI will soon be regulated and how ModelOps helps ensure compliance

In-Memory Analytics with Apache Arrow

Discover the power of in-memory data analytics with "In-Memory Analytics with Apache Arrow." This book delves into Apache Arrow's unique capabilities, enabling you to handle vast amounts of data efficiently and effectively. Learn how Arrow improves performance, offers seamless integration, and simplifies data analysis in diverse computing environments.

What this Book will help me do
- Gain proficiency with the datastore facilities and data types defined by Apache Arrow.
- Master the Arrow Flight APIs to efficiently transfer data between systems.
- Learn to leverage in-memory processing advantages offered by Arrow for state-of-the-art analytics (see the sketch after this listing).
- Understand how Arrow interoperates with popular tools like Pandas, Parquet, and Spark.
- Develop and deploy high-performance data analysis pipelines with Apache Arrow.

Author(s)
Matthew Topol, the author of the book, is an experienced practitioner in data analytics and Apache Arrow technology. Having contributed to the development and implementation of Arrow-powered systems, he brings a wealth of knowledge to readers. His ability to delve deep into technical concepts while keeping explanations practical makes this book an excellent guide for learners of the subject.

Who is it for?
This book is ideal for professionals in the data domain including developers, data analysts, and data scientists aiming to enhance their data manipulation capabilities. Beginners with some familiarity with data analysis concepts will find it beneficial, as well as engineers designing analytics utilities. Programming examples accommodate users of C, Go, and Python, making it broadly accessible.
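
For a taste of the columnar model the book builds on, here is a small pyarrow sketch: build an Arrow table, run a compute kernel over a whole column, and hand the same data to pandas. Column names and values are illustrative.

```python
# A small pyarrow sketch of Arrow's columnar, in-memory model.
# Column names and values are illustrative.
import pyarrow as pa
import pyarrow.compute as pc

table = pa.table({
    "city": ["Oslo", "Lima", "Oslo"],
    "temp_c": [3.5, 19.0, 4.1],
})

# Columnar layout: each column is a contiguous Arrow array.
print(table.column("temp_c"))

# Compute kernels operate on whole columns, avoiding row-by-row Python loops.
print(pc.mean(table.column("temp_c")))

# Interoperability: convert to pandas (copy-free where the types allow).
df = table.to_pandas()
print(df.head())
```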