talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

395

Collection of O'Reilly books on Data Engineering.

Filtering by: Analytics

Sessions & talks

Showing 101–125 of 395 · Newest first

Data Engineering on Azure

Build a data platform to the industry-leading standards set by Microsoft's own infrastructure. In Data Engineering on Azure you will learn how to: pick the right Azure services for different data scenarios; manage data inventory; implement production-quality data modeling, analytics, and machine learning workloads; handle data governance; use DevOps to increase reliability; ingest, store, and distribute data; and apply best practices for compliance and access control. Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft's own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning. About the Technology: Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify. About the Book: In Data Engineering on Azure you'll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you'll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms. What's Inside: data inventory and data governance; assuring data quality, compliance, and distribution; building automated pipelines to increase reliability; ingesting, storing, and distributing data; and production-quality data modeling, analytics, and machine learning. About the Reader: For data engineers familiar with cloud computing and DevOps. About the Author: Vlad Riscutia is a software architect at Microsoft. Quotes: "A definitive and complete guide on data engineering, with clear and easy-to-reproduce examples." (Kelum Prabath Senanayake, Echoworx) "An all-in-one Azure book, covering all a solutions architect or engineer needs to think about." (Albert Nogués, Danone) "A meaningful journey through the Azure ecosystem. You'll be building pipelines and joining components quickly!" (Todd Cook, Appen) "A gateway into the world of Azure for machine learning and DevOps engineers." (Krzysztof Kamyczek, Luxoft)

The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

Build efficient and scalable batch and real-time data ingestion pipelines, DevOps continuous integration and deployment pipelines, and advanced analytics solutions on the Azure Data Platform. This book teaches you to design and implement robust data engineering solutions using Data Factory, Databricks, Synapse Analytics, Snowflake, Azure SQL Database, Stream Analytics, Cosmos DB, and Data Lake Storage Gen2. You will learn how to engineer your use of these Azure Data Platform components for optimal performance and scalability. You will also learn to design self-service capabilities to maintain and drive the pipelines and your workloads. The approach in this book is to guide you through a hands-on, scenario-based learning process that will empower you to promote digital innovation best practices while you work through your organization's projects, challenges, and needs. The clear examples enable you to use this book as a reference and guide for building data engineering solutions in Azure. After reading this book, you will have a far stronger skill set and confidence level in getting hands-on with the Azure Data Platform. What You Will Learn: build dynamic, parameterized ELT data ingestion orchestration pipelines in Azure Data Factory; create data ingestion pipelines that integrate control tables for self-service ELT; implement a reusable logging framework that can be applied to multiple pipelines; integrate Azure Data Factory pipelines with a variety of Azure data sources and tools; transform data with Mapping Data Flows in Azure Data Factory; apply Azure DevOps continuous integration and deployment practices to your Azure Data Factory pipelines and development SQL databases; design and implement real-time streaming and advanced analytics solutions using Databricks, Stream Analytics, and Synapse Analytics; and get started with a variety of Azure data services through hands-on examples. Who This Book Is For: data engineers and data architects who are interested in learning architectural and engineering best practices around ELT and ETL on the Azure Data Platform, those who are creating complex Azure data engineering projects and are searching for patterns of success, and aspiring cloud and data professionals involved in data engineering, data governance, continuous integration and deployment of DevOps practices, and advanced analytics who want a full understanding of the many different tools and technologies that the Azure Data Platform provides.
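The control-table pattern called out above ("create data ingestion pipelines that integrate control tables for self-service ELT") can be sketched outside Azure as well. Below is a minimal, illustrative Python sketch of metadata-driven ingestion under stated assumptions: the etl_control table, its columns, and the load_source stub are hypothetical stand-ins for the Azure SQL control tables and Data Factory copy activities the book works with, not code from the book.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical control table: one row per source the pipeline should ingest.
# In the book's Azure setting this would live in Azure SQL and drive a
# Data Factory ForEach activity; sqlite3 keeps the sketch self-contained.
SETUP = """
CREATE TABLE IF NOT EXISTS etl_control (
    source_name TEXT PRIMARY KEY,
    source_path TEXT NOT NULL,
    enabled     INTEGER NOT NULL DEFAULT 1,
    last_loaded TEXT
);
"""

def load_source(name: str, path: str) -> None:
    """Stand-in for the actual copy activity."""
    print(f"loading {name} from {path}")

def run_pipeline(conn: sqlite3.Connection) -> None:
    # Metadata-driven loop: onboarding a new source is a row insert,
    # not a code change, which is what makes the pattern self-service.
    sources = list(conn.execute(
        "SELECT source_name, source_path FROM etl_control WHERE enabled = 1"
    ))
    for name, path in sources:
        load_source(name, path)
        conn.execute(
            "UPDATE etl_control SET last_loaded = ? WHERE source_name = ?",
            (datetime.now(timezone.utc).isoformat(), name),
        )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SETUP)
    conn.execute(
        "INSERT INTO etl_control (source_name, source_path) VALUES (?, ?)",
        ("sales", "/landing/sales.csv"),
    )
    run_pipeline(conn)
```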

Designing Big Data Platforms

Provides expert guidance and valuable insights on getting the most out of Big Data systems. An array of tools is currently available for managing and processing data: some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems. This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more. Author Yusuf Aytas, a software engineer with a vast amount of big data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security. This real-world manual for Big Data technologies: provides up-to-date coverage of the tools currently used in Big Data processing and management; offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems; highlights and explains how data is processed at scale; and includes an introduction to the foundation of a modern data platform. Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well as researchers and students in computer science and related fields.

Amazon Redshift Cookbook

Dive into the world of Amazon Redshift with this comprehensive cookbook, packed with practical recipes to build, optimize, and manage modern data warehousing solutions. From understanding Redshift's architecture to implementing advanced data warehousing techniques, this book provides actionable guidance to harness the power of Amazon Redshift effectively. What this Book will help me do: Master the architecture and core concepts of Amazon Redshift to architect scalable data warehouses. Optimize data pipelines and automate ETL processes for seamless data ingestion and management. Leverage advanced features like concurrency scaling and Redshift Spectrum for enhanced analytics. Apply best practices for security and cost optimization in Redshift projects. Gain expertise in scaling data warehouse solutions to accommodate large-scale analytics needs. Author(s): Shruti Worlikar, Thiyagarajan Arumugam, and Harshida Patel are seasoned experts in data warehousing and analytics with extensive experience using Amazon Redshift. Their backgrounds in implementing scalable data solutions make their insights practical and grounded. Through their collaborative writing, they aim to make complex topics approachable to learners of various skill levels. Who is it for? This book is tailored for professionals such as data warehouse developers, data engineers, and data analysts looking to master Amazon Redshift. It suits intermediate to advanced practitioners with a basic understanding of data warehousing and cloud technologies. Readers seeking to optimize Redshift for cost, performance, and security will find this guide invaluable.

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration. No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well. Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with a few input variables is important for practically every type of business. This book helps you understand data science, the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining. Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented. T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you are already familiar with, then this is the book for you. What You Will Learn: describe the distribution of variables with statistical measures; find associations between pairs of variables; evaluate the quality of the data you are analyzing; perform time-series analysis on your data; forecast values of a continuous variable; perform market-basket analysis to predict customer purchasing patterns; predict target variable outcomes from one or more input variables; and categorize passages of text by extracting and analyzing keywords. Who This Book Is For: database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science; readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft's various database platforms such as SQL Server and Azure SQL Database; and readers who want to improve their querying by learning new and original optimization techniques.
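To see what those descriptive statistics amount to before expressing them in T-SQL, here is a small Python sketch, not from the book (which builds these measures out of T-SQL window aggregates such as AVG(...) OVER ()), computing a center, spread, skewness, and excess kurtosis from first principles.

```python
import math

def describe(xs: list[float]) -> dict[str, float]:
    """Mean, standard deviation, skewness, and excess kurtosis,
    computed with population-style formulas; assumes at least two
    distinct values so the standard deviation is nonzero."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4) - 3  # excess kurtosis
    return {"mean": mean, "sd": sd, "skewness": skew, "kurtosis": kurt}

if __name__ == "__main__":
    print(describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```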

Data Lakes For Dummies

Take a dive into data lakes. "Data lakes" is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: "What exactly is a data lake and do I need one for my business?" Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can't) achieve, and what you need to do to create the lake that best suits your particular needs. With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you've got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you've stored. Understand and build data lake architecture. Store, clean, and synchronize new and existing data. Compare the best data lake vendors. Structure raw data and produce usable analytics. Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn't left standing on the shore.

Data Fabric as Modern Data Architecture

Data fabric is a hot concept in data management today. By encompassing the data ecosystem your company already has in place, this architectural design pattern provides your staff with one reliable place to go for data. In this report, author Alice LaPlante shows CIOs, CDOs, and CAOs how data fabric enables their users to spend more time analyzing data than wrangling it. The best way to thrive during this intense period of digital transformation is through data. But after roaring through 2019, progress on getting the most out of data investments has lost steam. Only 38% of companies now say they've created a data-driven organization. This report describes how a data fabric can help you reach the all-important goal of data democratization. Learn how data fabric handles data prep and data delivery, and serves as a data catalog. Use data fabric to handle data variety, a top challenge for many organizations. Learn how data fabric spans any environment to support data for users and use cases from any source. Examine data fabric's capabilities, including data and metadata management, data quality, integration, analytics, visualization, and governance. Get five pieces of advice for getting started with data fabric.

Architecting Data-Intensive SaaS Applications

Through explosive growth in the past decade, data now drives significant portions of our lives, from crowdsourced restaurant recommendations to AI systems identifying effective medical treatments. Software developers have an unprecedented opportunity to build data applications that generate value from massive datasets across use cases such as customer 360, application health and security analytics, the IoT, machine learning, and embedded analytics. With this report, product managers, architects, and engineering teams will learn how to make key technical decisions when building data-intensive applications, including how to implement extensible data pipelines and share data securely. The report includes design considerations for making these decisions and uses the Snowflake Data Cloud to illustrate best practices. This report explores: why data applications matter (an introduction to data applications and some of the most common use cases); evaluating platforms for building data apps (how to confidently consider the merits of potential solutions); building scalable data applications (design patterns and best practices for storage, compute, and security); handling and processing data (techniques and real-world examples for building data pipelines to support data applications); and designing for data sharing (best practices for sharing data in modern data applications).

SAP S/4HANA Embedded Analytics: Experiences in the Field

Imagine you are a business user, consultant, or developer about to enter an SAP S/4HANA implementation project. You are well-versed in SAP's product portfolio and you know that the preferred reporting option in S/4HANA is embedded analytics. But what exactly is embedded analytics? And how can it be implemented? And who can do it: a business user, or a functional consultant specialized in financial or logistics processes? Or does a business intelligence expert or a programmer need to be involved? Good questions! This book will answer these questions, one by one. It will also take you on the same journey that the implementation team needs to follow for every reporting requirement that pops up: start with assessing a more standard option, and only move on to a less standard option if the requirement cannot be fulfilled. In consecutive chapters, analytical apps delivered by SAP, apps created using Smart Business Services, and analytical queries developed either using tiles or in a development environment are explained in detail with practical examples. The book also explains which option is preferred in which situation. The book covers topics such as in-memory computing, cloud, UX, OData, agile development, and more. Author Freek Keijzer writes from the perspective of an implementation consultant, focusing on functionality that has proven itself useful in the field. Practical examples are abundant, ranging from "codeless" to "hardcore coding." What You Will Learn: know the difference between static reporting and interactive querying on real-time data; understand which options are available for analytics in SAP S/4HANA; understand which option to choose in which situation; and know how to implement these options. Who This Book Is For: SAP power users, functional consultants, and developers.

Understanding Log Analytics at Scale, 2nd Edition

Using log analytics provides organizations with powerful and necessary capabilities for IT security. By analyzing log data, you can drive critical business outcomes, such as identifying security threats or opportunities to build new products. Log analytics also helps improve business efficiency as well as application and infrastructure uptime. In the second edition of this report, data architects and IT infrastructure leads will learn how to get up to speed on log data, log analytics, and log management. Log data, the list of recorded events from software and hardware, typically includes the IP address, time of event, date of event, and more. You'll explore how proactively planned data storage and delivery extends enterprise IT capabilities critical to security analytics deployments. Explore what log analytics is, and why log data is so vital. Learn how log analytics helps organizations achieve better business outcomes. Use log analytics to address specific business problems. Examine the current state of log analytics, including common issues. Make the right storage deployments for log analytics use cases. Understand how log analytics will evolve in the future. With this in-depth report, you'll be able to identify the points your organization needs to consider to achieve successful business outcomes from your log data.
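As a concrete illustration of the log data described above (IP address, time and date of the event, and so on), the short Python sketch below parses a web-server line in Common Log Format into the fields a log analytics system would index; the sample line and field names are invented for the example.

```python
import re

# Common Log Format: ip identity user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line: str) -> dict | None:
    """Return the structured fields of one log line, or None if it
    does not match the expected format."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

if __name__ == "__main__":
    sample = '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET /health HTTP/1.1" 200 512'
    print(parse_line(sample))
```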

IBM z15 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™. It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity, and occupies an industry-standard footprint. It is offered as a single air-cooled 19-inch frame called the z15 T02, or as a multi-frame (1 to 4 19-inch frames) called the z15 T01. Both z15 models excel at the following tasks: using hybrid multicloud integration services; securing and protecting data with encryption everywhere; providing resilience as the key to zero downtime; transforming a transactional platform into a data powerhouse; getting more out of the platform with operational analytics; accelerating digital transformation with agile service delivery; revolutionizing business processes; and blending open source and IBM Z technologies. This book explains how this system uses innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM Power Systems for SAS Viya 3.5 Deployment Guide

This IBM® Redbooks® publication provides options and best practices for deploying SAS Viya 3.5 on IBM POWER9™ servers. SAS Viya is a complex set of artificial intelligence (AI) and analytics solutions that require a properly planned infrastructure to meet the needs of the data scientists, business analysts, and application developers who use Viya capabilities in their daily work activities. Regardless of the user role, the underlying infrastructure matters to ensure that performance expectations and service level agreement (SLA) requirements are met or exceeded. Although the general planning process is similar for deploying SAS Viya on any platform, key IBM POWER9 differentiators must be considered to ensure that an optimized infrastructure deployment is achieved. This guide provides useful information that is needed during the planning, sizing, ordering, installing, configuring, and tuning phases of your SAS Viya deployment on POWER9 processor-based servers. This book addresses topics for IT architects, IT specialists, developers, sellers, and anyone who wants to implement SAS Viya 3.5 on IBM POWER9 servers. Moreover, this publication provides documentation to transfer the how-to skills to the technical teams, and solution guidance to the sales team. This book complements the documentation that is available in IBM Knowledge Center and aligns with the educational materials that are provided by IBM Systems Software Education (SSE).

Azure Data Engineering Cookbook

Dive into the world of data engineering with 'Azure Data Engineering Cookbook' to master building efficient ETL workflows using Microsoft Azure Data services. Whether you're working on batch processing solutions or real-time analytics, this book is your guide to implementing effective, scalable data operations. What this Book will help me do: Design and implement efficient ETL pipelines for batch and real-time processing on MS Azure. Understand the use of Azure Blob storage for managing large data sets. Ingest, process, and analyze data using tools like Azure Synapse and Databricks. Develop and secure automation pipelines using Azure Data Factory. Leverage Azure Stream Analytics for real-time data processing workflows. Author(s): Ahmad Osama and Nagaraj Venkatesan bring years of expertise in cloud solutions and data engineering. Renowned for their practical teaching approach, they have helped countless professionals master the intricacies of Azure. Their focus is on equipping readers with actionable skills for real-world data challenges. Who is it for? This book is ideal for data engineers and database professionals aiming to hone their expertise in advanced Azure data engineering tasks. Readers should have a working knowledge of Azure fundamentals and basic data engineering concepts. If you're a technical architect or ETL developer seeking to transition or enhance your skills in Azure's ecosystem, you'll find immense value here.

Snowflake Cookbook

The "Snowflake Cookbook" is your guide to mastering Snowflake's unique cloud-centric architecture. This book provides detailed recipes for building modern data pipelines, configuring efficient virtual warehouses, ensuring robust data protection, and optimizing cost and performance, all while leveraging Snowflake's distinctive features such as data sharing and time travel. What this Book will help me do: Set up and configure Snowflake's architecture for optimized performance and cost efficiency. Design and implement robust data pipelines using SQL and Snowflake's specialized features. Secure, manage, and share data efficiently with built-in Snowflake capabilities. Apply performance tuning techniques to enhance your Snowflake implementations. Extend Snowflake's functionality with tools like Spark Connector for advanced workflows. Author(s): Hamid Mahmood Qureshi and Hammad Sharif are both seasoned experts in data warehousing and cloud computing technologies. With extensive experience implementing analytics solutions, they bring a hands-on approach to teaching Snowflake. They are ardent proponents of empowering readers to create effective and scalable data solutions. Who is it for? This book is perfect for data warehouse developers, data analysts, cloud architects, and anyone managing cloud data solutions. If you're familiar with basic database concepts or just stepping into Snowflake, you'll find practical guidance here to deepen your understanding and functional expertise in cloud data warehousing.
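As a taste of the features the recipes cover, the sketch below uses the snowflake-connector-python package to resize a virtual warehouse for a heavy job and run a time-travel query. It is an illustration rather than a recipe from the book: the connection parameters, warehouse name, and orders table are placeholders.

```python
import snowflake.connector

# Placeholder credentials; in practice these come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="ANALYTICS_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Cost/performance tuning: scale the virtual warehouse up for the job...
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE'")

# Time travel: query the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -60 * 60)")
print(cur.fetchone())

# ...and back down when finished, so credits are not burned idly.
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'XSMALL'")
cur.close()
conn.close()
```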

Data Pipelines Pocket Reference

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn: what a data pipeline is and how it works; how data is moved and processed on modern data infrastructure, including cloud platforms; common tools and products used by data engineers to build pipelines; how pipelines support analytics and reporting needs; and considerations for pipeline maintenance, testing, and alerting.
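To make the batch side of the batch-versus-streaming decision concrete, here is a minimal batch pipeline sketch in Python; it is not from the book, and the file names and the amount field are invented. A streaming version would consume records one at a time as they arrive instead of reading the whole source at once.

```python
import csv
import json
from pathlib import Path

def extract(path: Path) -> list[dict]:
    """Batch extraction: read the whole source in one pass."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    # Add context so downstream analytics can use the data directly,
    # and drop records missing the key field.
    return [
        {**row, "amount_usd": float(row["amount"]), "source": "orders.csv"}
        for row in rows
        if row.get("amount")
    ]

def load(rows: list[dict], path: Path) -> None:
    """Load: write newline-delimited JSON for the warehouse to pick up."""
    path.write_text("\n".join(json.dumps(r) for r in rows))

if __name__ == "__main__":
    load(transform(extract(Path("orders.csv"))), Path("orders.jsonl"))
```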

IBM Integrated Synchronization: Incremental Updates Unleashed

The IBM® Db2® Analytics Accelerator (Accelerator) is a logical extension of Db2 for IBM z/OS® that provides a high-speed query engine that efficiently and cost-effectively runs analytics workloads. The Accelerator is an integrated back-end component of Db2 for z/OS. Together, they provide a hybrid workload-optimized database management system that seamlessly routes queries found in transactional workloads to Db2 for z/OS and queries found in analytics applications to the Accelerator. Each query runs in its optimal environment for maximum speed and cost efficiency. The incremental update function of Db2 Analytics Accelerator for z/OS updates Accelerator-shadow tables continually. Changes to the data in original Db2 for z/OS tables are propagated to the corresponding target tables with a high frequency and a brief delay. Query results from the Accelerator are always extracted from recent, close-to-real-time data. An incremental update capability that is called IBM InfoSphere® Change Data Capture (InfoSphere CDC) is provided by IBM InfoSphere Data Replication for z/OS up to Db2 Analytics Accelerator V7.5. Since then, a new replication protocol between Db2 for z/OS and the Accelerator, called IBM Integrated Synchronization, has been introduced. With Db2 Analytics Accelerator V7.5, customers can choose which one to use. IBM Integrated Synchronization is a built-in product feature that you use to set up incremental updates. It does not require InfoSphere CDC, which is bundled with IBM Db2 Analytics Accelerator. In addition, IBM Integrated Synchronization has further advantages: simplified administration, packaging, upgrades, and support, which are managed as part of the Db2 for z/OS maintenance stream; fast update processing; reduced CPU consumption on the mainframe, thanks to a streamlined, optimized design in which most of the processing is done on the Accelerator, which also reduces latency; and use of the IBM Z® Integrated Information Processor (zIIP) on Db2 for z/OS, which leads to reduced CPU costs on IBM Z and better overall performance data, such as throughput and synchronized rows per second. On z/OS, the workload to capture the table changes was reduced, and the remainder can be handled by zIIPs. With the introduction of an enterprise-grade Hybrid Transactional Analytics Processing (HTAP) enabler, also known as the Wait for Data protocol, the integrated low-latency protocol is now enabled to support more analytical queries running against the latest committed data. IBM Db2 for z/OS Data Gate simplifies delivering data from IBM Db2 for z/OS to IBM Cloud® Pak® for Data for direct access by new applications. It uses the special-purpose integrated synchronization protocol to maintain data currency with low latency between Db2 for z/OS and dedicated target databases on IBM Cloud Pak for Data.

Data Accelerator for AI and Analytics

This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It provides details about data orchestration and how to address typical challenges that customers face when dealing with large and ever-growing amounts of data for data analytics. While the amount of data increases steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner. This paper provides a solution that addresses these needs: Data Accelerator for AI and Analytics (DAAA). A proof of concept (PoC) is described in detail. This paper focuses on the functions that are provided by the Data Accelerator for AI and Analytics solution, which simplifies the daily work of data scientists and system administrators. This solution helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and associated data management.

What Is a Data Lake?

A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for their organizations. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship. You'll explore: the ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs (volume, variety, and velocity); the building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics; and state-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud.

SUSE and IBM Power Systems for SAP HANA

For organizations charting their way forward in today's digital economy, the clear imperative is to find better ways of extracting more value from data. By gleaning insight from data regarding customer preferences and business operations, organizations can respond to demand more effectively and better deliver the experiences that today's customers want. To this end, many organizations running SAP solutions seek to make the move to the SAP HANA database. SAP HANA offers the speed of in-memory data processing and the ability to combine transactions and analytics on a single platform for insight in real time. However, considerations at the level of IT infrastructure can make or break the success of an SAP HANA implementation. What the database runs on, in other words, matters significantly. This IBM® Redguide publication explores the value of deploying SAP HANA on SUSE Linux Enterprise Server for SAP Applications and the IBM Power platform with IBM POWER9™ processors. Both offerings are optimized to help your organization reap the rewards of SAP HANA while also transforming IT service delivery more generally. Designed for enterprise-grade operations, SUSE Linux Enterprise Server for SAP Applications offers an open-source software-defined infrastructure (SDI) that is optimized for SAP workloads. Reliable, fast, and secure, it also supports the automation that is needed to substantially free up IT staff from service deployment and management duties. Power Systems servers support SAP HANA implementations according to the SAP Tailored Data Center Integration (TDI) 5.0 specification. Optimized for scale-up and scale-out scenarios and built to support virtual persistent memory, Power Systems servers help you provision faster, scale affordably, and maximize uptime by persisting memory across virtual machines (VMs) and multiple SAP HANA instances. Both SUSE and IBM have partnered with SAP for decades to fine-tune these offerings. Together, SUSE and IBM solutions offer a way forward for deploying, optimizing, and running SAP HANA implementations that is proven to be successful. This publication looks at various aspects of this combined offering in greater detail.

Big Data Management

Data analytics is core to business and decision making. The rapid increase in data volume, velocity, and variety offers both opportunities and challenges. While open source solutions to store big data, like Hadoop, offer platforms for exploring value and insight from big data, they were not originally developed with data security and governance in mind. Big Data Management discusses numerous policies, strategies, and recipes for managing big data. It addresses data security, privacy, controls, and life cycle management, offering modern principles and open source architectures for successful governance of big data. The author has collected best practices from the world's leading organizations that have successfully implemented big data platforms. The topics discussed cover the entire data management life cycle: data quality, data stewardship, regulatory considerations, data councils, and architectural and operational models are presented for successful management of big data. The book is a must-read for data scientists, data engineers, and corporate leaders who are implementing big data platforms in their organizations.

Data Engineering with Python

Discover the inner workings of data pipelines with 'Data Engineering with Python', a practical guide to mastering the art of data engineering. Through hands-on examples, you'll explore the process of designing data models, implementing data pipelines, and automating data flows, all within the context of Python. What this Book will help me do: Understand the fundamentals of designing data architectures and capturing data requirements. Extract, clean, and transform data from various sources, refining it for precise applications. Implement end-to-end data pipelines, including staging, validation, and production deployment. Leverage Python to connect with databases, perform data manipulations, and build analytics workflows. Monitor and log data pipelines to ensure smooth, real-time operations and high quality. Author(s): Paul Crickard is a seasoned expert in data engineering and analytics, bringing years of practical experience to this technical guide. His unique ability to make complex technical concepts accessible makes this book invaluable for learners and professionals alike. A lifelong technologist, Paul focuses on actionable skills and building confidence to work with data pipelines and models. Who is it for? This book is ideal for aspiring data engineers, data analysts aiming to elevate their technical skill sets, or IT professionals transitioning into data-driven roles. Whether you're just stepping into the field or enhancing your Python-based data capabilities, this book is tailored to provide solid grounding and practical expertise. Beginners in data engineering will find it accessible and easy to get started, while those refreshing their knowledge will benefit from its focused projects.
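The staging, validation, and production-deployment flow mentioned above can be pictured with a short Python sketch; this is a generic illustration with an invented record schema and helper names, not code from the book.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def validate(record: dict) -> bool:
    """Gate between staging and production: reject records that fail
    basic checks, logging rejects so the pipeline can be monitored."""
    ok = isinstance(record.get("id"), int) and bool(str(record.get("name", "")).strip())
    if not ok:
        log.warning("rejected record: %r", record)
    return ok

def promote(staging: list[dict]) -> list[dict]:
    """Move validated records from the staging area to production."""
    production = [r for r in staging if validate(r)]
    log.info("promoted %d of %d records", len(production), len(staging))
    return production

if __name__ == "__main__":
    staged = [{"id": 1, "name": "ok"}, {"id": None, "name": ""}]
    promote(staged)
```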

Hands-On SQL Server 2019 Analysis Services

"Hands-On SQL Server 2019 Analysis Services" is a comprehensive guide to mastering data analysis using SQL Server Analysis Services (SSAS). This book provides you with step-by-step directions on creating and deploying tabular and multi-dimensional models, as well as using tools like MDX and DAX to query and analyze data. By the end, you'll be confident in designing effective data models for business analytics. What this Book will help me do Understand how to create and optimize both tabular and multi-dimensional models with SQL Server Analysis Services. Learn to use MDX and DAX to query and manipulate your data for enhanced insights. Integrate SSAS models with visualization tools like Excel and Power BI for effective decision-making. Implement robust security measures to safeguard data within your SSAS deployments. Master scaling and optimizing best practices to ensure high-performance analytical models. Author(s) Steven Hughes is a data analytics expert with extensive experience in business intelligence and SQL Server technologies. With years of practical experience in using SSAS and teaching data professionals, Steven has a knack for breaking down complex concepts into actionable knowledge. His approach to writing involves combining clear explanations with real-world examples. Who is it for? This book is intended for BI professionals, data analysts, and database developers who want to gain hands-on expertise with SQL Server 2019 Analysis Services. Ideal readers should have familiarity with database querying and a basic understanding of business intelligence tools like Power BI and Excel. It's perfect for those aiming to refine their skills in modeling and deploying robust analytics solutions.

IBM Db2 Analytics Accelerator V7 High Availability and Disaster Recovery

IBM® Db2® Analytics Accelerator is a workload-optimized appliance add-on to IBM Db2® for IBM z/OS® that enables the integration of analytic insights into operational processes to drive business-critical analytics and exceptional business value. Together, the Db2 Analytics Accelerator and Db2 for z/OS form an integrated hybrid environment that can run transaction processing, complex analytical, and reporting workloads concurrently and efficiently. With IBM Db2 Analytics Accelerator for z/OS V7, the following flexible deployment options are introduced: Accelerator on IBM Integrated Analytics System (IIAS), deployed on pre-configured hardware and software, and Accelerator on IBM Z®, deployed within an IBM Secure Service Container LPAR. Because the accelerator is used in business-critical environments, the need arose to integrate it into High Availability (HA) architectures and Disaster Recovery (DR) processes. This IBM Redpaper™ publication focuses on different integration aspects of both deployment options of the IBM Db2 Analytics Accelerator into HA and DR environments. It also shares best practices to provide the wanted Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). HA systems often are a requirement in business-critical environments and can be implemented by redundant, independent components. A failure of one of these components is detected automatically and their tasks are taken over by another component. Depending on business requirements, a system can be implemented in a way that users do not notice outages (continuous availability), or, in a major disaster, users notice an outage and systems resume services after a defined period, potentially with loss of data from previous work. IBM Z has been strong for decades regarding HA and DR. By design, storage and operating systems are implemented in a way to support enhanced availability requirements. IBM Parallel Sysplex® and IBM Globally Dispersed Parallel Sysplex (IBM GDPS®) offer a unique architecture to support various degrees of automated failover and availability concepts. This IBM Redpaper publication shows how IBM Db2 Analytics Accelerator V7 can easily integrate into or complement existing IBM Z topologies for HA and DR. If you are using IBM Db2 Analytics Accelerator V5.1 or lower, see IBM Db2 Analytics Accelerator: High Availability and Disaster Recovery, REDP-5104.

Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges in managing this deluge of unstructured data, such as the following: pinpointing and activating relevant data for large-scale analytics; lacking the fine-grained visibility that is needed to map data to business priorities; removing redundant, obsolete, and trivial (ROT) data; and identifying and classifying sensitive data. IBM® Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, on premises and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum® Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research. This IBM Redbooks® publication presents several use cases that are focused on artificial intelligence (AI) solutions with IBM Spectrum Discover. This book helps storage administrators and technical specialists plan and implement AI solutions by using IBM Spectrum Discover and several other IBM Storage products.

Security and Privacy Issues in IoT Devices and Sensor Networks

Security and Privacy Issues in IoT Devices and Sensor Networks investigates security breach issues in IoT and sensor networks, exploring various solutions. The book follows a two-fold approach, first focusing on the fundamentals and theory surrounding sensor networks and IoT security. It then explores practical solutions that can be implemented to develop security for these elements, providing case studies to enhance understanding. Machine learning techniques are covered, as well as other security paradigms, such as cloud security and cryptocurrency technologies. The book highlights how these techniques can be applied to identify attacks and vulnerabilities, preserve privacy, and enhance data security. This in-depth reference is ideal for industry professionals dealing with WSN and IoT systems who want to enhance the security of these systems. Additionally, researchers, material developers and technology specialists dealing with the multifarious aspects of data privacy and security enhancement will benefit from the book's comprehensive information. Provides insights into the latest research trends and theory in the field of sensor networks and IoT security Presents machine learning-based solutions for data security enhancement Discusses the challenges to implement various security techniques Informs on how analytics can be used in security and privacy