O'Reilly Data Engineering Books

IBM Storage FlashSystem 9500 Product Guide for IBM Storage Virtualize 8.6

2024-04-29 O'Reilly Amazon

book

Vasfi Gucer , Jon Herd , Hartmut Lonzer

data data-engineering IBM BI Cloud Computing Cyber Security

This IBM® Redpaper® Product Guide describes the IBM Storage FlashSystem® 9500 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Storage Virtualize. Often, applications exist that are foundational to the operations and success of an enterprise. These applications might function as prime revenue generators, guide or control important tasks, or provide crucial business intelligence, among many other jobs. Whatever their purpose, they are mission critical to the organization. They demand the highest levels of performance, functionality, security, and availability. They also must be protected against the newer threat of cyberattacks. To support such mission-critical applications, enterprises of all types and sizes turn to the IBM Storage FlashSystem 9500. IBM Storage FlashSystem 9500 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples: Data reduction and deduplication Dynamic tiering Thin-provisioning Snapshots Cloning Replication and data copy services Cyber resilience Transparent Cloud Tiering IBM HyperSwap® including 3-site replication for HA Scale-out and scale-up configurations that further enhance capacity and throughput for better availability This Redpaper applies to IBM Storage Virtualize V8.6.

Engineering Data Mesh in Azure Cloud

2024-03-29 O'Reilly Amazon

book

Aniruddha Deswandikar

data data-engineering database-architecture data-mesh AI/ML Analytics

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn the strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms. What this Book will help me do Learn to decentralize data governance and integrate data domains effectively. Master strategies for building and implementing data contracts suited to your organization's needs. Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework. Understand how to apply key architecture patterns for analytics, including AI and machine learning. Gain the knowledge to scale analytics frameworks using modern cloud-based platforms. Author(s) None Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, None brings real-world customer experiences into practical guidance. This book reflects None's dedication to helping organizations achieve their data goals with clarity and effectiveness. Who is it for? This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. Especially useful for professionals aiming to implement cloud-based data mesh solutions, it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

The Definitive Guide to Data Integration

2024-03-29 O'Reilly Amazon

book

Raphaël MANSUY , Emeric CHAIZE , Mehdi TAZI , Pierre-Yves BONNEFOY

data data-engineering API Cloud Computing Data Engineering Data Quality

Master the modern data stack with 'The Definitive Guide to Data Integration.' This comprehensive book covers the key aspects of data integration, including data sources, storage, transformation, governance, and more. Equip yourself with the knowledge and hands-on skills to manage complex datasets and unlock your data's full potential. What this Book will help me do Understand how to integrate diverse datasets efficiently using modern tools. Develop expertise in designing and implementing robust data integration workflows. Gain insights into real-time data processing and cloud-based data architectures. Learn best practices for data quality, governance, and compliance in integration. Master the use of APIs, workflows, and transformation patterns in practice. Author(s) The authors, None Bonnefoy, None Chaize, Raphaël Mansuy, and Mehdi Tazi, are seasoned experts in data engineering and integration. They bring years of experience in modern data technologies and consulting. Their approachable writing style ensures that readers at various skill levels can grasp complex concepts effectively. Who is it for? This book is ideal for data engineers, architects, analysts, and IT professionals. Whether you're new to data integration or looking to deepen your expertise, this guide caters to individuals seeking to navigate the challenges of the modern data stack.

Azure Data Factory by Example: Practical Implementation for Data Engineers

2024-03-22 O'Reilly Amazon

book

Richard Swinbank

data data-engineering storage-repositories data-lake Analytics Azure

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. This edition, updated for 2024, includes the latest developments to the Azure Data Factory service: Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics. Improvements to flow control provided by activity deactivation and the Fail activity. The introduction of reusable data flow components such as user-defined functions and flowlets. Extensions to integration runtime capabilities including Managed VNet support. The ability to trigger pipelines in response to custom events. Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying. What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Azure Data Factory Cookbook - Second Edition

2024-02-28 O'Reilly Amazon

book

Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

data data-engineering storage-repositories data-lake Analytics Azure

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects. What this Book will help me do Master building and managing data pipelines using Azure Data Factory's latest versions and features. Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows. Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables. Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently. Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines. Author(s) Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences. Who is it for? This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

IBM TS7700 Release 5.3 Guide

2024-02-27 O'Reilly Amazon

book

Shinsuke Ueyama , Aderson Pacini , Dave Brettell , Yuki Asakura , Nao Takemura , Lourie Goodall , Alberto Barajas Ortiz , Erina Tatsumi , Nielson ’Nino’ de Carvalho , Chen Zhu , Larry Coyne , Taisei Takai , Tomoaki Ogino , Michael Scott , Kousei Kawamura , Derek Erdmann , Trinidad Armando Rangel Ruiz , Shinya Ohri , Nobuhiko Furuya , Joe Hew , Rin Fujiwara , Ramón A. Minjares Campos , Stefan Neff , Tony Makepeace , Takahiro Tsuda

data data-engineering IBM Cloud Computing Cloud Storage S3

This IBM Redbooks® publication covers IBM TS7700 R5.3. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 25 years of experience, the R5.3 release includes many features that enable improved performance, usability, and security. Highlights include the IBM TS7700 Advanced Object Store, an all flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.3. The R5.3 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000 Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.3 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. TS7700 provides tape virtualization for the IBM Z® environment. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. New and existing capabilities of the TS7700 5.3 release includes the following highlights: Support for IBM TS1160 Tape Drives and JE/JM media Eight-way Grid Cloud, which consists of up to three generations of TS7700 Synchronous and asynchronous replication of virtual tape and TCT objects Grid access to all logical volume and object data independent of where it resides An all flash TS7770 option for improved performance Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier Full AES256 encryption for data that is in-flight and at-rest Tight integration with IBM Z and DFSMS policy management DS8000 Object Store with AES256 in-flight encryption and compression Regulatory compliance through Logical WORM and LWORM Retention support Cloud Storage Tier support for archive, logical volume versions, and disaster recovery Optional integration with physical tape 16 Gb IBM FICON® throughput that exceeds 4 GBps per TS7700 cluster Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR) support IBM Z hosts view up to 3,968 3490 devices per TS7700 grid TS7770 Cache On Demand feature that uses capacity-based licensing TS7770 support of SSD within the VED server The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM Power9® technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM and CMTG Cyber Resiliency: Building an Automated, VMware Aware Safeguarded Copy Solution to Provide Data Resilience

2024-02-09 O'Reilly Amazon

book

Stephen Doney , Barry Whyte , Neil Morris

data data-engineering IBM Cloud Computing Data Management VMware

This IBM Blueprint outlines how CMTG and IBM have partnered to provide cyber resilient services to their clients. CMTG is one of Australia's leading private cloud providers based in Perth, Western Australia. The solution is based on IBM Storage FlashSystem, IBM Safeguarded Copy and IBM Storage Copy Data Management. The target audience for this Blueprint is IBM Storage technical specialists and storage admins.

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.6

2024-02-07 O'Reilly Amazon

book

James Whitaker , Bill Scales , Barry Whyte

data data-engineering IBM Cloud Computing Cyber Security

IBM® Storage Virtualize based storage systems are secure storage platforms that implement various security-related features, in terms of system-level access controls and data-level security features. This document outlines the available security features and options of IBM Storage Virtualize based storage systems. It is not intended as a "how to" or best practice document. Instead, it is a checklist of features that can be reviewed by a user security team to aid in the definition of a policy to be followed when implementing IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storage Virtualize for Public Cloud. IBM Storage Virtualize features the following levels of security to protect against threats and to keep the attack surface as small as possible: The first line of defense is to offer strict verification features that stop unauthorized users from using login interfaces and gaining access to the system and its configuration. The second line of defense is to offer least privilege features that restrict the environment and limit any effect if a malicious actor does access the system configuration. The third line of defense is to run in a minimal, locked down, mode to prevent damage spreading to the kernel and rest of the operating system. The fourth line of defense is to protect the data at rest that is stored on the system from theft, loss, or corruption (malicious or accidental). The topics that are discussed in this paper can be broadly split into two categories: System security: This type of security encompasses the first three lines of defense that prevent unauthorized access to the system, protect the logical configuration of the storage system, and restrict what actions users can perform. It also ensures visibility and reporting of system level events that can be used by a Security Information and Event Management (SIEM) solution, such as IBM QRadar®. Data security: This type of security encompasses the fourth line of defense. It protects the data that is stored on the system against theft, loss, or attack. These data security features include Encryption of Data At Rest (EDAR) or IBM Safeguarded Copy (SGC). This document is correct as of IBM Storage Virtualize 8.6.

IBM Storage Fusion Multicloud Object Gateway

2024-01-31 O'Reilly Amazon

book

Eyal Abraham , Shawn Houston

data data-engineering IBM Cloud Computing

This Redpaper provides an overview of IBM Storage Fusion Multicloud Object Gateway (MCG) and can be used as a quick reference guide for the most common use cases. The intended audience is cloud and application administrators, as well as other technical staff members who wish to learn how MCG works, how to set it up, and usage of a Backing Store or Namespace Store, as well as object caching.

IBM SAN Volume Controller Model SV3 Product Guide (for IBM Storage Virtualize V8.6)

2024-01-17 O'Reilly Amazon

book

Vasfi Gucer , Jon Herd , Hartmut Lonzer

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller Cloud Computing

This IBM® Redpaper® Product Guide describes the IBM SAN Volume Controller model SV3 solution, which is a next-generation IBM SAN Volume Controller. Built with IBM Storage Virtualize software and part of the IBM Storage family, IBM SAN Volume Controller is an enterprise-class storage system. It helps organizations achieve better data economics by supporting the large-scale workloads that are critical to success. Data centers often contain a mix of storage systems. This situation can arise as a result of company mergers or as a deliberate acquisition strategy. Regardless of how they arise, mixed configurations add complexity to the data center. Different systems have different data services, which make it difficult to move data from one to another without updating automation. Different user interfaces increase the need for training and can make errors more likely. Different approaches to hybrid cloud complicate modernization strategies. Also, many different systems mean more silos of capacity, which can lead to inefficiency. To simplify the data center and to improve flexibility and efficiency in deploying storage, enterprises of all types and sizes turn to IBM SAN Volume Controller, which is built with IBM Spectrum Virtualize software. This software simplifies infrastructure and eliminates differences in management, function, and even hybrid cloud support. IBM SAN Volume Controller introduces a common approach to storage management, function, replication, and hybrid cloud that is independent of storage type. It is the key to modernizing and revitalizing your storage, but is as easy to understand. IBM SAN Volume Controller provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples: Data reduction and deduplication Dynamic tiering Thin-provisioning Snapshots Cloning Replication and data copy services Data-at-rest encryption Cyber resilience Transparent Cloud Tiering IBM HyperSwap® including three-site replication for high availability (HA) This Redpaper applies to IBM Storage Virtualize V8.6.

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

2023-12-27 O'Reilly Amazon

book

Abhishek Mishra , Anjani Kumar , Sanjeev Kumar

data data-engineering storage-repositories data-warehouse AWS Azure

Design and architect new generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts, and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage data warehousing standard practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. And they demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines. What You Will Learn Understand the core concepts underlying modern data warehouses Design and build cloud-native data warehousesGain a practical approach to architecting and building data warehouses on Azure and AWS Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications Who This Book Is For Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS

Data Exploration and Preparation with BigQuery

2023-11-29 O'Reilly Amazon

book

Mike Kahn

data data-engineering google-bigquery Big Data BigQuery Cloud Computing

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights. What this Book will help me do Master the process of exploring and assessing the quality of datasets. Learn SQL for performing efficient and advanced data transformations in BigQuery. Optimize the performance of BigQuery queries for speed and cost-effectiveness. Discover best practices for setting up and managing BigQuery resources. Apply real-world case studies to analyze data and derive meaningful insights. Author(s) Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed. Who is it for? This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

2023-11-29 O'Reilly Amazon

book

Elad Eldor

data data-engineering streaming-messaging Kafka Cloud Computing DataOps

This book provides Kafka administrators, site reliability engineers, and DataOps and DevOps practitioners with a list of real production issues that can occur in Kafka clusters and how to solve them. The production issues covered are assembled into a comprehensive troubleshooting guide for those engineers who are responsible for the stability and performance of Kafka clusters in production, whether those clusters are deployed in the cloud or on-premises. This book teaches you how to detect and troubleshoot the issues, and eventually how to prevent them. Kafka stability is hard to achieve, especially in high throughput environments, and the purpose of this book is not only to make troubleshooting easier, but also to prevent production issues from occurring in the first place. The guidance in this book is drawn from the author's years of experience in helping clients and internal customers diagnose and resolve knotty production problems and stabilize their Kafka environments. The book is organized into recipe-style troubleshooting checklists that field engineers can easily follow when under pressure to fix an unstable cluster. This is the book you will want by your side when the stakes are high, and your job is on the line. What You Will Learn Monitor and resolve production issues in your Kafka clusters Provision Kafka clusters with the lowest costs and still handle the required loads Perform root cause analyses of issues affecting your Kafka clusters Know the ways in which your Kafka cluster can affect its consumers and producers Prevent or minimize data loss and delays in data streaming Forestall production issues through an understanding of common failure points Create checklists for troubleshooting your Kafka clusters when problems occur Who This Book Is For Site reliability engineers tasked with maintaining stability of Kafka clusters, Kafka administrators who troubleshoot production issues around Kafka, DevOps and DataOps experts who are involved with provisioning Kafka (whether on-premises or in the cloud), developers of Kafka consumers and producers who wish to learn more about Kafka

Cracking the Data Engineering Interview

2023-11-07 O'Reilly Amazon

book

Kedeisha Bryan , Taamir Ransome

data data-engineering CI/CD Cloud Computing Data Engineering Data Modelling

"Cracking the Data Engineering Interview" is your essential guide to mastering the data engineering interview process. This book offers practical insights and techniques to build your resume, refine your skills in Python, SQL, data modeling, and ETL, and confidently tackle over 100 mock interview questions. Gain the knowledge and confidence to land your dream role in data engineering. What this Book will help me do Craft a compelling data engineering portfolio to stand out to employers. Refresh and deepen understanding of essential topics like Python, SQL, and ETL. Master over 100 interview questions that cover both technical and behavioral aspects. Understand data engineering concepts such as data modeling, security, and CI/CD. Develop negotiation, networking, and personal branding skills crucial for job applications. Author(s) None Bryan and None Ransome are seasoned authors with a wealth of experience in data engineering and professional development. Drawing from their extensive industry backgrounds, they provide actionable strategies for aspiring data engineers. Their approachable writing style and real-world insights make complex topics accessible to readers. Who is it for? This book is ideal for aspiring data engineers looking to navigate the job application process effectively. Readers should be familiar with data engineering fundamentals, including Python, SQL, cloud data platforms, and ETL processes. It's tailored for professionals aiming to enhance their portfolios, tackle challenging interviews, and boost their chances of landing a data engineering role.

IBM TS7700 R5 DS8000 Object Store User's Guide

2023-11-07 O'Reilly Amazon

book

Dave Brettell , Lourie Goodall , Rin Fujiwara

data data-engineering IBM Cloud Computing

The IBM® TS7700 features a functional enhancement that allows for the TS7700 to act as an object store for transparent cloud tiering with IBM DS8000®, DFSMShsm (HSM), and native DFSMSdss (DSS). This function can be used to move data sets directly from DS8000 to TS7700. This IBM Redpaper publication provides a functional overview of the features, provides client value information, and walks through DFSMS, DS8000, and TS7700 set up steps.

Designing a Modern Application Data Stack

2023-10-25 O'Reilly Amazon

book

Adam Morton , Brad Culberson , Kevin McGinley

data data-engineering Snowflake Analytics Cloud Computing

Today's massive datasets represent an unprecedented opportunity for organizations to build data-intensive applications. With this report, product leads, architects, and others who deal with applications and application development will explore why a cloud data platform is a great fit for data-intensive applications. You'll learn how to carefully consider scalability, data processing, and application distribution when making data app design decisions. Cloud data platforms are the modern infrastructure choice for data applications, as they offer improved scalability, elasticity, and cost efficiency. With a better understanding of data-intensive application architectures on cloud-based data platforms and the best practices outlined in this report, application teams can take full advantage of advances in data processing and app distribution to accelerate development, deployment, and adoption cycles. With this insightful report, you will: Learn why a modern cloud data platform is essential for building data-intensive applications Explore how scalability, data processing, and distribution models are key for today's data apps Implement best practices to improve application scalability and simplify data processing for efficiency gains Modernize application distribution plans to meet the needs of app providers and consumers About the authors: Adam Morton works with Intelligen Group, a Snowflake pure-play data and analytics consultancy. Kevin McGinley is technical director of the Snowflake customer acceleration team. Brad Culberson is a data platform architect specializing in data applications at Snowflake.

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.5.3

2023-10-10 O'Reilly Amazon

book

James Whitaker , Bill Scales , Barry Whyte

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller Cloud Computing

IBM® Storage Virtualize based storage systems are secure storage platforms that implement various security-related features, in terms of system-level access controls and data-level security features. This document outlines the available security features and options of IBM Storage Virtualize based storage systems. It is not intended as a "how to" or best practice document. Instead, it is a checklist of features that can be reviewed by a user security team to aid in the definition of a policy to be followed when implementing IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storage Virtualize for Public Cloud. IBM Storage Virtualize features the following levels of security to protect against threats and to keep the attack surface as small as possible: The first line of defense is to offer strict verification features that stop unauthorized users from using login interfaces and gaining access to the system and its configuration. The second line of defense is to offer least privilege features that restrict the environment and limit any effect if a malicious actor does access the system configuration. The third line of defense is to run in a minimal, locked down, mode to prevent damage spreading to the kernel and rest of the operating system. The fourth line of defense is to protect the data at rest that is stored on the system from theft, loss, or corruption (malicious or accidental). The topics that are discussed in this paper can be broadly split into two categories: System security: This type of security encompasses the first three lines of defense that prevent unauthorized access to the system, protect the logical configuration of the storage system, and restrict what actions users can perform. It also ensures visibility and reporting of system level events that can be used by a Security Information and Event Management (SIEM) solution, such as IBM QRadar®. Data security: This type of security encompasses the fourth line of defense. It protects the data that is stored on the system against theft, loss, or attack. These data security features include Encryption of Data At Rest (EDAR) or IBM Safeguarded Copy (SGC). This document is correct as of IBM Storage Virtualize 8.5.3.

Amazon Redshift: The Definitive Guide

2023-10-03 O'Reilly Amazon

book

Rajesh Francis , Rajiv Gupta , Milind Oke

data data-engineering relational-databases amazon-redshift AI/ML Analytics

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you: Build a cloud data strategy around Amazon Redshift as foundational data warehouse Get started with Amazon Redshift with simple-to-use data models and design best practices Understand how and when to use Redshift Serverless and Redshift provisioned clusters Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing Learn best practices for security, monitoring, resilience, and disaster recovery Leverage Amazon Redshift integration with other AWS services to unlock additional value

Learning and Operating Presto

2023-09-21 O'Reilly Amazon

book

Tim Meehan , Ying Su , Vivek Bharathan , Angelica Lo Duca

data data-engineering Hadoop Presto BI Cloud Computing

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to use Presto operations at your organization to derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production. With this book, you will: Learn how to install and configure Presto Use Presto with business intelligence tools Understand how to connect Presto to a variety of data sources Extend Presto for real-time business insight Learn how to apply best practices and tuning Get troubleshooting tips for logs, error messages, and more Explore Presto's architectural concepts and usage patterns Understand Presto security and administration

IBM Storage as a Service Offering Guide

2023-08-31 O'Reilly Amazon

book

Vasfi Gucer , Hartmut Lonzer

data data-engineering IBM Cloud Computing Cloud Storage

IBM® Storage as a Service (STaaS) extends your hybrid cloud experience with a new flexible consumption model enabled for both your on-premises and hybrid cloud infrastructure needs, giving you the agility, cash flow efficiency, and services of cloud storage with the flexibility to dynamically scale up or down and only pay for what you use beyond the minimal capacity. This IBM Redpaper provides a detailed introduction to the IBM STaaS service. The paper is targeted for data center managers and storage administrators.

IBM Power E1050: Technical Overview and Introduction

2023-08-30 O'Reilly Amazon

book

Guido Somers , Scott Vetter , Tsvetomir Spasov , Marc Gregorutti , Michael Malicdem , Stephen Lutz , Giuliano Anselmi

data data-engineering IBM Cloud Computing Linux Marketing

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power E1050 server (9043-MRX) that uses the latest IBM Power10 processor-based technology and supports IBM AIX® and Linux operating systems (OSs). The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in this system, such as: The latest IBM Power10 processor design, including the dual-chip module (DCM) packaging, which is available in various configurations from 12 - 24 cores per socket. Support of up to 16 TB of memory. Native Peripheral Component Interconnect Express (PCIe) 5th generation (Gen5) connectivity from the processor socket to deliver higher performance and bandwidth for connected adapters. Open Memory Interface (OMI) connected Differential Dual Inline Memory Module (DDIMM) memory cards delivering increased performance, resiliency, and security over industry-standard memory technologies, including transparent memory encryption. Enhanced internal storage performance with the use of native PCIe-connected Non-volatile Memory Express (NVMe) devices in up to 10 internal storage slots to deliver up to 64 TB of high-performance, low-latency storage in a single 4-socket system. Consumption-based pricing in the Power Private Cloud with Shared Utility Capacity commercial model to allow customers to consume resources more flexibly and efficiently, including AIX, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server, and Red Hat OpenShift Container Platform workloads. This publication is for professionals who want to acquire a better understanding of IBM Power products. The intended audience includes: IBM Power customers Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the set of IBM Power documentation by providing a desktop reference that offers a detailed technical description of the Power E1050 Midrange server model. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions..

IBM Power E1080 Technical Overview and Introduction

2023-08-30 O'Reilly Amazon

book

Ivaylo Bozhinov , Scott Vetter , Dinil Das , Giuliano Anselmi , Manish Arora , Madison Lee , Bartlomiej Grabowski , Armin Röll , Turgut Genc

data data-engineering IBM Cloud Computing Linux Marketing

This IBM® Redpaper® publication provides a broad understanding of a new architecture of the IBM Power® E1080 (also known as the Power E1080) server that supports IBM AIX®, IBM i, and selected distributions of Linux operating systems. The objective of this paper is to introduce the Power E1080, the most powerful and scalable server of the IBM Power portfolio, and its offerings and relevant functions: Designed to support up to four system nodes and up to 240 IBM Power10™ processor cores The Power E1080 can be initially ordered with a single system node or two system nodes configuration, which provides up to 60 Power10 processor cores with a single node configuration or up to 120 Power10 processor cores with a two system nodes configuration. More support for a three or four system nodes configuration is to be added on December 10, 2021, which provides support for up to 240 Power10 processor cores with a full combined four system nodes server. Designed to supports up to 64 TB memory The Power E1080 can be initially ordered with the total memory RAM capacity up to 8 TB. More support is to be added on December 10, 2021 to support up to 64 TB in a full combined four system nodes server. Designed to support up to 32 Peripheral Component Interconnect® (PCIe) Gen 5 slots in a full combined four system nodes server and up to 192 PCIe Gen 3 slots with expansion I/O drawers The Power E1080 supports initially a maximum of two system nodes; therefore, up to 16 PCIe Gen 5 slots, and up to 96 PCIe Gen 3 slots with expansion I/O drawer. More support is to be added on December 10, 2021, to support up to 192 PCIe Gen 3 slots with expansion I/O drawers. Up to over 4,000 directly attached serial-attached SCSI (SAS) disks or solid-state drives (SSDs) Up to 1,000 virtual machines (VMs) with logical partitions (LPARs) per system System control unit, providing redundant system master Flexible Service Processor (FSP) Supports IBM Power System Private Cloud Solution with Dynamic Capacity This publication is for professionals who want to acquire a better understanding of Power servers. The intended audience includes the following roles: Customers Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Serverless Machine Learning with Amazon Redshift ML

2023-08-30 O'Reilly Amazon

book

Debu Panda , Bhanu Pittampally , Sumeet Joshi , Phil Bates

data data-engineering relational-databases amazon-redshift AI/ML Analytics

Serverless Machine Learning with Amazon Redshift ML provides a hands-on guide to using Amazon Redshift Serverless and Redshift ML for building and deploying machine learning models. Through SQL-focused examples and practical walkthroughs, you will learn efficient techniques for cloud data analytics and serverless machine learning. What this Book will help me do Grasp the workflow of building machine learning models with Redshift ML using SQL. Learn to handle supervised learning tasks like classification and regression. Apply unsupervised learning techniques, such as K-means clustering, in Redshift ML. Develop time-series forecasting models within Amazon Redshift. Understand how to operationalize machine learning in serverless cloud architecture. Author(s) Debu Panda, Phil Bates, Bhanu Pittampally, and Sumeet Joshi are seasoned professionals in cloud computing and machine learning technologies. They combine deep technical knowledge with teaching expertise to guide learners through mastering Amazon Redshift ML. Their collaborative approach ensures that the content is accessible, engaging, and practically applicable. Who is it for? This book is perfect for data scientists, machine learning engineers, and database administrators using or intending to use Amazon Redshift. It's tailored for professionals with basic knowledge of machine learning and SQL who aim to enhance their efficiency and specialize in serverless machine learning within cloud architectures.

High-Performance Data Architectures

2023-08-25 O'Reilly Amazon

book

Joe McKendrick , Ed Huang

data data-engineering Agile/Scrum Analytics Cloud Computing Data Management

By choosing the right database, you can maximize your business potential, improve performance, increase efficiency, and gain a competitive edge. This insightful report examines the benefits of using a simplified data architecture containing cloud-based HTAP (hybrid transactional and analytical processing) database capabilities. You'll learn how this data architecture can help data engineers and data decision makers focus on what matters most: growing your business. Authors Joe McKendrick and Ed Huang explain how cloud native infrastructure supports enterprise businesses and operations with a much more agile foundation. Just one layer up from the infrastructure, cloud-based databases are a crucial part of data management and analytics. Learn how distributed SQL databases containing HTAP capabilities provide more efficient and streamlined data processing to improve cost efficiency and expedite business operations and decision making. This report helps you: Explore industry trends in database development Learn the benefits of a simplified data architecture Comb through the complex and crowded database choices on the market Examine the process of selecting the right database for your business Learn the latest innovations database for improving your company's efficiency and performance

Introduction to Integration Suite Capabilities: Learn SAP API Management, Open Connectors, Integration Advisor and Trading Partner Management

2023-08-16 O'Reilly Amazon

book

Jaspreet Bagga

data data-engineering SAP API Cloud Computing Cyber Security

Discover the power of SAP Integration Suite's capabilities with this hands-on guide. Learn how this integration platform (iPaaS) can help you connect and automate your business processes with integrations, connectors, APIs, and best practices for a faster ROI. Over the course of this book, you will explore the powerful capabilities of SAP Integration Suite, including API Management, Open Connectors, Integration Advisor, Trading Partner Management, Migration Assessment, and Integration Assessment. With detailed explanations and real-world examples, this book is the perfect resource for anyone looking to unlock the full potential of SAP Integration Suite. With each chapter, you'll gain a greater understanding of why SAP Integration Suite can be the proverbial swiss army knife in your toolkit to design and develop enterprise integration scenarios, offering simplified integration, security, and governance for your applications. Author Jaspreet Bagga demonstrates howto create, publish, and monitor APIs with SAP API Management, and how to use its features to enhance your API lifecycle. He also provides a detailed walkthrough of how other capabilities of SAP Integration Suite can streamline your connectivity, design, development, and architecture methodology with a tool-based approach completely managed by SAP. Whether you are a developer, an architect, or a business user, this book will help you unlock the potential of SAP's Integration Suite platform, API Management, and accelerate your digital transformation. What You Will Learn Understand what APIs are, what they are used for, and why they are crucial for building effective and reliable applications Gain an understanding of SAP Integration Suite's features and benefits Study SAP Integration assessment process, patterns, and much more Explore tools and capabilities other than the Cloud Integration that address the full value chain of the enterprise integration components Who This Book Is For Web developers and application leads who want to learn SAP API Management.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

IBM Storage FlashSystem 9500 Product Guide for IBM Storage Virtualize 8.6

Engineering Data Mesh in Azure Cloud

The Definitive Guide to Data Integration

Azure Data Factory by Example: Practical Implementation for Data Engineers

Azure Data Factory Cookbook - Second Edition

IBM TS7700 Release 5.3 Guide

IBM and CMTG Cyber Resiliency: Building an Automated, VMware Aware Safeguarded Copy Solution to Provide Data Resilience

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.6

IBM Storage Fusion Multicloud Object Gateway

IBM SAN Volume Controller Model SV3 Product Guide (for IBM Storage Virtualize V8.6)

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

Data Exploration and Preparation with BigQuery

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

Cracking the Data Engineering Interview

IBM TS7700 R5 DS8000 Object Store User's Guide

Designing a Modern Application Data Stack

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.5.3

Amazon Redshift: The Definitive Guide

Learning and Operating Presto

IBM Storage as a Service Offering Guide

IBM Power E1050: Technical Overview and Introduction

IBM Power E1080 Technical Overview and Introduction

Serverless Machine Learning with Amazon Redshift ML

High-Performance Data Architectures

Introduction to Integration Suite Capabilities: Learn SAP API Management, Open Connectors, Integration Advisor and Trading Partner Management