O'Reilly Data Engineering Books

Offloading storage volumes from Safeguarded Copy to AWS S3 Object Storage with IBM FlashSystem Transparent Cloud Tiering

2022-11-22 O'Reilly Amazon

book

Shashank Shingornikaris , Manoj Kateja , Christopher Vollmar

data data-engineering storage-repositories cloud-storage AWS Cloud Computing

The focus of this IBM® Blueprint is to showcase a method for storing volumes that are created with Safeguarded Copy off-premises in Amazon S3 object storage by using the IBM FlashSystem Transparent Cloud Tiering (TCT) feature. TCT enables volume data to be copied and transferred to object storage. The TCT feature supports creating connections to cloud service providers to store copies of volume data in private or public clouds. This feature is useful for organizations of all sizes when planning for disaster recovery operations or storing a copy of data as an extra backup. TCT provides seamless integration between the storage system and public or private clouds for Safeguarded Copy volumes and non-Safeguarded Copy volumes.

IBM Elastic Storage System Introduction Guide

2022-11-21 O'Reilly Amazon

book

Stieg Klein , Chris Maestas

data data-engineering IBM Big Data Cloud Computing ELK

This IBM® Redpaper Redbooks publication provides an overview of the IBM Elastic Storage® Server (IBM ESS) and IBM Elastic Storage System (also IBM ESS). These scalable, high-performance data and file management solutions are built on IBM Spectrum® Scale technology. Providing reliability, performance, and scalability, IBM ESS can be implemented for a range of diverse requirements. The latest IBM ESS 3500 is the most innovative system that provides investment protection to expand or build a new Global Data Platform and use current storage. The system allows enhanced, non-disruptive upgrades to grow from flash to hybrid or from hard disk drives (HDDs) to hybrid. IBM ESS can scale up or out with two different storage mediums in the environment, and it is ready for technologies like 200 Gb Ethernet or InfiniBand NDR-200 connectivity. This publication helps you to understand the solution and its architecture. It describes ordering the best solution for your environment, planning the installation and integration of the solution into your environment, and correctly maintaining your solution. The solution is created from the following combination of physical and logical components: hardware, operating system, storage, network, and applications. Knowledge of the IBM Elastic Storage Server and IBM Elastic Storage System components is key for planning an environment. This paper is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT specialists) who are responsible for delivering cost-effective cloud services and big data solutions. The content of this paper can help you to uncover insights in clients' data so that you can take appropriate actions to optimize business results, product development, and scientific discoveries.

SQL Server 2022 Revealed: A Hybrid Data Platform Powered by Security, Performance, and Availability

2022-11-02 O'Reilly Amazon

book

Bob Ward

data data-engineering relational-databases microsoft-sql-server Analytics Azure

Know how to use the new capabilities and cloud integrations in SQL Server 2022. This book covers the many innovative integrations with the Azure Cloud that make SQL Server 2022 the most cloud-connected edition ever. The book covers cutting-edge features such as the blockchain-based Ledger for creating a tamper-evident record of changes to data over time that you can rely on to be correct and reliable. You'll learn about built-in Query Intelligence capabilities to help you to upgrade with confidence that your applications will perform at least as fast after the upgrade as before. In fact, you'll probably see an increase in performance from the upgrade, with no code changes needed. Also covered are innovations such as contained availability groups and data virtualization with S3 object storage. New cloud integrations covered in this book include Microsoft Azure Purview and the use of Azure SQL for high availability and disaster recovery. The book covers Azure Synapse Link with its built-in capabilities to take changes and put them into Synapse automatically. Anyone building their career around SQL Server will want this book for the valuable information it provides on building SQL skills from edge to the cloud.
What You Will Learn: Know how to use all of the new capabilities and cloud integrations in SQL Server 2022; connect to Azure for disaster recovery, near real-time analytics, and security; leverage the Ledger to create a tamper-evident record of data changes over time; upgrade from prior releases and achieve faster and more consistent performance with no code changes; access data and storage in different and new formats, such as Parquet and S3, without moving the data and using your existing T-SQL skills; explore new application scenarios using innovations with T-SQL in areas such as JSON and time series.
Who This Book Is For: SQL Server professionals who want to upgrade their skills to the latest edition of SQL Server; those wishing to take advantage of new integrations with Microsoft Azure Purview (governance), Azure Synapse (analytics), and Azure SQL (HA and DR); and those in need of the increased performance and security offered by Query Intelligence and the new Ledger.

Architecting Solutions with SAP Business Technology Platform

2022-10-28 O'Reilly Amazon

book

Serdar Simsekler , Eric Du

data data-engineering SAP Cloud Computing Cyber Security

Gain a comprehensive understanding of SAP Business Technology Platform (SAP BTP) and its role in the intelligent enterprise. This book provides you with the knowledge and skills to design and implement effective architectural solutions. You'll explore integration strategies, extensibility options, and data processing methods to innovate and enhance your organization's SAP ecosystem. What this Book will help me do: Architect enterprise solutions with SAP BTP to address key integration challenges. Leverage SAP BTP tools for process automation and effective solution extensibility. Understand non-functional requirements such as operability and security. Drive innovation by integrating SAP's intelligent technologies into your designs. Utilize SAP BTP to derive actionable insights from business data for value generation. Author(s): Serdar Simsekler and Eric Du are experienced professionals in the field of SAP architecture and technology. They bring years of expertise in building enterprise solutions leveraging the latest SAP innovations. Their approachable writing style aims to connect technical concepts with practical enterprise applications, ensuring readers can directly apply the knowledge gained. Who is it for? This book is intended for technical architects, solution architects, and enterprise architects who are working with or intending to adopt SAP Business Technology Platform. It is ideal for those seeking to enhance their understanding of SAP's solution ecosystem and deliver innovative systems. A foundational knowledge of IT systems and basic cloud concepts is assumed, as is familiarity with the SAP framework.

Azure Data Engineering Cookbook - Second Edition

2022-09-26 O'Reilly Amazon

book

Nagaraj Venkatesan , Ahmad Osama , Luca Zanna

data data-engineering Analytics Azure ADF BI

Azure Data Engineering Cookbook is your ultimate guide to mastering data engineering on Microsoft's Azure platform. Through an engaging collection of recipes, this book breaks down procedures to build sophisticated data pipelines, leveraging tools like Azure Data Factory, Data Lake, Databricks, and Synapse Analytics. What this Book will help me do: Efficiently process large datasets using Azure Synapse Analytics and Azure Databricks pipelines. Transform and shape data within systems by leveraging Azure Synapse data flows. Implement and manage relational databases in Azure with performance tuning and administration. Configure data pipeline solutions integrated with Power BI for insightful reporting. Monitor, optimize, and ensure lineage tracking for your data systems efficiently with Purview and Log Analytics. Author(s): Nagaraj Venkatesan is an experienced cloud architect specializing in Microsoft Azure, with years of hands-on data engineering expertise. Ahmad Osama is a seasoned data professional. The authors' shared emphasis is on practical learning and on bridging it with actionable skills. Who is it for? This book is essential for data engineers seeking expertise in Azure's rich engineering capabilities. It's tailored for professionals with a foundational knowledge of cloud services, looking to achieve advanced proficiency in Azure data engineering pipelines.

Practical Database Auditing for Microsoft SQL Server and Azure SQL: Troubleshooting, Regulatory Compliance, and Governance

2022-09-19 O'Reilly Amazon

book

Josephine Bush

data data-engineering relational-databases microsoft-sql-server AWS Amazon RDS

Know how to track changes and key events in your SQL Server databases in support of application troubleshooting, regulatory compliance, and governance. This book shows how to use key features in SQL Server, such as SQL Server Audit and Extended Events, to track schema changes, permission changes, and changes to your data. You’ll even learn how to track queries run against specific tables in a database. Not all changes and events can be captured and tracked using SQL Server Audit and Extended Events, and the book goes beyond those features to also show what can be captured using common criteria compliance, change data capture, temporal tables, or querying the SQL Server log. You will learn how to audit just what you need to audit, and how to audit pretty much anything that happens on a SQL Server instance. This book will also help you set up cloud auditing with an emphasis on Azure SQL Database, Azure SQL Managed Instance, and AWS RDS SQL Server. You don’t need expensive, third-party auditing tools to make auditing work for you, and to demonstrate and provide value back to your business. This book will help you set up an auditing solution that works for you and your needs. It shows how to collect the audit data that you need, centralize that data for easy reporting, and generate audit reports using built-in SQL Server functionality for use by your own team, developers, and organization’s auditors.
What You Will Learn: Understand why auditing is important for troubleshooting, compliance, and governance; track changes and key events using SQL Server Audit and Extended Events; track SQL Server configuration changes for governance and troubleshooting; utilize change data capture and temporal tables to track data changes in SQL Server tables; centralize auditing data from all your databases for easy querying and reporting; configure auditing on Azure SQL, Azure SQL Managed Instance, and AWS RDS SQL Server.
Who This Book Is For: Database administrators who need to know what’s changing on their database servers, and those who are making the changes; database-savvy DevOps engineers and developers who are charged with troubleshooting processes and applications; developers and administrators who are responsible for generating reports in support of regulatory compliance reporting and auditing.

SAP HANA Cloud in a Nutshell: Design, Develop, and Deploy Data Models using SAP HANA Cloud

2022-09-15 O'Reilly Amazon

book

Miguel Figueiredo

data data-engineering relational-databases sap-hana Cloud Computing Data Modelling

This book introduces SAP HANA Cloud and helps you develop an understanding of its key features, including technology, architecture, and data modeling. SAP HANA Cloud in a Nutshell will help you develop the skills needed to use the core features of the completely managed and in-memory cloud-based data foundation available in the SAP Business Technology Platform. The book covers modern modeling concepts and equips you with practical knowledge to unleash the best use of SAP HANA Cloud. As you progress, you will learn how to provision your own SAP HANA Cloud instance, understand how to work with different roles, and work with data modeling for analytical and transactional use cases. Additionally, you will learn how to pilot SAP BTP Cockpit and work with entitlements, quotas, account structure, spaces, instances, and cloud providers. You will learn how to perform administration tasks such as stopping and starting an SAP HANA Cloud instance and making it available for use. To fully leverage the knowledge this book offers, you will find practical step-by-step instructions for how to establish a cloud account model and create your first SAP HANA Cloud artifacts. The book is an important prerequisite for those who want to take full advantage of SAP HANA Cloud.
What You Will Learn: Master the concepts and terminology of SAP Business Technology Platform (BTP) and SAP HANA Cloud; understand the key roles of an SAP HANA Cloud implementation; become familiar with the key tools used by administrators, architects, and application developers; upgrade an SAP HANA Cloud database; understand how to work with SAP HANA Cloud modeling supporting analytical and transactional use cases.
Who This Book Is For: SAP consultants, cloud engineers, and architects; application consultants and developers; and project stakeholders.

Mastering MongoDB 6.x - Third Edition

2022-08-30 O'Reilly Amazon

book

Alex Giamas

data data-engineering nosql-databases MongoDB Cloud Computing Data Modelling

Mastering MongoDB 6.x is your complete guide to understanding MongoDB at depth and fully leveraging its capabilities. Learn to design, develop, and administer MongoDB databases that are high-performing, scalable, and secure. From schema modeling to using MongoDB Atlas tools, this book ensures you are well-equipped to build robust applications backed by MongoDB. What this Book will help me do: Understand and apply advanced data modeling techniques for MongoDB to optimize data access. Utilize advanced querying capabilities, including aggregation, indexing, and transactions. Implement scalable and distributed systems using MongoDB features like replication and sharding. Administer MongoDB databases securely and efficiently using monitoring and backup tools. Master cloud-based solutions with MongoDB Atlas tools such as Serverless, Atlas Search, and Compass. Author(s): Alex Giamas, the author of Mastering MongoDB 6.x, is a seasoned expert in database systems and software engineering. With a deep knowledge of MongoDB gained through years of practical experience, Alex has contributed to numerous projects that utilize MongoDB to power large-scale applications. Passionate about sharing knowledge, Alex creates thorough, accessible guides to empower developers and administrators alike. Who is it for? This book is perfect for MongoDB developers and database administrators seeking to deepen their skills. If you're involved in designing, deploying, or managing greenfield or existing projects using MongoDB, this book is invaluable. Basic familiarity with MongoDB, shell commands, and database design concepts is recommended to fully benefit from the insights provided.

Serverless ETL and Analytics with AWS Glue

2022-08-30 O'Reilly Amazon

book

Albert Quiroga , Subramanya Vajiraya , Vishal Pathak , Noritaka Sekiyama , Ishan Gaur , Tomohiro Tanaka

data data-engineering etl AI/ML Analytics AWS

Discover how to harness AWS Glue for your ETL and data analysis workflows with "Serverless ETL and Analytics with AWS Glue." This comprehensive guide introduces readers to the capabilities of AWS Glue, from building data lakes to performing advanced ETL tasks, allowing you to create efficient, secure, and scalable data pipelines with serverless technology. What this Book will help me do: Understand and utilize various AWS Glue features for data lake and ETL pipeline creation. Leverage AWS Glue Studio and DataBrew for intuitive data preparation workflows. Implement effective storage optimization techniques for enhanced data analytics. Apply robust data security measures, including encryption and access control, to protect data. Integrate AWS Glue with machine learning tools like SageMaker to build intelligent models. Author(s): The authors of this book include experts across the fields of data engineering and AWS technologies. With backgrounds in data analytics, software development, and cloud architecture, they bring a depth of practical experience. Their approach combines hands-on tutorials with conceptual clarity, ensuring a blend of foundational knowledge and actionable insights. Who is it for? This book is designed for ETL developers, data engineers, and data analysts who are familiar with data management concepts and want to extend their skills into serverless cloud solutions. If you're looking to master AWS Glue for building scalable and efficient ETL pipelines or are transitioning existing systems to the cloud, this book is ideal for you.

Building the Snowflake Data Cloud: Monetizing and Democratizing Your Data

2022-08-26 O'Reilly Amazon

book

Andrew Carruthers

data data-engineering Snowflake Cloud Computing DWH Cyber Security

Implement the Snowflake Data Cloud using best practices and reap the benefits of scalability and low cost from the industry-leading, cloud-based, data warehousing platform. This book provides a detailed how-to explanation, and assumes familiarity with Snowflake core concepts and principles. It is a project-oriented book with a hands-on approach to designing, developing, and implementing your Data Cloud with security at the center. As you work through the examples, you will develop the skill, knowledge, and expertise to expand your capability by incorporating additional Snowflake features, tools, and techniques. Your Snowflake Data Cloud will be fit for purpose, extensible, and at the forefront of Direct Share, Data Exchange, and Snowflake Marketplace. Building the Snowflake Data Cloud helps your organization monetize the value locked up within your data. As the digital economy takes hold, with data volume, velocity, and variety growing at exponential rates, you need tools and techniques to quickly categorize, collate, summarize, and aggregate data. You also need the means to seamlessly distribute data to release value. This book shows how Snowflake provides all these things and how to use them to your advantage. The book helps you succeed by delivering faster than you can deliver with legacy products and techniques. You will learn how to leverage what you already know, and what you don’t, all applied in a Snowflake Data Cloud context. After reading this book, you will discover and embrace the future where the Data Cloud is central. You will be able to position your organization to take advantage by identifying, adopting, and preparing your tooling for the coming wave of opportunity around sharing and monetizing valuable, corporate data.
What You Will Learn: Understand why Data Cloud is important to the success of your organization; up-skill and adopt Snowflake, leveraging the benefits of cloud platforms; articulate the Snowflake Marketplace and identify opportunities to monetize data; identify tools and techniques to accelerate integration with Data Cloud; manage data consumption by monitoring and controlling access to datasets; develop data load and transform capabilities for use in future projects.
Who This Book Is For: Solution architects seeking implementation patterns to integrate with a Data Cloud; data warehouse developers looking for tips, tools, and techniques to rapidly deliver data pipelines; sales managers who want to monetize their datasets and understand the opportunities that Data Cloud presents; and anyone who wishes to unlock value contained within their data silos.

Snowflake: The Definitive Guide

2022-08-11 O'Reilly Amazon

book

Joyce Kay Avila

data data-engineering Snowflake Analytics Cloud Computing Data Analytics

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily. You'll be able to: efficiently capture, store, and process large amounts of data at an amazing speed; ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes; use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs; securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace.

Building a Red Hat OpenShift Environment on IBM Z

2022-08-10 O'Reilly Amazon

book

Alexandre de Oliveira , Manoy Srinivasan , Rakesh Krishnakumar , Anna Shugol , Wilhelm Mild , Elton de Souza , Lydia Parziale

data data-engineering IBM Agile/Scrum Cloud Computing Cyber Security

Cybersecurity is the most important arm of defense against cyberattacks. With the recent increase in cyberattacks, corporations must focus on how they are combating these new high-tech threats. When establishing best practices, a corporation must focus on employees' access to specific workspaces and information. IBM Z® focuses on allowing high processing virtual environments while maintaining a high level of security in each workspace. Organizations not only need to adjust their approach to security, but also their approach to IT environments. To meet new customer needs and expectations, organizations must take a more agile approach to their business. IBM® Z allows companies to work with hybrid and multi-cloud environments that allow more ease of use for the user and efficiency overall. Working with IBM Z, organizations can also work with many databases that are included in IBM Cloud Pak® for Data. IBM Cloud Pak for Data allows organizations to make more informed decisions with improved data usage. Along with the improved data usage, organizations can see the effects from working in a Red Hat OpenShift environment. Red Hat OpenShift is compatible across many hardware services and allows the user to run applications in the most efficient manner. The purpose of this IBM Redbooks® publication is to: introduce IBM Z and LinuxONE platforms and how they work with the Red Hat OpenShift environment and IBM Cloud Pak for Data; and provide examples and the uses of IBM Z with Cloud Paks for Data that show data gravity, consistent development experience, and consolidation and business resiliency. The target audience for this book is IBM Z Technical Specialists, IT Architects, and System Administrators.

Pro Database Migration to Azure: Data Modernization for the Enterprise

2022-08-08 O'Reilly Amazon

book

Dustin Dorsey , Matt Gordon , Denis McDowell , Kevin Kline

data data-engineering data-migration Azure Cloud Computing Microsoft

Migrate your existing, on-premises applications into the Microsoft Azure cloud platform. This book covers the best practices to plan, implement, and operationalize the migration of a database application from your organization’s data center to Microsoft’s Azure cloud platform. Data modernization and migration is a technologically complex endeavor that can also be taxing from a leadership and operational standpoint. This book covers not only the technology, but also the most important aspects of organization culture, communication, and politics that so frequently derail such projects. You will learn the most important steps to ensuring a successful migration and see battle-tested wisdom from industry veterans. From executive sponsorship, to executing the migration, to the important steps following migration, you will learn how to effectively conduct future migrations and ensure that your team and your database application deliver on the expected business value of the project. This book is unlike any other currently in the market. It takes you through the most critical business and technical considerations and workflows for moving your data and databases into the cloud, with special attention paid to those who are deploying to the Microsoft Data Platform in Azure, especially SQL Server. Although this book focuses on migrating on-premises SQL Server enterprises to hybrid or fully cloud-based Azure SQL Database and Azure SQL Managed Instances, it also covers topics involving migrating non-SQL Server database platforms such as Oracle, MySQL, and PostgreSQL applications to Microsoft Azure.
What You Will Learn: Plan a database migration that ensures smooth project progress, optimal performance, low operating cost, and minimal downtime; properly analyze and manage non-technical considerations, such as legal compliance, privacy, and team execution; perform a thorough architectural analysis to select the best Azure services, performance tiers, and cost-containment features; avoid pitfalls and common reasons for failure relating to corporate culture, intra-office politics, and poor communications; secure the proper executive champions who can execute the business planning needed for success; apply proven criteria to determine your future-state architecture and your migration method; execute your migration using a process proven by the authors over years of successful projects.
Who This Book Is For: IT leadership, strategic IT decision makers, project owners and managers, and enterprise and application architects; anyone looking toward cloud migration projects as the next stage of growth in their careers; and enterprise DBAs and consultants who might be involved in such projects. Readers should have experience and be competent in designing, coding, implementing, and supporting database applications in an on-premises environment.

IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power Enterprise Pools 2.0

2022-08-04 O'Reilly Amazon

book

Lokesh Bhatt , Scott Vetter , Sabine Jordan , Wasif Mohammad , Turgut Genc

data data-engineering IBM ibm-power-systems Cloud Computing Linux

This IBM® Redbooks® publication is a guide to IBM Power Systems Private Cloud with Shared Utility Capacity featuring Power Enterprise Pools (PEP) 2.0. This technology enables multiple servers in an enterprise pool to share base processor and memory resources and draw on pre-paid credits when the base is exceeded. Previously, the Shared Utility Capacity feature supported IBM Power E950 (9040-MR9) and IBM Power E980 (9080-M9S). The feature was extended in August 2020 to include the scale-out IBM Power servers that were announced on 14 July 2020, and it received dedicated processor support later in the year. The IBM Power S922 (9009-22G) and IBM Power S924 (9009-42G) servers, which use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems (OSs), are now supported. The previous scale-out models of Power S922 (9009-22A) and Power S924 (9009-42A) servers cannot be added to an enterprise pool. With the availability of the IBM Power E1080 (9080-HEX) in September 2021, support for this system as part of a Shared Utility Pool has become available. The goal of this book is to provide an overview of the solution's environment and guidance for planning a deployment of it. The book also covers how to configure IBM Power Systems Private Cloud with Shared Utility Capacity. There are also chapters about migrating from PEP 1.0 to PEP 2.0 and various use cases. This publication is for professionals who want to acquire a better understanding of IBM Power Systems Private Cloud and Shared Utility Capacity. The intended audience includes clients, sales and marketing professionals, technical support professionals, and IBM Business Partners. This book expands the set of IBM Power documentation by providing a desktop reference that offers a detailed technical description of IBM Power Systems Private Cloud with Shared Utility Capacity.

IBM FlashSystem 5200 Product Guide

2022-07-22 O'Reilly Amazon

book

Vasfi Gucer , Jon Herd , Corne Lottering , Aldo Araujo Fonseca , Sandro De Santis , Leandro Torolho

data data-engineering IBM Cloud Computing Marketing SAS

This IBM® Redbooks® Product Guide publication describes the IBM FlashSystem® 5200 solution, which is a next-generation IBM FlashSystem control enclosure. It is an NVMe end-to-end platform that is targeted at the entry and midrange market and delivers the full capabilities of IBM FlashCore® technology. It also provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum® Virtualize, including the following features: data reduction and deduplication; dynamic tiering; thin provisioning; snapshots; cloning; replication; data copy services; Transparent Cloud Tiering; and IBM HyperSwap®, including 3-site replication for high availability (HA). Scale-out and scale-up configurations further enhance capacity and throughput for better availability. The IBM FlashSystem 5200 is a high-performance storage solution that is based on a revolutionary 1U form factor. It consists of 12 NVMe Flash Devices in a 1U storage enclosure drawer with full redundant canister components and no single point of failure. It is designed for businesses of all sizes, including small, remote, branch offices and regional clients. It is a smarter, self-optimizing solution that requires less management, which enables organizations to overcome their storage challenges. Flash has come of age and price point reductions mean that lower parts of the storage market are seeing the value of moving over to flash and NVMe-based solutions. The IBM FlashSystem 5200 advances this transition by providing incredibly dense tiers of flash in a more affordable package. With the benefit of IBM FlashCore Module compression and new QLC flash-based technology becoming available, a compelling argument exists to move away from Nearline SAS storage and on to NVMe.
With the release of IBM FlashSystem 5200 Software V8.4, extra functions and features are available, including support for new Distributed RAID1 (DRAID1) features, GUI enhancements, Redirect-on-write for Data Reduction Pool (DRP) snapshots, and 3-site replication capabilities. This book is aimed at pre-sales and post-sales technical support and marketing and storage administrators.

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

2022-07-13 O'Reilly Amazon

book

Ron L'Esteve

data data-engineering storage-repositories data-lake AI/ML Analytics

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities with Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. You will also follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform.
What You Will Learn
Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform
Benefit from the new Delta Lake open-source storage layer for data lakehouses
Take advantage of schema evolution, change feeds, live tables, and more
Write functional PySpark code for data lakehouse ELT jobs
Optimize Apache Spark performance through partitioning, indexing, and other tuning options
Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake

Who This Book Is For
Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.

IBM TS7700 Release 5.2.2 Guide

2022-07-07 O'Reilly Amazon

book

Aderson Pacini , Yuki Asakura , Ole Asmussen , Nao Takemura , Lourie Goodall , Alberto Barajas Ortiz , Nielson ’Nino’ de Carvalho , Monica Falcone , Chen Zhu , Larry Coyne , Erich Moraga , Taisei Takai , Tomoaki Ogino , Michael Scott , Kousei Kawamura , Derek Erdmann , Trinidad Armando Rangel Ruiz , Nobuhiko Furuya , Joe Hew , Rin Fujiwara , Joe Swingler , Stefan Neff , Tony Makepeace , Takahiro Tsuda

data data-engineering IBM Cloud Computing Cloud Storage S3

This IBM® Redbooks® publication covers IBM TS7700 R5.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on 25 years of experience, the R5.2 release includes many features that enable improved performance, usability, and security. Highlights include IBM TS7700 Advanced Object Store, an all-flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud® Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.2. The R5.2 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.2 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. Note: The latest Release 5.2 was split into two phases: R5.2 Phase 1 and R5.2 Phase 2. TS7700 provides tape virtualization for the IBM z environment. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. Tape virtualization can help satisfy the following requirements in a data processing environment.
New and existing capabilities of the TS7700 5.2.2 release include the following highlights:
Eight-way Grid Cloud, which consists of up to three generations of TS7700
Synchronous and asynchronous replication of virtual tape and TCT objects
Grid access to all logical volume and object data that is independent of where it exists
An all-flash TS7770 option for improved performance
Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier
Full AES256 encryption for data that is in-flight and at-rest
Tight integration with IBM Z® and DFSMS policy management
DS8000 Object Store AES256 in-flight encryption and compression
Regulatory compliance through Logical WORM and LWORM Retention support
Cloud Storage Tier support for archive, logical volume version, and disaster recovery
Optional integration with physical tape
16 Gb IBM FICON® throughput that exceeds 5 GBps per TS7700 cluster
Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR) support
IBM Z hosts view up to 3,968 common devices per TS7700 grid
TS7770 Cache On-demand feature that is based on capacity licensing
TS7770 support of SSD within the VED server

The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

Fundamentals of Data Engineering

2022-06-22 O'Reilly Amazon

book

Matt Housley , Joe Reis

data data-engineering Cloud Computing Data Engineering Data Governance Marketing

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you:
Get a concise overview of the entire data engineering landscape
Assess data engineering problems using an end-to-end framework of best practices
Cut through marketing hype when choosing data technologies, architecture, and processes
Use the data engineering lifecycle to design and build a robust architecture
Incorporate data governance and security across the data engineering lifecycle

IBM SAN Volume Controller Model SV3 Product Guide

2022-06-13 O'Reilly Amazon

book

Carsten Larsten , Shu Mookerjee , Konrad Trojok , Vasfi Gucer , Jon Herd , Hartmut Lonzer , Douwe van Terwisga , Kendall Williams , Corne Lottering

data data-engineering IBM Cloud Computing

This IBM® Redpaper Product Guide describes the IBM SAN Volume Controller model SV3 solution, which is a next-generation IBM SAN Volume Controller. Built with IBM Spectrum® Virtualize software and part of the IBM Spectrum Storage family, IBM SAN Volume Controller is an enterprise-class storage system. It helps organizations achieve better data economics by supporting the large-scale workloads that are critical to success. Data centers often contain a mix of storage systems. This situation can arise as a result of company mergers or as a deliberate acquisition strategy. Regardless of how they arise, mixed configurations add complexity to the data center. Different systems have different data services, which make it difficult to move data from one to another without updating automation. Different user interfaces increase the need for training and can make errors more likely. Different approaches to hybrid cloud complicate modernization strategies. Also, many different systems mean more silos of capacity, which can lead to inefficiency. To simplify the data center and to improve flexibility and efficiency in deploying storage, enterprises of all types and sizes turn to IBM SAN Volume Controller, which is built with IBM Spectrum Virtualize software. This software simplifies infrastructure and eliminates differences in management, function, and even hybrid cloud support. IBM SAN Volume Controller introduces a common approach to storage management, function, replication, and hybrid cloud that is independent of storage type. It is the key to modernizing and revitalizing your storage, yet it remains easy to understand.
IBM SAN Volume Controller provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples:
Data reduction and deduplication
Dynamic tiering
Thin-provisioning
Snapshots
Cloning
Replication and data copy services
Data-at-rest encryption
Cyber resilience
Transparent Cloud Tiering
IBM HyperSwap® including three-site replication for high availability (HA)

IBM Power Systems High Availability and Disaster Recovery Updates: Planning for a Multicloud Environment

2022-05-27 O'Reilly Amazon

book

Dino Quintero , Prashant Pandey , Diego Riesco , Nilabja Haldar , Edson Gomes Pereira , Vera Cruz , Antony Steel , Douglas Roach , Thomas Baumann , Youssef Largou

data data-engineering IBM ibm-power-systems Cloud Computing Data Management

This IBM® Redpaper publication delivers an updated guide for high availability and disaster recovery (HADR) planning in a multicloud environment for IBM Power. This publication draws on studies performed by a virtual collaborative team of IBM Business Partners, technical focal points, and product managers, who used their hands-on experience to implement case studies that show HADR management aspects for a hybrid multicloud environment. The goal of this book is to deliver a HADR guide for backup and data management on-premises and in a multicloud environment. This document updates HADR on-premises and in the cloud with IBM PowerHA® SystemMirror®, IBM VM Recovery Manager (VMRM), and other solutions that are available on IBM Power for IBM AIX®, IBM i, and Linux. This publication highlights the available offerings at the time of writing for each operating system (OS) that is supported in IBM Power, including best practices. This book addresses topics for IT architects, IT specialists, sellers, and anyone looking to implement and manage HADR on-premises and in the cloud. Moreover, this publication provides documentation to transfer how-to skills to the technical teams and solution guidance to the sales team. This book complements the documentation that is available at IBM Documentation and aligns with the educational materials that are provided by IBM Systems Technical Training.

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

2022-05-19 O'Reilly Amazon

book

Dhiraj Kumar , Jessica Tischbierek , Johannes Rank , Elena Wolz , André Bögelsack , Utpal Chakraborty

data data-engineering SAP AWS Azure Cloud Computing

This book helps SAP architects and SAP Basis administrators deploy and operate SAP S/4HANA systems on the most common public cloud platforms. Market-leading cloud offerings are covered, including Amazon Web Services, Microsoft Azure, and Google Cloud. You will gain an end-to-end understanding of the initial implementation of SAP S/4HANA systems on those platforms. You will learn how to move away from the big monolithic SAP ERP systems and arrive at an environment with a central SAP S/4HANA system as the digital core surrounded by cloud-native services. The book begins by introducing the core concepts of Hyperscaler cloud platforms that are relevant to SAP. You will learn about the architecture of SAP S/4HANA systems on public cloud platforms, with specific content provided for each of the major platforms. The book simplifies the deployment of SAP S/4HANA systems in public clouds by providing step-by-step instructions and helping you deal with the complexity of such a deployment. Content in the book is based on best practices, industry lessons learned, and architectural blueprints, helping you develop deep insights into the operations of SAP S/4HANA systems on public cloud platforms. Reading this book enables you to build and operate your own SAP S/4HANA system in the public cloud with a minimum of effort.
What You Will Learn
Choose the right Hyperscaler platform for your future SAP S/4HANA workloads
Start deploying your first SAP S/4HANA system in the public cloud
Avoid typical pitfalls during your implementation
Apply and leverage cloud-native services for your SAP S/4HANA system
Save costs by choosing the right architecture and build a robust architecture for your most critical SAP systems
Meet your business’ criteria for availability and performance by having the right sizing in place
Identify further use cases when operating SAP S/4HANA in the public cloud

Who This Book Is For
SAP architects looking for an answer on how to move SAP S/4HANA systems from on-premises into the cloud; those planning to deploy to one of the three major platforms from Amazon Web Services, Microsoft Azure, and Google Cloud Platform; and SAP Basis administrators seeking a detailed and realistic description of how to get started on a migration to the cloud and how to drive that cloud implementation to completion.

Observability Engineering

2022-05-06 O'Reilly Amazon

book

Charity Majors , Liz Fong-Jones , George Miranda

it-operations monitoring observability Analytics Cloud Computing

Observability is critical for building, changing, and understanding the software that powers complex modern systems. Teams that adopt observability are much better equipped to ship code swiftly and confidently, identify outliers and aberrant behaviors, and understand the experience of each and every user. This practical book explains the value of observable systems and shows you how to practice observability-driven development. Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve upon what you're doing today, and provide practical dos and don'ts for migrating from legacy tooling, such as metrics, monitoring, and log management. You'll also learn the impact observability has on organizational culture (and vice versa). You'll explore:
How the concept of observability applies to managing software at scale
The value of practicing observability when delivering complex cloud native applications and systems
The impact observability has across the entire software development lifecycle
How and why different functional teams use observability with service-level objectives
How to instrument your code to help future engineers understand the code you wrote today
How to produce quality code for context-aware system debugging and maintenance
How data-rich analytics can help you debug elusive issues

Advanced SQL with SAS

2022-05-01 O'Reilly Amazon

book

Christian FG Schendera

data data-engineering SQL Cloud Computing Data Quality SAS

This book introduces advanced techniques for using PROC SQL in SAS. If you are a SAS programmer, analyst, or student who has mastered the basics of working with SQL, Advanced SQL with SAS® will help take your skills to the next level. Filled with practical examples with detailed explanations, this book demonstrates how to improve performance and speed for large data sets. Although the book addresses advanced topics, it is designed to progress from the simple and manageable to the complex and sophisticated. In addition to numerous tuning techniques, this book also touches on implicit and explicit pass-throughs, presents alternative SAS grid- and cloud-based processing environments, and compares SAS programming languages and approaches including FedSQL, CAS, DS2, and hash programming. Other topics include:
Missing values and data quality with audit trails
“Blind spots” like how missing values can affect even the simplest calculations and table joins
SAS macro language and SAS macro programs
SAS functions
Integrity constraints
SAS Dictionaries
SAS Compute Server

IBM z16 Technical Introduction

2022-04-28 O'Reilly Amazon

book

Gerard Laumay , Roman Vogt , Ewerson Palacio , Jannie Houlbjerg , Kazuhiro Nakajima , John Troy , Bill White , Paul Schouten , Octavian Lascu , Anna Shugol , Hervey Kamga , Martijn Raave , Andre Spahni , Bo XU , Makus Ertl , Slav Martinksi

data data-engineering IBM Analytics Cloud Computing Cyber Security

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform that is built with the IBM Telum processor: the IBM z16 server. The IBM Z platform is recognized for its security, resiliency, performance, and scale. It is relied on for mission-critical workloads and as an essential element of hybrid cloud infrastructures. The IBM z16 server adds capabilities and value with innovative technologies that are needed to accelerate the digital transformation journey. This book explains how the IBM z16 server uses innovations and traditional IBM Z strengths to satisfy the growing demand for cloud, analytics, and a more flexible infrastructure. With the IBM z16 servers as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

CockroachDB: The Definitive Guide

2022-04-11 O'Reilly Amazon

book

Jesse Seldess , Ben Darnell , Guy Harrison

data data-engineering relational-databases cockroachdb Cloud Computing Data Modelling

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud native data architecture. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra low latencies to globally distributed end users. You'll learn how to:
Design and build applications for distributed infrastructure, including data modeling and schema design
Migrate data into CockroachDB
Read and write data and run ACID transactions across distributed infrastructure
Plan a CockroachDB deployment for resiliency across single region and multi-region clusters
Secure, monitor, and optimize your CockroachDB deployment
