O'Reilly Data Engineering Books

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale

2021-08-27 O'Reilly Amazon

book

John Sing, Prashanth Shetty, Wei Gong, Linda Cham

data data-engineering Hadoop cloudera Analytics CDP

This IBM® Redpaper publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum® Scale and Cloudera Data Platform (CDP) Private Cloud Base for performing in-place Cloudera Hadoop or Cloudera Spark-based analytics. It also covers the benefits of the integrated solution and gives guidance about the types of deployment models and considerations during the implementation of these models. The August 2021 update added CES protocol support in the Hadoop environment.

Cloud Native Integration with Apache Camel: Building Agile and Scalable Integrations for Kubernetes Platforms

2021-08-25 O'Reilly Amazon

book

Guilherme Camposo

data data-engineering streaming-messaging camel Agile/Scrum API

Address the most common integration challenges by understanding the ins and outs of the choices and exemplifying the solutions with practical examples of how to create cloud native applications using Apache Camel. Camel will be our main tool, but we will also see some complementary tools and plugins that can make our development and testing easier, such as Quarkus, and tools for more specific use cases, such as Apache Kafka and Keycloak. You will learn to connect with databases, create REST APIs, transform data, connect with message-oriented middleware (MOM), secure your services, and test using Camel. You will also learn software architecture patterns for integration and how to leverage container platforms, such as Kubernetes. This book is suitable for those who are eager to learn an integration tool that fits the Kubernetes world, and who want to explore the integration challenges that can be solved using containers.
What You Will Learn: Focus on how to solve integration challenges; understand the basics of Quarkus, as it is the foundation for the application; acquire a comprehensive view of Apache Camel; deploy an application in Kubernetes; follow good practices. Who This Book Is For: Java developers looking to learn Apache Camel; Apache Camel developers looking to learn more about Kubernetes deployments; software architects looking to study integration patterns for Kubernetes-based systems; system administrators (operations teams) looking to get a better understanding of how technologies are integrated.

Developing Modern Applications with a Converged Database

2021-08-25 O'Reilly Amazon

book

Alice LaPlante

data data-engineering relational-databases Analytics API Blockchain

Single-purpose databases were designed to address specific problems and use cases. Given this narrow focus, there are inherent tradeoffs required when trying to accommodate multiple datatypes or workloads in your enterprise environment. The result is data fragmentation that spills over into application development, IT operations, data security, system scalability, and availability. In this report, author Alice LaPlante explains why developing modern, data-driven applications may be easier and more synergistic when using a converged database. Senior developers, architects, and technical decision-makers will learn cloud-native application development techniques for working with both structured and unstructured data. You'll discover ways to run transactional and analytical workloads on a single, unified data platform. This report covers: Benefits and challenges of using a converged database to develop data-driven applications How to use one platform to work with both structured and unstructured data that includes JSON, XML, text and files, spatial and graph, Blockchain, IoT, time series, and relational data Modern development practices on a converged database, including API-driven development, containers, microservices, and event streaming Use case examples including online food delivery, real-time fraud detection, and marketing based on real-time analytics and geospatial targeting

Data Engineering on Azure

2021-08-17 O'Reilly Amazon

book

Vlad Riscutia

data data-engineering AI/ML Analytics Azure Big Data

Build a data platform to the industry-leading standards set by Microsoft’s own infrastructure. In Data Engineering on Azure you will learn how to: pick the right Azure services for different data scenarios; manage data inventory; implement production-quality data modeling, analytics, and machine learning workloads; handle data governance; use DevOps to increase reliability; ingest, store, and distribute data; apply best practices for compliance and access control. Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft’s own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring an engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning. About the Technology: Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify. About the Book: In Data Engineering on Azure you’ll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you’ll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms.
What's Inside Data inventory and data governance Assure data quality, compliance, and distribution Build automated pipelines to increase reliability Ingest, store, and distribute data Production-quality data modeling, analytics, and machine learning About the Reader For data engineers familiar with cloud computing and DevOps. About the Author Vlad Riscutia is a software architect at Microsoft. Quotes A definitive and complete guide on data engineering, with clear and easy-to-reproduce examples. - Kelum Prabath Senanayake, Echoworx An all-in-one Azure book, covering all a solutions architect or engineer needs to think about. - Albert Nogués, Danone A meaningful journey through the Azure ecosystem. You’ll be building pipelines and joining components quickly! - Todd Cook, Appen A gateway into the world of Azure for machine learning and DevOps engineers. - Krzysztof Kamyczek, Luxoft

Developing Modern Database Applications with PostgreSQL

2021-08-13 O'Reilly Amazon

book

Quan Ha Le, Marcelo Diaz

data data-engineering relational-databases postgresql API Cloud Computing

In "Developing Modern Database Applications with PostgreSQL", you will master the art of building database applications with the highly available and scalable PostgreSQL. Walk through a series of real-world projects that fully explore both the developmental and administrative aspects of PostgreSQL, all tied together through the example of a banking application. What this Book will help me do Set up high-availability PostgreSQL clusters using modern best practices. Monitor and tune database performance to handle enterprise-level workloads seamlessly. Automate testing and implement test-driven development strategies for robust applications. Leverage PostgreSQL along with DevOps pipelines to deploy applications on cloud platforms. Develop APIs and geospatial databases using popular tools like PostgREST and PostGIS. Author(s) The authors of this book, Quan Ha Le and Marcelo Diaz, are experienced professionals in database technologies and software development. With a passion for PostgreSQL and its applications in modern computing, they bring a wealth of expertise and a practical approach to this book. Their methods focus on real-world applicability, ensuring that readers gain hands-on skills and practical knowledge. Who is it for? This book is perfect for database developers, administrators, and architects who want to advance their expertise in PostgreSQL. It is also suitable for software engineers and IT professionals aiming to tackle end-to-end database development projects. A basic knowledge of PostgreSQL and Linux will help you dive into the hands-on projects easily. If you're looking to take your PostgreSQL skills to the next level, this book is for you.

The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

2021-08-06 O'Reilly Amazon

book

Ron C. L'Esteve

it-operations cloud-computing cloud-platforms microsoft-azure azure-analytics Analytics

Build efficient and scalable batch and real-time data ingestion pipelines, DevOps continuous integration and deployment pipelines, and advanced analytics solutions on the Azure Data Platform. This book teaches you to design and implement robust data engineering solutions using Data Factory, Databricks, Synapse Analytics, Snowflake, Azure SQL database, Stream Analytics, Cosmos database, and Data Lake Storage Gen2. You will learn how to engineer your use of these Azure Data Platform components for optimal performance and scalability. You will also learn to design self-service capabilities to maintain and drive the pipelines and your workloads. The approach in this book is to guide you through a hands-on, scenario-based learning process that will empower you to promote digital innovation best practices while you work through your organization’s projects, challenges, and needs. The clear examples enable you to use this book as a reference and guide for building data engineering solutions in Azure. After reading this book, you will have a far stronger skill set and confidence level in getting hands on with the Azure Data Platform. 
What You Will Learn Build dynamic, parameterized ELT data ingestion orchestration pipelines in Azure Data Factory Create data ingestion pipelines that integrate control tables for self-service ELT Implement a reusable logging framework that can be applied to multiple pipelines Integrate Azure Data Factory pipelines with a variety of Azure data sources and tools Transform data with Mapping Data Flows in Azure Data Factory Apply Azure DevOps continuous integration and deployment practices to your Azure Data Factory pipelines and development SQL databases Design and implement real-time streaming and advanced analytics solutions using Databricks, Stream Analytics, and Synapse Analytics Get started with a variety of Azure data services through hands-on examples Who This Book Is For Data engineers and data architects who are interested in learning architectural and engineering best practices around ELT and ETL on the Azure Data Platform, those who are creating complex Azure data engineering projects and are searching for patterns of success, and aspiring cloud and data professionals involved in data engineering, data governance, continuous integration and deployment of DevOps practices, and advanced analytics who want a full understanding of the many different tools and technologies that Azure Data Platform provides

Data Modeling for Azure Data Services

2021-07-30 O'Reilly Amazon

book

Peter ter Braake

data data-engineering data-models Azure ADF BI

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs. What this Book will help me do Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases. Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool. Master how to design and model a Data Lake using Azure Storage efficiently. Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB. Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows. Author(s) Peter ter Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, the author crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, ter Braake ensures that learners gain not just knowledge but applied competence. Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.

SQL Server on Kubernetes: Designing and Building a Modern Data Platform

2021-07-30 O'Reilly Amazon

book

Anthony E. Nocentino, Ben Weissman

data data-engineering relational-databases microsoft-sql-server API Azure

Build a modern data platform by deploying SQL Server in Kubernetes. Modern application deployment needs to be fast and consistent to keep up with business objectives and Kubernetes is quickly becoming the standard for deploying container-based applications, fast. This book introduces Kubernetes and its core concepts. Then it shows you how to build and interact with a Kubernetes cluster. Next, it goes deep into deploying and operationalizing SQL Server in Kubernetes, both on premises and in cloud environments such as the Azure Cloud. You will begin with container-based application fundamentals and then go into an architectural overview of a Kubernetes container and how it manages application state. Then you will learn the hands-on skill of building a production-ready cluster. With your cluster up and running, you will learn how to interact with your cluster and perform common administrative tasks. Once you can admin the cluster, you will learn how to deploy applications and SQL Server in Kubernetes. You will learn about high-availability options, and about using Azure Arc-enabled Data Services. By the end of this book, you will know how to set up a Kubernetes cluster, manage a cluster, deploy applications and databases, and keep everything up and running. What You Will Learn Understand Kubernetes architecture and cluster components Deploy your applications into Kubernetes clusters Manage your containers programmatically through API objects and controllers Deploy and operationalize SQL Server in Kubernetes Implement high-availability SQL Server scenarios on Kubernetes using Azure Arc-enabled Data Services Make use of Kubernetes deployments for Big Data Clusters Who This Book Is For DBAs and IT architects who are ready to begin planning their next-generation data platform and want to understand what it takes to run SQL Server in a container in Kubernetes. 
SQL Server on Kubernetes is an excellent choice for those who want to understand the big picture of why Kubernetes is the next-generation deployment method for SQL Server but also want to understand the internals, or the how, of deploying SQL Server in Kubernetes. When finished with this book, you will have the vision and skills to successfully architect, build and maintain a modern data platform deploying SQL Server on Kubernetes.

Amazon Redshift Cookbook

2021-07-23 O'Reilly Amazon

book

Shruti Worlikar, Thiyagarajan Arumugam, Harshida Patel

data data-engineering relational-databases amazon-redshift Analytics Cloud Computing

Dive into the world of Amazon Redshift with this comprehensive cookbook, packed with practical recipes to build, optimize, and manage modern data warehousing solutions. From understanding Redshift's architecture to implementing advanced data warehousing techniques, this book provides actionable guidance to harness the power of Amazon Redshift effectively. What this Book will help me do Master the architecture and core concepts of Amazon Redshift to architect scalable data warehouses. Optimize data pipelines and automate ETL processes for seamless data ingestion and management. Leverage advanced features like concurrency scaling and Redshift Spectrum for enhanced analytics. Apply best practices for security and cost optimization in Redshift projects. Gain expertise in scaling data warehouse solutions to accommodate large-scale analytics needs. Author(s) Shruti Worlikar, Thiyagarajan Arumugam, and Harshida Patel are seasoned experts in data warehousing and analytics with extensive experience using Amazon Redshift. Their backgrounds in implementing scalable data solutions make their insights practical and grounded. Through their collaborative writing, they aim to make complex topics approachable to learners of various skill levels. Who is it for? This book is tailored for professionals such as data warehouse developers, data engineers, and data analysts looking to master Amazon Redshift. It suits intermediate to advanced practitioners with a basic understanding of data warehousing and cloud technologies. Readers seeking to optimize Redshift for cost, performance, and security will find this guide invaluable.

IBM TS4500 R7 Tape Library Guide

2021-07-15 O'Reilly Amazon

book

Jesus Eduardo Cervantes Rolon, Larry Coyne, Robert Beiderbeck, Khanh Ngo, Erwin Zwemmer, Fabian Corona Villarreal, Jeremy Tudgay

data data-engineering IBM Cloud Computing ELK SAS

The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and better integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve investments in IBM tape library products. Now, you can achieve a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 11 petabytes (PB) of uncompressed data in a single frame library or scale up to 2 PB per square foot to over 350 PB. The TS4500 offers the following benefits: High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to another 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for IBM TS1160 while also supporting TS1155, TS1150, and TS1140 tape drive: The TS1160 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. 
The TS1160 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The TS1160 Tape Drive Model 60E delivers a dual 10 Gb or 25 Gb Ethernet host attachment interface that is optimized for cloud-based and hyperscale environments. The TS1160 Tape Drive Model 60F delivers a native data rate of 400 MBps, the same load/ready, locate speeds, and access times as the TS1155, and includes dual-port 16 Gb Fibre Channel support. Support of the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology. Support of LTO 8 Type M cartridge (m8): The LTO Program introduced a new capability with LTO-8 drives: the ability to write 9 TB on a brand-new LTO-7 cartridge instead of the 6 TB specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), command-line interface (CLI), and REST over SCSI (RoS) to obtain status information about library components. October 2020 - Added support for the 3592 model 60S tape drive that provides a dual-port 12 Gb SAS (Serial Attached SCSI) interface for host attachment.

IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power Enterprise Pools 2.0

2021-07-13 O'Reilly Amazon

book

Gareth Coates, Stefan Stefanov, Scott Vetter, Bernhard Buehler, Sridhar Murthy, Ashwini Deo, Sabine Jordan, Turgut Genc, Muhammad Farrukh Mahmood

data data-engineering IBM ibm-power-systems Cloud Computing Linux

This IBM® Redbooks® publication is a guide to IBM Power Private Cloud with Shared Utility Capacity featuring Power Enterprise Pools 2.0 (also known as PEP 2.0). This technology allows multiple servers in an Enterprise Pool to share base processor and memory resources, and to draw upon pre-paid credits when the base is exceeded. Previously, the Shared Utility feature supported IBM Power System E950 (9040-MR9) and IBM Power System E980 (9080-M9S). It was extended in August 2020 to include the Scale-out Power Systems announced on July 14, 2020, and received dedicated processor support later in the year. The IBM Power System S922 (9009-22G) and IBM Power System S924 (9009-42G) servers, which use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems, are now supported. The previous Scale-out models, IBM Power System S922 (9009-22A) and IBM Power System S924 (9009-42A), cannot be added to an Enterprise Pool. The goal of this book is to provide an overview of the environment and guidance for planning a deployment. The paper also covers how to configure PEP 2.0. There are also chapters on migrating from PEP 1.0 to PEP 2.0 and various use cases. This publication is for professionals who want to acquire a better understanding of IBM Power Private Cloud and Shared Utility. The intended audience includes clients, sales and marketing professionals, technical support professionals, and IBM Business Partners. This book expands the set of Power Systems documentation by providing a desktop reference that offers a detailed technical description of IBM Power Private Cloud and Shared Utility.

Machine Learning for Oracle Database Professionals: Deploying Model-Driven Applications and Automation Pipelines

2021-06-11 O'Reilly Amazon

book

Kai Yu, Heli Helskyaho, Jean Yu

data data-engineering oracle-database-solutions AI/ML Cloud Computing DataViz

Database developers and administrators will use this book to learn how to deploy machine learning models in Oracle Database and in Oracle’s Autonomous Database cloud offering. The book covers the technologies that make up the Oracle Machine Learning (OML) platform, including OML4SQL, OML Notebooks, OML4R, and OML4Py. The book focuses on Oracle Machine Learning as part of the Oracle Autonomous Database collaborative environment. Also covered are advanced topics such as delivery and automation pipelines. Throughout the book you will find practical details and hands-on examples showing you how to implement machine learning and automate deployment of machine learning. Discussion around the examples helps you gain a conceptual understanding of machine learning. Important concepts discussed include the methods involved, the algorithms to choose from, and mechanisms for process and deployment. Seasoned database professionals looking to make the leap into machine learning as a growth path will find much to like in this book as it helps you step up and use your current knowledge of Oracle Database to transition into providing machine learning solutions.
What You Will Learn Use the Oracle Machine Learning (OML) Notebooks for data visualization and machine learning model building and evaluation Understand Oracle offerings for machine learning Develop machine learning with Oracle database using the built-in machine learning packages Develop and deploy machine learning models using OML4SQL and OML4R Leverage the Oracle Autonomous Database and its collaborative environment for Oracle Machine Learning Develop and deploy machine learning projects in Oracle Autonomous Database Build an automated pipeline that can detect and handle changes in data/model performance Who This Book Is For Database developers and administrators who want to learn about machine learning, developers who want to build models and applications using Oracle Database’s built-in machine learning feature set, and administrators tasked with supporting applications on Oracle Database that make use of the Oracle Machine Learning feature set

Azure Data Factory by Example: Practical Implementation for Data Engineers

2021-06-09 O'Reilly Amazon

book

Richard Swinbank

data data-engineering relational-databases microsoft-sql-server Azure ADF

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well-suited to data engineers embracing their first ETL/ELT toolset as it is to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again. 
What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

SAP HANA on IBM Power Systems Backup and Recovery Solutions

2021-05-27 O'Reilly Amazon

book

Dino Quintero, Adriana Melges Quintanilha Weingart, Pia Nymann, Rosane Goldstein, Andrei Socoliuc

data data-engineering IBM Cloud Computing ERP Linux

This IBM® Redpaper Redbooks publication provides guidance about a backup and recovery solution for SAP High-performance Analytic Appliance (HANA) running on IBM Power Systems. This publication provides case studies and how-to procedures that show backup and recovery scenarios. This publication provides information about how to protect data in an SAP HANA environment by using IBM Spectrum® Protect and IBM Spectrum Copy Data Manager. This publication focuses on the data protection solution, which is described through several scenarios. The information in this publication is distributed on an as-is basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Protect are supported and entitled, and where the issues are specific to a blueprint implementation. The goal of the publication is to describe the best aspects and options for backup, snapshots, and restore of SAP HANA Multitenant Database Container (MDC) single and multi-tenant installations on IBM Power Systems by using theoretical knowledge, hands-on exercises, and documenting the findings through sample scenarios. This document provides resources about the following processes: Describing how to determine the best option, including SAP Landscape aspects to back up, snapshot, and restore of SAP HANA MDC single and multi-tenant installations based on IBM Spectrum Computing Suite, Red Hat Linux Relax and Recover (ReAR), and other products. Documenting key aspects, such as recovery time objective (RTO) and recovery point objective (RPO), backup impact (load, duration, scheduling), quantitative savings (for example, data deduplication), integration and catalog currency, and tips and tricks that are not covered in the product documentation. Using IBM Cloud® Object Storage and documenting how to use IBM Spectrum Protect to back up to the cloud. 
SAP HANA 2.0 SPS 05 has this feature that is built in natively. IBM Spectrum Protect for Enterprise Resource Planning (ERP) has this feature too. Documenting Linux ReaR to cover operating system (OS) backup because ReAR is used by most backup products, such as IBM Spectrum Protect and Symantec Endpoint Protection (SEP) to back up OSs. This publication targets technical readers including IT specialists, systems architects, brand specialists, sales teams, and anyone looking for a guide about how to implement the best options for SAP HANA backup and recovery on IBM Power Systems. Moreover, this publication provides documentation to transfer the how-to-skills to the technical teams and solution guidance to the sales team. This publication complements the documentation that is available at IBM Knowledge Center, and it aligns with the educational materials that are provided by IBM Garage™ for Systems Technical Education and Training.

IBM PowerVC Version 2.0 Introduction and Configuration

2021-05-26 O'Reilly Amazon

book

Thierry Huché, Sachin P. Deshmukh, Scott Vetter, Stephen Lutz, Christopher Emefiene Osiegbu, Ahmed Mashhour, Borislav Ivanov Stoymirski

data data-engineering IBM Ansible Cloud Computing Linux

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced enterprise virtualization management offering for IBM Power Systems. This IBM Redbooks® publication introduces IBM PowerVC and helps you understand its functions, planning, installation, and setup. It also shows how IBM PowerVC can integrate with systems management tools such as Ansible or Terraform, and how it integrates into an OpenShift container environment. IBM PowerVC Version 2.0.0 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC), or by IBM PowerVM NovaLink. With this capability, IBM PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware. IBM PowerVC is available as a Standard Edition, or as a Private Cloud Edition. IBM PowerVC includes the following features and benefits: virtual image capture, import, export, deployment, and management; policy-based virtual machine (VM) placement to improve server usage; snapshots and cloning of VMs or volumes for backup or testing purposes; support of advanced storage capabilities such as IBM SVC vdisk mirroring or IBM Global Mirror; management of real-time optimization and VM resilience to increase productivity; VM Mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI); Automated Simplified Remote Restart for improved availability of VMs when a host is down; role-based security policies to ensure a secure environment for common tasks; the ability for an administrator to enable Dynamic Resource Optimization on a schedule. IBM PowerVC Private Cloud Edition includes all of the IBM PowerVC Standard Edition features and enhancements: A self-service portal that allows the provisioning of new VMs without direct system administrator intervention. There is an option for policy approvals for the requests that are received from the self-service portal.
Pre-built deploy templates that are set up by the cloud administrator that simplify the deployment of VMs by the cloud user. Cloud management policies that simplify management of cloud deployments. Metering data that can be used for chargeback. This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 2.0.0.

Architecting Data-Intensive SaaS Applications

2021-05-25 O'Reilly Amazon

book

Pui Kei Johnston Chu , Kevin McGinley , William Waddington , Gjorgji Georgievski , Dinesh Kulkarni

data data-engineering AI/ML Analytics Cloud Computing IoT

Through explosive growth in the past decade, data now drives significant portions of our lives, from crowdsourced restaurant recommendations to AI systems identifying effective medical treatments. Software developers have an unprecedented opportunity to build data applications that generate value from massive datasets across use cases such as customer 360, application health and security analytics, the IoT, machine learning, and embedded analytics. With this report, product managers, architects, and engineering teams will learn how to make key technical decisions when building data-intensive applications, including how to implement extensible data pipelines and share data securely. The report includes design considerations for making these decisions and uses the Snowflake Data Cloud to illustrate best practices. This report explores: Why data applications matter: Get an introduction to data applications and some of the most common use cases Evaluating platforms for building data apps: Evaluate modern data platforms to confidently consider the merits of potential solutions Building scalable data applications: Learn design patterns and best practices for storage, compute, and security Handling and processing data: Explore techniques and real-world examples for building data pipelines to support data applications Designing for data sharing: Learn best practices for sharing data in modern data applications

Distributed Data Systems with Azure Databricks

2021-05-25 O'Reilly Amazon

book

Alan Bernardo Palacio

data data-engineering storage-repositories data-lake AI/ML Azure

In 'Distributed Data Systems with Azure Databricks', you will explore the capabilities of Microsoft Azure Databricks as a platform for building and managing big data pipelines. Learn how to process, transform, and analyze data at scale while developing expertise in training distributed machine learning models and integrating them into enterprise workflows. What this Book will help me do Design and implement Extract, Transform, Load (ETL) pipelines using Azure Databricks. Conduct distributed training of machine learning models using TensorFlow and Horovod. Integrate Azure Databricks with Azure Data Factory for optimized data pipeline orchestration. Utilize Delta Engine for efficient querying and analysis of data within Delta Lake. Employ Databricks Structured Streaming to manage real-time production-grade data flows. Author(s) Alan Bernardo Palacio is an experienced data engineer and cloud computing specialist, with extensive knowledge of the Microsoft Azure platform. With years of practical application of Databricks in enterprise settings, Palacio provides clear, actionable insights through relatable examples. They bring a passion for innovative solutions to the field of big data automation. Who is it for? This book is ideal for data engineers, machine learning engineers, and software developers looking to master Azure Databricks for large-scale data processing and analysis. Readers should have basic familiarity with cloud platforms, understanding of data pipelines, and a foundational grasp of Python and machine learning concepts. It is perfect for those wanting to create scalable and manageable data workflows.

IBM Power System IC922 Technical Overview and Introduction

2021-05-20 O'Reilly Amazon

book

Scott Vetter , Stephen Lutz , YoungHoon Cho

data data-engineering IBM ibm-power-systems AI/ML Cloud Computing

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power System IC922 (9183-22X) server that uses IBM POWER9™ processor-based technology and supports Linux operating systems (OSs). The objective of this paper is to introduce the system offerings and their capacities and available features. The Power IC922 server is built to deliver powerful computing, scaling efficiency, and storage capacity in a cost-optimized design to meet the evolving data challenges of the artificial intelligence (AI) era. It includes the following features: High throughput and performance for high-value Linux workloads, such as inferencing data or storage-rich workloads, or cloud. Potentially low acquisition cost through system optimization, such as using industry standard memory and warranty. Two IBM POWER9 processor-based single-chip module (SCM) devices that provide high performance with 24, 32, or 40 fully activated cores and a maximum 2 TB of memory. Up to six NVIDIA T4 graphics processing unit (GPU) accelerators. Up to twenty-four 2.5-inch SAS/SATA drives. One dedicated and one shared 1 Gb Intelligent Platform Management Interface (IPMI) port. This publication is for professionals who want to acquire a better understanding of IBM Power Systems products. The intended audience includes: Clients Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power IC922 server.

SAP S/4HANA Embedded Analytics: Experiences in the Field

2021-05-19 O'Reilly Amazon

book

Freek Keijzer

data data-engineering SAP Agile/Scrum Analytics BI

Imagine you are a business user, consultant, or developer about to enter an SAP S/4HANA implementation project. You are well-versed in SAP’s product portfolio and you know that the preferred reporting option in S/4HANA is embedded analytics. But what exactly is embedded analytics? And how can it be implemented? And who can do it: a business user, a functional consultant specialized in financial or logistics processes? Or does a business intelligence expert or a programmer need to be involved? Good questions! This book will answer these questions, one by one. It will also take you on the same journey that the implementation team needs to follow for every reporting requirement that pops up: start with assessing a more standard option and only move on to a less standard option if the requirement cannot be fulfilled. In consecutive chapters, analytical apps delivered by SAP, apps created using Smart Business Services, and Analytical Queries developed either using tiles or in a development environment are explained in detail with practical examples. The book also explains which option is preferred in which situation. The book covers topics such as in-memory computing, cloud, UX, OData, agile development, and more. Author Freek Keijzer writes from the perspective of an implementation consultant, focusing on functionality that has proven itself useful in the field. Practical examples are abundant, ranging from “codeless” to “hardcore coding.” What You Will Learn Know the difference between static reporting and interactive querying on real-time data Understand which options are available for analytics in SAP S/4HANA Understand which option to choose in which situation Know how to implement these options Who This Book is For SAP power users, functional consultants, developers

Data Pipelines with Apache Airflow

2021-05-09 O'Reilly Amazon

book

Julian de Ruiter , Bas Harenslak

data data-engineering apache-airflow AI/ML Airflow Cloud Computing

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack. About the Technology Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task. About the Book Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs. What's Inside Build, test, and deploy Airflow pipelines as DAGs Automate moving and transforming data Analyze historical datasets using backfilling Develop custom components Set up Airflow in production environments About the Reader For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills. About the Authors Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer. Quotes An Airflow bible. Useful for all kinds of users, from novice to expert. - Rambabu Posa, Sai Aashika Consultancy An easy-to-follow exploration of the benefits of orchestrating your data pipeline jobs with Airflow. - Daniel Lamblin, Coupang The one reference you need to create, author, schedule, and monitor workflows with Apache Airflow. Clear recommendation. - Thorsten Weber, bbv Software Services AG By far the best resource for Airflow. - Jonathan Wood, LexisNexis

IBM TS7700 Release 5.1 Guide

2021-05-04 O'Reilly Amazon

book

Marcelo Lopes de Moraes , Aderson Pacini , Ole Asmussen , Nao Takemura , Lourie Goodall , Alberto Barajas Ortiz , Takeshi Nohta , Monica Falcone , Chen Zhu , Larry Coyne , Taisei Takai , Tomoaki Ogino , Michael Scott , Kousei Kawamura , Derek Erdmann , Nobuhiko Furuya , Joe Hew , Rin Fujiwara , Joe Swingler , Stefan Neff , Sosuke Matsui , Takahiro Tsuda

data data-engineering IBM Cloud Computing Cloud Storage S3

This IBM® Redbooks® publication covers IBM TS7700 R5.1. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 20 years of virtual tape experience, the TS7770 supports the ability to store virtual tape volumes in an object store. The TS7700 has supported offloading to physical tape for over two decades. Offloading to physical tape behind a TS7700 is used by hundreds of organizations around the world. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also offload to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to offload to IBM Cloud® Object Storage and Amazon S3. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.1. The R5.1 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.1 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. TS7700 provides tape virtualization for the IBM Z environment. Tape virtualization can help satisfy the following requirements in a data processing environment: Improved reliability and resiliency Reduction in the time that is needed for the backup and restore process Reduction of service downtime that is caused by physical tape drive and library outages Reduction in cost, time, and complexity by moving primary workloads to virtual tape More efficient procedures for managing daily batch, backup, recall, and restore processing On-premises and off-premises object store cloud storage support as an alternative to physical tape for archive and disaster recovery New and existing capabilities of the TS7700 5.1 include the following highlights: Eight-way Grid Cloud, which consists of up to three generations of TS7700 Synchronous and asynchronous replication Full AES256 encryption for replication data that is in-flight and at-rest Tight integration with IBM Z and DFSMS policy management Optional target for DS8000 Transparent Cloud Tier using DFSMS DS8000 Object Store AES256 in-flight encryption and compression Optional Cloud Storage Tier support for archive and disaster recovery 16 Gb IBM FICON® throughput up to 5 GBps per TS7700 cluster IBM Z hosts view up to 3,968 common devices per TS7700 grid Grid access to all data independent of where it exists TS7770 Cache On-demand feature that is based on capacity licensing TS7770 support of SSD within the VED server The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150 and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM z15 Technical Introduction

2021-05-03 O'Reilly Amazon

book

Frank Packheiser , Jannie Houlbjerg , Kazuhiro Nakajima , John Troy , Bill White , Paul Schouten , Octavian Lascu , Anna Shugol , Hervey Kamga

data data-engineering IBM Agile/Scrum Analytics Cloud Computing

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™. It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity, and occupies an industry-standard footprint. It is offered as a single air-cooled 19-inch frame called the z15 T02, or as a multi-frame (one to four 19-inch frames) called the z15 T01. Both z15 models excel at the following tasks: Using hybrid multicloud integration services Securing and protecting data with encryption everywhere Providing resilience with key to zero downtime Transforming a transactional platform into a data powerhouse Getting more out of the platform with operational analytics Accelerating digital transformation with agile service delivery Revolutionizing business processes Blending open source and IBM Z technologies This book explains how this system uses innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

Azure Data Engineering Cookbook

2021-04-05 O'Reilly Amazon

book

Nagaraj Venkatesan , Ahmad Osama

data data-engineering Analytics Azure ADF Cloud Computing

Dive into the world of data engineering with 'Azure Data Engineering Cookbook' to master building efficient ETL workflows using Microsoft Azure Data services. Whether you're working on batch processing solutions or real-time analytics, this book is your guide to implementing effective, scalable data operations. What this Book will help me do Design and implement efficient ETL pipelines for batch and real-time processing on MS Azure. Understand the use of Azure Blob storage for managing large data sets. Ingest, process, and analyze data using tools like Azure Synapse and Databricks. Develop and secure automation pipelines using Azure Data Factory. Leverage Azure Stream Analytics for real-time data processing workflows. Author(s) Ahmad Osama and Nagaraj Venkatesan bring years of expertise in cloud solutions and data engineering. Renowned for their practical teaching approach, they have helped countless professionals master the intricacies of Azure. Their focus is on equipping readers with actionable skills for real-world data challenges. Who is it for? This book is ideal for data engineers and database professionals aiming to hone their expertise in advanced Azure data engineering tasks. Readers should have a working knowledge of Azure fundamentals and basic data engineering concepts. If you're a technical architect or ETL developer seeking to transition or enhance your skills in Azure's ecosystem, you'll find immense value here.

High Performant File System Workloads for AI and HPC on AWS using IBM Spectrum Scale

2021-03-31 O'Reilly Amazon

book

Sanjay Sudam

data data-engineering IBM AI/ML AWS Amazon EC2

This IBM® Redpaper® publication is intended to facilitate the deployment and configuration of IBM Spectrum® Scale-based high-performance storage solutions for scalable data and AI solutions on Amazon Web Services (AWS). The paper focuses on configuration, testing results, and tuning guidelines for running IBM Spectrum Scale-based high-performance storage solutions for data and AI workloads on AWS. The lab validation was conducted with Red Hat Linux nodes connected to IBM Spectrum Scale by using various Amazon Elastic Compute Cloud (EC2) instances. Simultaneous workloads are simulated across multiple Amazon EC2 nodes running Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system. Solution architecture, configuration details, and performance tuning demonstrate how to maximize data and AI application performance with IBM Spectrum Scale on AWS.

Effortless App Development with Oracle Visual Builder

2021-03-26 O'Reilly Amazon

book

Ankur Jain

data data-engineering oracle-database-solutions API Cloud Computing JavaScript

In "Effortless App Development with Oracle Visual Builder," you will explore how to quickly design, develop, and deploy robust web and mobile applications using Oracle Visual Builder's intuitive drag-and-drop features. This book equips you with the know-how to simplify application development tasks, making it perfect for professionals looking to boost productivity. What this Book will help me do Master the core architecture and features of Oracle Visual Builder to develop real-world applications effectively. Learn to create, manage, and leverage business objects and connect to various SaaS APIs within your applications. Build scalable and secure web and mobile applications using practical examples and clear implementation guidelines. Discover best practices for application lifecycle management, debugging, and troubleshooting VB applications. Extend Oracle and non-Oracle SaaS applications through hands-on knowledge tailored to real-world scenarios. Author(s) Ankur Jain is an experienced developer and technical writer specializing in Oracle Visual Builder and cloud-based application development. With years of hands-on experience building and deploying cloud applications, they bring expertise and a practical approach to education. Their engaging writing style focuses on enabling readers to learn and apply new skills confidently. Who is it for? This book is perfectly suited for developers, UI designers, and IT professionals who want to master Oracle Visual Builder for developing web and mobile applications. If you already have experience with technologies like JavaScript, UI frameworks, and REST APIs, and seek to create intuitive applications using a simplified interface, this book is for you. Whether you're in the early stages of learning VB or looking to refine your skills, this book serves as a valuable guide.
