O'Reilly Data Engineering Books

Storage Multi-tenancy for Red Hat OpenShift Container Platform with IBM Storage

2021-06-18 O'Reilly Amazon

book

Gauthier Siri

data data-engineering IBM

With IBM® Spectrum Virtualize and the Object-Based Access Control, you can implement multi-tenancy and secure storage usage in a Red Hat OpenShift environment. This IBM Redpaper® publication shows you how to secure the storage usage from the Openshift user to the IBM Spectrum® Virtualize array. You see how to restrict storage usage in a Red Hat Openshift Container Platform to avoid the over-consumption of storage by one or more user. These uses cases can be expanded to the use of this control to provide assistance with billing.

IBM Fibre Channel Endpoint Security for IBM DS8900F and IBM Z

2021-06-15 O'Reilly Amazon

book

Roger Hathorn , Jacob Sheppard , Matthew Houzenga , Robert Tondini , Bertrand Dufrasne , Alexander Warmuth

data data-engineering IBM Cyber Security

This IBM® Redbooks® publication will help you install, configure, and use the new IBM Fibre Channel Endpoint Security function. The focus of this publication is about securing the connection between an IBM DS8900F and the IBM z15™. The solution is delivered with two levels of link security supported: support for link authentication on Fibre Channel links and support for link encryption of data in flight (which also includes link authentication). This solution is targeted for clients needing to adhere to Payment Card Industry (PCI) or other emerging data security standards, and those who are seeking to reduce or eliminate insider threats regarding unauthorized access to data.

97 Things Every Data Engineer Should Know

2021-06-14 O'Reilly Amazon

book

Tobias Macey

data data-engineering AI/ML Data Engineering DWH ETL/ELT

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include: The Importance of Data Lineage - Julien Le Dem Data Security for Data Engineers - Katharine Jarmul The Two Types of Data Engineering and Data Engineers - Jesse Anderson Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy The End of ETL as We Know It - Paul Singman Building a Career as a Data Engineer - Vijay Kiran Modern Metadata for the Modern Data Stack - Prukalpa Sankar Your Data Tests Failed! Now What? - Sam Bail

Machine Learning for Oracle Database Professionals: Deploying Model-Driven Applications and Automation Pipelines

2021-06-11 O'Reilly Amazon

book

Kai Yu , Heli Helskyaho , Jean Yu

data data-engineering oracle-database-solutions AI/ML Cloud Computing DataViz

Database developers and administrators will use this book to learn how to deploy machine learning models in Oracle Database and in Oracle’s Autonomous Database cloud offering. The book covers the technologies that make up the Oracle Machine Learning (OML) platform, including OML4SQL, OML Notebooks, OML4R, and OML4Py. The book focuses on Oracle Machine Learning as part of the Oracle Autonomous Database collaborative environment. Also covered are advanced topics such as delivery and automation pipelines. Throughout the book you will find practical details and hand-on examples showing you how to implement machine learning and automate deployment of machine learning. Discussion around the examples helps you gain a conceptual understanding of machine learning. Important concepts discussed include the methods involved, the algorithms to choose from, and mechanisms for process and deployment. Seasoned database professionals looking to make the leap into machine learning as a growth path will find much to like in this book as it helps you step up and use your current knowledge of Oracle Database to transition into providing machine learning solutions. What You Will Learn Use the Oracle Machine Learning (OML) Notebooks for data visualization and machine learning model building and evaluation Understand Oracle offerings for machine learning Develop machine learning with Oracle database using the built-in machine learning packages Develop and deploy machine learning models using OML4SQL and OML4R Leverage the Oracle Autonomous Database and its collaborative environment for Oracle Machine Learning Develop and deploy machine learning projects in Oracle Autonomous Database Build an automated pipeline that can detect and handle changes in data/model performance Who This Book Is For Database developers and administrators who want to learn about machine learning, developers who want to build models and applications using Oracle Database’s built-in machine learning feature set, and administrators tasked with supporting applications on Oracle Database that make use of the Oracle Machine Learning feature set

Azure Data Factory by Example: Practical Implementation for Data Engineers

2021-06-09 O'Reilly Amazon

book

Richard Swinbank

data data-engineering relational-databases microsoft-sql-server Azure ADF

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well-suited to data engineers embracing their first ETL/ELT toolset as it is to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again. What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases

2021-06-09 O'Reilly Amazon

book

Nils Haustein

data data-engineering IBM Cyber Security

This IBM Redpaper™ publication introduces the IBM Spectrum Scale immutability function. It shows how to set it up and presents different ways for managing immutable and append-only files. This publication also provides guidance for implementing IT security aspects in an IBM Spectrum Scale cluster by addressing regulatory requirements. It also describes two typical use cases for managing immutable files. One use case involves applications that manage file immutability; the other use case presents a solution to automatically set files to immutable within a IBM Spectrum Scale immutable fileset.

IBM Spectrum Archive Enterprise Edition V1.3.1.2: Installation and Configuration Guide

2021-06-03 O'Reilly Amazon

book

Larry Coyne , Arnold Byron Lua , Hiroyuki Miyoshi , Khanh Ngo

data data-engineering IBM

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum® Archive Enterprise Edition (EE) Version 1.3.1.2 for the IBM TS4500, IBM TS3500, IBM TS4300, and IBM TS3310 tape libraries. IBM Spectrum Archive Enterprise Edition enables the use of the LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale based environment. It helps encourage the use of tape as a critical tier in the storage environment. This is the ninth edition of IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on a physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 8, 7, 6, and 5 tape drives in IBM® TS3310, TS3500, TS4300, and TS4500 tape libraries. In addition, IBM TS1160, TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM customers, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Database-Driven Web Development: Learn to Operate at a Professional Level with PERL and MySQL

2021-05-28 O'Reilly Amazon

book

Thomas Valentine

data data-engineering relational-databases MySQL HTML JavaScript

Learn to operate at a professional level with HTML, CSS, DOM, JavaScript, PERL and the MySQL database. With plain language explanations and step-by-step examples, you will understand the key facets of web development that today’s employers are looking for. Encapsulating knowledge that is usually found in many books rather than one, this is your one-stop tutorial to becoming a web professional. You will learn how to use the PERL scripting language and the MySQL database to create powerful web applications. Each chapter will become progressively more challenging as you progress through experimentation and ultimately master database-driven web development via the web applications studied in the last chapters. Including practical tips and guidance gleaned from 20+ years of working as a web developer, Thomas Valentine provides you with all the information you need to prosper as a professional database-driven web professional. What You'll Learn Leverage standard web technologies to benefit a database-driven approach Create an effective web development workstation with databases in mind Use the PERL scripting language and the MySQL database effectively Maximize the Apache Web Server Who This Book Is For The primary audience for this book are those who know already know web development basics and web developers who want to master database driven web development. The skills required to understand the concepts put forth are a working knowledge of PERL and basic MySQL.

SAP HANA on IBM Power Systems Backup and Recovery Solutions

2021-05-27 O'Reilly Amazon

book

Dino Quintero , Adriana Melges Quintanilha Weingart , Pia Nymann , Rosane Goldstein , Andrei Socoliuc

data data-engineering IBM Cloud Computing ERP Linux

This IBM® Redpaper Redbooks publication provides guidance about a backup and recovery solution for SAP High-performance Analytic Appliance (HANA) running on IBM Power Systems. This publication provides case studies and how-to procedures that show backup and recovery scenarios. This publication provides information about how to protect data in an SAP HANA environment by using IBM Spectrum® Protect and IBM Spectrum Copy Data Manager. This publication focuses on the data protection solution, which is described through several scenarios. The information in this publication is distributed on an as-is basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Protect are supported and entitled, and where the issues are specific to a blueprint implementation. The goal of the publication is to describe the best aspects and options for backup, snapshots, and restore of SAP HANA Multitenant Database Container (MDC) single and multi-tenant installations on IBM Power Systems by using theoretical knowledge, hands-on exercises, and documenting the findings through sample scenarios. This document provides resources about the following processes: Describing how to determine the best option, including SAP Landscape aspects to back up, snapshot, and restore of SAP HANA MDC single and multi-tenant installations based on IBM Spectrum Computing Suite, Red Hat Linux Relax and Recover (ReAR), and other products. Documenting key aspects, such as recovery time objective (RTO) and recovery point objective (RPO), backup impact (load, duration, scheduling), quantitative savings (for example, data deduplication), integration and catalog currency, and tips and tricks that are not covered in the product documentation. Using IBM Cloud® Object Storage and documenting how to use IBM Spectrum Protect to back up to the cloud. SAP HANA 2.0 SPS 05 has this feature that is built in natively. IBM Spectrum Protect for Enterprise Resource Planning (ERP) has this feature too. Documenting Linux ReaR to cover operating system (OS) backup because ReAR is used by most backup products, such as IBM Spectrum Protect and Symantec Endpoint Protection (SEP) to back up OSs. This publication targets technical readers including IT specialists, systems architects, brand specialists, sales teams, and anyone looking for a guide about how to implement the best options for SAP HANA backup and recovery on IBM Power Systems. Moreover, this publication provides documentation to transfer the how-to-skills to the technical teams and solution guidance to the sales team. This publication complements the documentation that is available at IBM Knowledge Center, and it aligns with the educational materials that are provided by IBM Garage™ for Systems Technical Education and Training.

IBM PowerVC Version 2.0 Introduction and Configuration

2021-05-26 O'Reilly Amazon

book

Thierry Huché , Sachin P. Deshmukh , Scott Vetter , Stephen Lutz , Christopher Emefiene Osiegbu , Ahmed Mashhour , Borislav Ivanov Stoymirski

data data-engineering IBM Ansible Cloud Computing Linux

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced enterprise virtualization management offering for IBM Power Systems. This IBM Redbooks® publication introduces IBM PowerVC and helps you understand its functions, planning, installation, and setup. It also shows how IBM PowerVC can integrate with systems management tools such as Ansible or Terraform and that it also integrates well into a OpenShift container environment. IBM PowerVC Version 2.0.0 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC), or by IBM PowerVM NovaLink. With this capability, IBM PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware. IBM PowerVC is available as a Standard Edition, or as a Private Cloud Edition. IBM PowerVC includes the following features and benefits: Virtual image capture, import, export, deployment, and management Policy-based virtual machine (VM) placement to improve server usage Snapshots and cloning of VMs or volumes for backup or testing purposes Support of advanced storage capabilities such as IBM SVC vdisk mirroring of IBM Global Mirror Management of real-time optimization and VM resilience to increase productivity VM Mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI) Automated Simplified Remote Restart for improved availability of VMs ifor when a host is down Role-based security policies to ensure a secure environment for common tasks The ability to enable an administrator to enable Dynamic Resource Optimization on a schedule IBM PowerVC Private Cloud Edition includes all of the IBM PowerVC Standard Edition features and enhancements: A self-service portal that allows the provisioning of new VMs without direct system administrator intervention. There is an option for policy approvals for the requests that are received from the self-service portal. Pre-built deploy templates that are set up by the cloud administrator that simplify the deployment of VMs by the cloud user. Cloud management policies that simplify management of cloud deployments. Metering data that can be used for chargeback. This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 2.0.0.

Architecting Data-Intensive SaaS Applications

2021-05-25 O'Reilly Amazon

book

Pui Kei Johnston Chu , Kevin McGinley , William Waddington , Gjorgji Georgievski , Dinesh Kulkarni

data data-engineering AI/ML Analytics Cloud Computing IoT

Through explosive growth in the past decade, data now drives significant portions of our lives, from crowdsourced restaurant recommendations to AI systems identifying effective medical treatments. Software developers have unprecedented opportunity to build data applications that generate value from massive datasets across use cases such as customer 360, application health and security analytics, the IoT, machine learning, and embedded analytics. With this report, product managers, architects, and engineering teams will learn how to make key technical decisions when building data-intensive applications, including how to implement extensible data pipelines and share data securely. The report includes design considerations for making these decisions and uses the Snowflake Data Cloud to illustrate best practices. This report explores: Why data applications matter: Get an introduction to data applications and some of the most common use cases Evaluating platforms for building data apps: Evaluate modern data platforms to confidently consider the merits of potential solutions Building scalable data applications: Learn design patterns and best practices for storage, compute, and security Handling and processing data: Explore techniques and real-world examples for building data pipelines to support data applications Designing for data sharing: Learn best practices for sharing data in modern data applications

Distributed Data Systems with Azure Databricks

2021-05-25 O'Reilly Amazon

book

Alan Bernardo Palacio

data data-engineering storage-repositories data-lake AI/ML Azure

In 'Distributed Data Systems with Azure Databricks', you will explore the capabilities of Microsoft Azure Databricks as a platform for building and managing big data pipelines. Learn how to process, transform, and analyze data at scale while developing expertise in training distributed machine learning models and integrating them into enterprise workflows. What this Book will help me do Design and implement Extract, Transform, Load (ETL) pipelines using Azure Databricks. Conduct distributed training of machine learning models using TensorFlow and Horovod. Integrate Azure Databricks with Azure Data Factory for optimized data pipeline orchestration. Utilize Delta Engine for efficient querying and analysis of data within Delta Lake. Employ Databricks Structured Streaming to manage real-time production-grade data flows. Author(s) None Palacio is an experienced data engineer and cloud computing specialist, with extensive knowledge of the Microsoft Azure platform. With years of practical application of Databricks in enterprise settings, Palacio provides clear, actionable insights through relatable examples. They bring a passion for innovative solutions to the field of big data automation. Who is it for? This book is ideal for data engineers, machine learning engineers, and software developers looking to master Azure Databricks for large-scale data processing and analysis. Readers should have basic familiarity with cloud platforms, understanding of data pipelines, and a foundational grasp of Python and machine learning concepts. It is perfect for those wanting to create scalable and manageable data workflows.

IBM Power System IC922 Technical Overview and Introduction

2021-05-20 O'Reilly Amazon

book

Scott Vetter , Stephen Lutz , YoungHoon Cho

data data-engineering IBM ibm-power-systems AI/ML Cloud Computing

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power System IC922 (9183-22X) server that uses IBM POWER9™ processor-based technology and supports Linux operating systems (OSs). The objective of this paper is to introduce the system offerings and their capacities and available features. The Power IC922 server is built to deliver powerful computing, scaling efficiency, and storage capacity in a cost-optimized design to meet the evolving data challenges of the artificial intelligence (AI) era. It includes the following features: High throughput and performance for high-value Linux workloads, such as inferencing data or storage-rich workloads, or cloud. Potentially low acquisition cost through system optimization, such as using industry standard memory and warranty. Two IBM POWER9 processor-based single-chip module (SCM) devices that provide high performance with 24, 32, or 40 fully activated cores and a maximum 2 TB of memory. Up to six NVIDIA T4 graphics processing unit (GPU) accelerators. Up to twenty-four 2.5-inch SAS/SATA drives. One dedicated and one shared 1 Gb Intelligent Platform Management Interface (IPMI) port.. This publication is for professionals who want to acquire a better understanding of IBM Power Systems products. The intended audience includes: Clients Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power IC922 server.

SAP S/4HANA Embedded Analytics: Experiences in the Field

2021-05-19 O'Reilly Amazon

book

Freek Keijzer

data data-engineering SAP Agile/Scrum Analytics BI

Imagine you are a business user, consultant, or developer about to enter an SAP S/4HANA implementation project. You are well-versed with SAP’s product portfolio and you know that the preferred reporting option in S/4HANA is embedded analytics. But what exactly is embedded analytics? And how can it be implemented? And who can do it: a business user, a functional consultant specialized in financial or logistics processes? Or does a business intelligence expert or a programmer need to be involved? Good questions! This book will answer these questions, one by one. It will also take you on the same journey that the implementation team needs to follow for every reporting requirement that pops up: start with assessing a more standard option and only move on to a less standard option if the requirement cannot be fulfilled. In consecutive chapters, analytical apps delivered by SAP, apps created using Smart Business Services, and Analytical Queries developed either using tiles or in adevelopment environment are explained in detail with practical examples. The book also explains which option is preferred in which situation. The book covers topics such as in-memory computing, cloud, UX, OData, agile development, and more.Author Freek Keijzer writes from the perspective of an implementation consultant, focusing on functionality that has proven itself useful in the field. Practical examples are abundant, ranging from “codeless” to “hardcore coding.” What You Will Learn Know the difference between static reporting and interactive querying on real-time data Understand which options are available for analytics in SAP S/4HANA Understand which option to choose in which situation Know how to implement these options Who This Book is For SAP power users, functional consultants, developers

Implementing Order to Cash Process in SAP

2021-05-14 O'Reilly Amazon

book

Chandrakant Agarwal

data data-engineering SAP CRM Data Management Master Data Management

Immerse yourself in the pivotal Order to Cash (OTC) process in SAP with this comprehensive guide! By leveraging the functionalities of SAP CRM, SAP APO, SAP TMS, and SAP LES, integrated with SAP ECC, this book provides a detailed walkthrough to enhance your business operations and system understanding. What this Book will help me do Understand master data management across different SAP modules to ensure integrated operations. Explore and implement the key functions of sales processes and customer relationship management in SAP CRM. Master the concepts of order fulfillment, including ATP checks, leveraging SAP APO. Dive deep into transportation planning and freight management processes using SAP TMS. Gain insights into logistics execution and customer invoicing using SAP ECC. Author(s) None Agarwal is an experienced SAP consultant specializing in enterprise integration and process optimization. With an extensive background in SAP modules such as CRM, APO, TMS, and LES, Agarwal brings real-world experience into this work. Passionate about helping others leverage SAP software to its fullest, Agarwal writes accessible and actionable guides. Who is it for? This book is tailored for SAP consultants, solution architects, and managers tasked with process optimization in SAP environments. If you're seeking to integrate SAP CRM, TMS, or APO modules effectively into your operations, this book has been designed for you. Readers are expected to have a foundational understanding of SAP ECC and its core principles. Ideal for individuals aiming to enhance their enterprise's OTC processes.

Electronic Health Records with Epic and IBM FlashSystem 9200 Blueprint Version 2 Release 3

2021-05-12 O'Reilly Amazon

book

IBM

data data-engineering IBM

This information is intended to facilitate the deployment of IBM® FlashSystem for the Epic Corporation electronic health record (EHR) solution by describing the requirements and specifications for configuring IBM FlashSystem® 9200 and its parameters. The document also describes the steps that are required to configure the server that host the EHR application. To complete the tasks, you must have a working knowledge of IBM FlashSystem 9200 and Epic applications. The information in this document is distributed on an "as is" basis, without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM FlashSystem storage devices are supported and entitled and where the issues are not specific to a blueprint implementation.

Applied Modeling Techniques and Data Analysis 1

2021-05-11 O'Reilly Amazon

book

Christos H. Skiadas , Christina Parpoula , Yannis Dimotikalis , Alex Karagrigoriou

data data-engineering data-models AI/ML Big Data

BIG DATA, ARTIFICIAL INTELLIGENCE AND DATA ANALYSIS SET Coordinated by Jacques Janssen Data analysis is a scientific field that continues to grow enormously, most notably over the last few decades, following rapid growth within the tech industry, as well as the wide applicability of computational techniques alongside new advances in analytic tools. Modeling enables data analysts to identify relationships, make predictions, and to understand, interpret and visualize the extracted information more strategically. This book includes the most recent advances on this topic, meeting increasing demand from wide circles of the scientific community. Applied Modeling Techniques and Data Analysis 1 is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians, working on the front end of data analysis and modeling applications. The chapters cover a cross section of current concerns and research interests in the above scientific areas. The collected material is divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications.

Applied Modeling Techniques and Data Analysis 2

2021-05-11 O'Reilly Amazon

book

Christos H. Skiadas , Christina Parpoula , Yannis Dimotikalis , Alex Karagrigoriou

data data-engineering data-models AI/ML Big Data

BIG DATA, ARTIFICIAL INTELLIGENCE AND DATA ANALYSIS SET Coordinated by Jacques Janssen Data analysis is a scientific field that continues to grow enormously, most notably over the last few decades, following rapid growth within the tech industry, as well as the wide applicability of computational techniques alongside new advances in analytic tools. Modeling enables data analysts to identify relationships, make predictions, and to understand, interpret and visualize the extracted information more strategically. This book includes the most recent advances on this topic, meeting increasing demand from wide circles of the scientific community. Applied Modeling Techniques and Data Analysis 2 is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians, working on the front end of data analysis and modeling applications. The chapters cover a cross section of current concerns and research interests in the above scientific areas. The collected material is divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications.

Understanding Log Analytics at Scale, 2nd Edition

2021-05-10 O'Reilly Amazon

book

Matt Gillespie , Charles Givre

data data-engineering log-data Analytics Cyber Security

Using log analytics provides organizations with powerful and necessary capabilities for IT security. By analyzing log data, you can drive critical business outcomes, such as identifying security threats or opportunities to build new products. Log analytics also helps improve business efficiency, application, infrastructure, and uptime. In the second edition of this report, data architects and IT infrastructure leads will learn how to get up to speed on log data, log analytics, and log management. Log data, the list of recorded events from software and hardware, typically includes the IP address, time of event, date of event, and more. You'll explore how proactively planned data storage and delivery extends enterprise IT capabilities critical to security analytics deployments. Explore what log analytics is--and why log data is so vital Learn how log analytics helps organizations achieve better business outcomes Use log analytics to address specific business problems Examine the current state of log analytics, including common issues Make the right storage deployments for log analytics use cases Understand how log analytics will evolve in the future With this in-depth report, you'll be able to identify the points your organization needs to consider to achieve successful business outcomes from your log data.

Data Pipelines with Apache Airflow

2021-05-09 O'Reilly Amazon

book

Julian de Ruiter , Bas Harenslak

data data-engineering apache-airflow AI/ML Airflow Cloud Computing

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack. About the Technology Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task. About the Book Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs. What's Inside Build, test, and deploy Airflow pipelines as DAGs Automate moving and transforming data Analyze historical datasets using backfilling Develop custom components Set up Airflow in production environments About the Reader For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills. About the Authors Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer. Quotes An Airflow bible. Useful for all kinds of users, from novice to expert. - Rambabu Posa, Sai Aashika Consultancy An easy-to-follow exploration of the benefits of orchestrating your data pipeline jobs with Airflow. - Daniel Lamblin, Coupang The one reference you need to create, author, schedule, and monitor workflows with Apache Airflow. Clear recommendation. - Thorsten Weber, bbv Software Services AG By far the best resource for Airflow. - Jonathan Wood, LexisNexis

IBM FlashSystem A9000 and A9000R Business Continuity Solutions

2021-05-05 O'Reilly Amazon

book

Lisa Martinez , Francesco Anderloni , Bert Dufrasne , Roger Eriksson

data data-engineering IBM

This edition applies to FlashSystem A9000 and A9000R, Model 415 and 425, with system software Version 12.3 IBM® FlashSystem A9000 and IBM FlashSystem® A9000R provide copy functions suited for various data protection scenarios that enable you to enhance your business continuance, disaster recovery, data migration, and backup solutions. These functions allow point-in-time copies, known as snapshots, and also include remote copy capabilities in either synchronous or asynchronous mode. Furthermore, support for IBM Hyper-Scale Mobility enables a seamless migration of IBM FlashSystem A9000 or A9000R volumes to another with no interference to the host. Starting with software level V12.1, the IBM HyperSwap® feature delivers always-on, high availability (HA) storage service for storage volumes in a production environment. Starting with version 12.2, asynchronous replication between the IBM XIV® Gen3 and FlashSystem A9000 or A9000R is supported. Starting with Version 12.2.1, Hyper-Scale Mobility is enabled between XIV Gen3 and FlashSystem A9000 or A9000R. Version 12.3 offers Multi-site replication solution that entails both High Availability (HA) and Disaster Recovery (DR)function by combining HyperSwap and Asynchronous replication to a third site. This IBM Redpaper™ publication is intended for anyone who needs a detailed and practical understanding of the IBM FlashSystem A9000 and IBM FlashSystem A9000R replication and business continuity functions.

IBM TS7700 Release 5.1 Guide

2021-05-04 O'Reilly Amazon

book

Marcelo Lopes de Moraes , Aderson Pacini , Ole Asmussen , Nao Takemura , Lourie Goodall , Alberto Barajas Ortiz , Takeshi Nohta , Monica Falcone , Chen Zhu , Larry Coyne , Taisei Takai , Tomoaki Ogino , Michael Scott , Kousei Kawamura , Derek Erdmann , Nobuhiko Furuya , Joe Hew , Rin Fujiwara , Joe Swingler , Stefan Neff , Sosuke Matsui , Takahiro Tsuda

data data-engineering IBM Cloud Computing Cloud Storage S3

This IBM® Redbooks® publication covers IBM TS7700 R5.1. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 20 years of virtual tape experience, the TS7770 supports the ability to store virtual tape volumes in an object store. The TS7700 supported off loading to physical tape for over two decades. Off loading to physical tape behind a TS7700 is utilized by hundreds of organizations around the world. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud® Object Storage and Amazon S3. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.1. The R5.1 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.1 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. TS7700 provides tape virtualization for the IBM z environment. Tape virtualization can help satisfy the following requirements in a data processing environment: Improved reliability and resiliency Reduction in the time that is needed for the backup and restore process Reduction of services downtime that is caused by physical tape drive and library outages Reduction in cost, time, and complexity by moving primary workloads to virtual tape Increased efficient procedures for managing daily batch, backup, recall, and restore processing On-premises and off-premises object store cloud storage support as an alternative to physical tape for archive and disaster recovery New and existing capabilities of the TS7700 5.1 include the following highlights: Eight-way Grid Cloud, which consists of up to three generations of TS7700 Synchronous and asynchronous replication Full AES256 encryption for replication data that is in-flight and at-rest Tight integration with IBM Z and DFSMS policy management Optional target for DS8000 Transparent Cloud Tier using DFSMS DS8000 Object Store AES256 in-flight encryption and compression Optional Cloud Storage Tier support for archive and disaster recovery 16 Gb IBM FICON® throughput up to 5 GBps per TS7700 cluster IBM Z hosts view up to 3,968 common devices per TS7700 grid Grid access to all data independent of where it exists TS7770 Cache On-demand feature that is based capacity licensing TS7770 support of SSD within the VED server The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM z15 Technical Introduction

2021-05-03 O'Reilly Amazon

book

Frank Packheiser , Jannie Houlbjerg , Kazuhiro Nakajima , John Troy , Bill White , Paul Schouten , Octavian Lascu , Anna Shugol , Hervey Kamga

data data-engineering IBM Agile/Scrum Analytics Cloud Computing

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™. It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity, and occupies an industry-standard footprint. It is offered as a single air-cooled 19-inch frame called the z15 T02, or as a multi-frame (1 to 4 19-inch frames) called the z15 T01. Both z15 models excel at the following tasks:: Using hybrid multicloud integration services Securing and protecting data with encryption everywhere Providing resilience with key to zero downtime Transforming a transactional platform into a data powerhouse Getting more out of the platform with operational analytics Accelerating digital transformation with agile service delivery Revolutionizing business processes Blending open source and IBM Z technologies This book explains how this system uses innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM Z Functional Matrix

2021-05-03 O'Reilly Amazon

book

Octavian Lascu

data data-engineering IBM

This IBM® Redpaper™ publication provides a list of features and functions that are supported on IBM Z, including the IBM z15™ (z15) - Machine type 8561, IBM z14™ (z14) - Machine types 3906 and 3907, IBM z13®(z13), and IBM z13s®(z13s).

SAP SuccessFactors Talent: Volume 2: A Complete Guide to Configuration, Administration, and Best Practices: Succession and Development

2021-04-26 O'Reilly Amazon

book

Susan Traynor , Venki Krishnamoorthy , Michael A. Wellens

data data-engineering SAP

Take an in-depth look at SAP SuccessFactors talent modules with this complete guide to configuration, administration, and best practices. This two-volume series follows a logical progression of SAP SuccessFactors modules that should be configured to complete a comprehensive talent management solution. The authors walk you through fully functional simple implementations in the primary chapters for each module before diving into advanced topics in subsequent chapters. In volume 2, you will explore the development module in three more chapters by learning to configure and use development plans, career worksheets, and mentoring. Then, the book examines succession management, covering topics such as configuring, administering, and using the 9-box, the Talent Review form, nominations, succession org charts, talent pools, and succession presentations. The authors then sum up with a review of what you learned and final conclusions. Within each topic, the book touches on the integration points with other modules as well as internationalization. The authors also provide recommendations and insights from real world experience. Having finished the book, you will have an understanding of what comprises a complete SAP SuccessFactors talent management solution and how to configure, administer, and use each module within it. What You Will Learn Work with the career worksheet Build mentoring into your SAP SuccessFactors solution Display and update relevant talent data in a succession org chart Who This Book Is ForImplementation partners and customers who are project managers, configuration specialists, analysts, or system administrators.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

Storage Multi-tenancy for Red Hat OpenShift Container Platform with IBM Storage

IBM Fibre Channel Endpoint Security for IBM DS8900F and IBM Z

97 Things Every Data Engineer Should Know

Machine Learning for Oracle Database Professionals: Deploying Model-Driven Applications and Automation Pipelines

Azure Data Factory by Example: Practical Implementation for Data Engineers

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases

IBM Spectrum Archive Enterprise Edition V1.3.1.2: Installation and Configuration Guide

Database-Driven Web Development: Learn to Operate at a Professional Level with PERL and MySQL

SAP HANA on IBM Power Systems Backup and Recovery Solutions

IBM PowerVC Version 2.0 Introduction and Configuration

Architecting Data-Intensive SaaS Applications

Distributed Data Systems with Azure Databricks

IBM Power System IC922 Technical Overview and Introduction

SAP S/4HANA Embedded Analytics: Experiences in the Field

Implementing Order to Cash Process in SAP

Electronic Health Records with Epic and IBM FlashSystem 9200 Blueprint Version 2 Release 3

Applied Modeling Techniques and Data Analysis 1

Applied Modeling Techniques and Data Analysis 2

Understanding Log Analytics at Scale, 2nd Edition

Data Pipelines with Apache Airflow

IBM FlashSystem A9000 and A9000R Business Continuity Solutions

IBM TS7700 Release 5.1 Guide

IBM z15 Technical Introduction

IBM Z Functional Matrix

SAP SuccessFactors Talent: Volume 2: A Complete Guide to Configuration, Administration, and Best Practices: Succession and Development