IBM

IBM Spectrum Archive Enterprise Edition V1.2.4: Installation and Configuration Guide

2017-08-05 · O'Reilly Data Engineering Books

book

by Wei Zheng Ong , Illarion Borisevich , Larry Coyne , Khanh Ngo , Stefan Neff

data data-engineering

Abstract: This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive (formerly IBM Linear Tape File System™ (LTFS)) Enterprise Edition (EE) V1.2.4.0 for the IBM TS3310, IBM TS3500, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale™-based environment and helps encourage the use of tape as a critical tier in the storage environment. This is the fourth edition of IBM Spectrum Archive V1.2 (SG24-8333), although it is based on the prior editions of IBM Linear Tape File System Enterprise Edition V1.1.1.2: Installation and Configuration Guide, SG24-8143. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 7, 6, and 5 tape drives in IBM TS3310, TS3500, and TS4500 tape libraries. In addition, IBM TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. Using IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM z14 Technical Introduction

2017-07-26 · O'Reilly Data Engineering Books

book

by Esra Ufacik , Frank Packheiser , John Troy , Bill White , Octavian Lascu , Michal Kordyzon , Hervey Kamga , Bo XU

Agile/Scrum Analytics Cloud Computing Cyber Security data data-engineering

Abstract: This IBM® Redbooks® publication introduces the latest IBM Z platform, the IBM z14®. It includes information about the Z environment and how it helps integrate data and transactions more securely and can infuse insight for faster and more accurate business decisions. The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to the digital era and the trust economy. These capabilities include:
- Securing data with pervasive encryption
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Providing resilience with the key to zero downtime
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Blending open source and Z technologies

This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and security. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

Mastering Apache Spark 2.x - Second Edition

2017-07-26 · O'Reilly Data Engineering Books

book

by Romeo Kienzler

AI/ML Analytics Big Data Cloud Computing Data Analytics Kubernetes Scala Spark SQL apache-spark data data-engineering

Mastering Apache Spark 2.x is the essential guide to harnessing the power of big data processing. Dive into real-time data analytics, machine learning, and cluster computing using Apache Spark's advanced features and modules like Spark SQL and MLlib.

What this book will help me do:
- Gain proficiency in Spark's batch and real-time data processing with Spark SQL.
- Master techniques for machine learning and deep learning using SparkML and SystemML.
- Understand the principles of Spark's graph processing with GraphX and GraphFrames.
- Learn to deploy Apache Spark efficiently on platforms like Kubernetes and IBM Cloud.
- Optimize Spark cluster performance by configuring parameters effectively.

Author(s): Romeo Kienzler is a seasoned professional in big data and machine learning technologies. With years of experience in cloud-based distributed systems, Romeo brings practical insights into leveraging Apache Spark. He combines his deep technical expertise with a clear and engaging writing style.

Who is it for? This book is tailored for intermediate Apache Spark users eager to deepen their knowledge of Spark 2.x's advanced features. It is ideal for data engineers and big data professionals seeking to enhance their analytics pipelines with Spark. A basic understanding of Spark and Scala is necessary. If you're aiming to optimize Spark for real-world applications, this book is crafted for you.

IBM Db2: Investigating Automatic Storage Table Spaces and Data Skew

2017-07-20 · O'Reilly Data Engineering Books

book

by George Wangelien , Zachary Hoggard

data data-engineering ibm-db2 relational-databases

The scope of this IBM® Redpaper™ publication is to provide a high-level overview of automatic storage table spaces, table space maps, table space extent maps, and physically unbalanced data across automatic storage table space containers (that is, data skew). The objective of this paper is to investigate the causes of data skew and make suggestions for how to resolve it. This paper is for Database Administrators (DBAs) of IBM Db2®, who should have general Db2 knowledge and skills. The environment used for the creation of this document is Db2 Version 11.1 on the IBM AIX® operating system. This document is based on the results of testing various scenarios.

IBM Spectrum Accelerate Deployment, Usage, and Maintenance

2017-07-19 · O'Reilly Data Engineering Books

book

by Markus Oscheka , Bertrand Dufrasne , Abilio Oliveira , Grant Kabobel

Agile/Scrum Cloud Computing data data-engineering

Abstract: This edition applies to IBM® Spectrum Accelerate V11.5.4. IBM Spectrum Accelerate™, a member of IBM Spectrum Storage™, is an agile, software-defined storage solution for enterprise and cloud that builds on the customer-proven and mature IBM XIV® storage software. The key characteristic of Spectrum Accelerate is that it can be easily deployed and run on purpose-built or existing hardware that is chosen by the customer. IBM Spectrum Accelerate enables rapid deployment of high-performance and scalable block data storage infrastructure over commodity hardware, on-premises or off-premises. This IBM Redbooks® publication provides a broad understanding of IBM Spectrum Accelerate. The book introduces Spectrum Accelerate and describes the planning and preparation that are essential for a successful deployment of the solution. The deployment is described step by step, using either a graphical user interface (GUI)-based method or a simple command-line interface (CLI)-based procedure. Chapters in this book describe the logical configuration of the system, host support and business continuity functions, and migration. Although it makes many references to the XIV storage software, the book also emphasizes where IBM Spectrum Accelerate differs from XIV. Finally, a substantial portion of the book is dedicated to maintenance and troubleshooting, providing detailed guidance for customer support personnel.

IBM Z Connectivity Handbook

2017-07-17 · O'Reilly Data Engineering Books

book

by Esra Ufacik , Frank Packheiser , John Troy , Bill White , Octavian Lascu , Michal Kordyzon , Hervey Kamga , Bo XU

data data-engineering

Abstract: This IBM® Redbooks® publication describes the connectivity options that are available for use within and beyond the data center for the IBM Z family of mainframes, which includes these systems:
- IBM z14
- IBM z13®
- IBM z13s™
- IBM zEnterprise® EC12 (zEC12)
- IBM zEnterprise BC12 (zBC12)

This book highlights the hardware and software components, functions, typical uses, coexistence, and relative merits of these connectivity features. It helps readers understand the connectivity alternatives that are available when planning and designing their data center infrastructures. The changes to this edition are based on the IBM Z hardware announcement dated 17 July 2017. This book is intended for data center planners, IT professionals, systems engineers, and network planners who are involved in the planning of connectivity solutions for IBM mainframes.

Implementing OpenStack SwiftHLM with IBM Spectrum Archive EE or IBM Spectrum Protect for Space Management

2017-06-27 · O'Reilly Data Engineering Books

book

by Dominic Müller-Wicke , Larry Coyne , Khanh Ngo , Slavisa Sarafijanovic , Simon Lorenz , Harald Seipp , Takeshi Ishimoto

Data Management data data-engineering

The Swift High Latency Media project seeks to create a high-latency storage back end that makes it easier for users to perform bulk data-tiering operations within a Swift data ring. In today's world, data is produced at significantly higher rates than a decade ago. The storage and data management solutions of the past can no longer keep up with the data demands of today. The policies and structures that decide and execute how that data is used, discarded, or retained determine how efficiently the data is used. The need for intelligent data management and storage is more critical now than ever before. Traditional management approaches hide cost-effective, high-latency media (HLM) storage, such as tape or optical disk archive back ends, underneath a traditional file system. The lack of HLM-aware file system interfaces and software makes it difficult for users to understand and control data access on HLM storage. Coupled with data-access latency, this lack of understanding results in slow responses and potential time-outs that affect the user experience. The Swift HLM project addresses this challenge. Running OpenStack Swift on top of HLM storage allows you to cheaply store and efficiently access large amounts of infrequently used object data. Data that is stored on tape storage can be easily adapted to an Object Storage data interface. This IBM® Redpaper™ publication describes the Swift High Latency Media project and provides guidance for installation and configuration.

IBM PowerHA SystemMirror V7.2.1 for IBM AIX Updates

2017-05-23 · O'Reilly Data Engineering Books

book

by Dino Quintero , Bunphot Chuprasertsuk , Fabio Martins , Bernhard Buehler , Matthew W Radford , Shawn Bodily , Maria-Katharina Esser , Anthony Steel , Bing He

data data-engineering

Abstract: This IBM® Redbooks® publication helps strengthen the position of the IBM PowerHA® SystemMirror® solution with well-defined and documented deployment models within an IBM Power Systems™ virtualized environment, which provides customers with a planned foundation for business resilience and disaster recovery for their IBM Power Systems infrastructure solutions. This publication addresses topics to help meet customers' complex high availability and disaster recovery requirements on IBM Power Systems servers, to help maximize their systems' availability and resources, and to provide technical documentation that transfers how-to skills to users and support teams. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability and disaster recovery solutions and support with IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM Power Systems servers.

Oracle on IBM z Systems

2017-05-22 · O'Reilly Data Engineering Books

book

by Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

Cloud Computing Linux Oracle data data-engineering oracle-database-solutions

Abstract: Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® z Systems®. The enterprise-grade Linux on IBM z Systems solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from IBM z Systems®. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1, and we cover a successful consolidation project from sizing to migration, along with performance management topics and high availability. Finally, we end with a chapter about surrounding Oracle with open source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V7.8

2017-05-16 · O'Reilly Data Engineering Books

book

by Jon Tate , Frank Enders , Catarina Castro , Giulio Fiscella , Dharmesh Kamdar , Paulo Tomiyoshi Takeda

data data-engineering ibm-system-storage ibm-system-storage-san-volume-controller

Abstract: This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller, which is powered by IBM Spectrum Virtualize™ Version 7.8. IBM SAN Volume Controller is a virtualization appliance solution that maps virtualized volumes, which are visible to hosts and applications, to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the "block" level in a network, which enables applications and servers to share storage devices on a network.

POWER8 High-performance Computing Guide IBM Power System S822LC (8335-GTB) Edition

2017-05-14 · O'Reilly Data Engineering Books

book

by Dino Quintero , Wainer dos Santos Moschetta , Joseph Apuzzo , John Dunham , Mauricio Faria de Oliveira , Desnes Augusto Nunes Rosario , Markus Hilger , Alexander Pozdneev

data data-engineering

Abstract: This IBM® Redbooks® publication documents and addresses topics to provide step-by-step customizable application and programming solutions to tune applications and workloads to use the IBM Power Systems™ hardware architecture. This publication explores, tests, and documents the solution to use the architectural technologies and the software solutions that are available from IBM to help solve challenging technical and business problems. This publication also demonstrates and documents that the combination of IBM high-performance computing (HPC) solutions (hardware and software) delivers significant value to technical computing clients who are in need of cost-effective, highly scalable, and robust solutions. First, the book provides a high-level overview of the HPC solution, including all of the components that make up the HPC cluster: IBM Power System S822LC (8335-GTB), software components, interconnect switches, and the IBM Spectrum™ Scale parallel file system. Then, the publication is divided into three parts: Part 1 focuses on the developers, Part 2 focuses on the administrators, and Part 3 focuses on the evaluators and planners of the solution. The IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights from vast amounts of clients' data so they can optimize business results, product development, and scientific discoveries.

IBM DB2 Web Query for i: The Nuts and Bolts

2017-05-11 · O'Reilly Data Engineering Books

book

by Rob Bestgen , Doug Mack , Lin Su , Simona Pacchiarini , Kathryn Steinbrink , Hernando Bedoya , Kevin Trisko , Jim Bainbridge , Mike Cain

BI HTML Java Microsoft SQL SQL Server data data-engineering ibm-db2 relational-databases

Abstract: Business Intelligence (BI) is a broad term that relates to applications that analyze data to understand and act on the key metrics that drive profitability in an enterprise. Key to analyzing that data is providing fast, easy access to it while delivering it in formats or tools that best fit the needs of the user. At the core of any BI solution are user query and reporting tools that provide intuitive access to data, supporting a spectrum of users from executives to “power users,” from spreadsheet aficionados to the external Internet consumer. IBM® DB2® Web Query for i offers a set of modernized tools for a more robust, extensible, and productive reporting solution than the popular IBM Query for System i® tool (also known as IBM Query/400). IBM DB2 Web Query for i preserves investments in the reports that are developed with Query/400 by offering a choice of importing definitions into the new technology or continuing to run existing Query/400 reports as is. But it also offers significant productivity and performance enhancements by leveraging the latest in DB2 for i query optimization technology. The DB2 Web Query for i product is a web-based query and report writing product that offers enhanced capabilities over the IBM Query for iSeries product (also commonly known as Query/400). IBM DB2 Web Query for i includes Query for iSeries technology to assist customers in their transition to DB2 Web Query. It offers a modernized, Java-based solution for a more robust, extensible, and productive reporting solution. DB2 Web Query provides the ability to query or build reports against data that is stored in DB2 for i (or Microsoft SQL Server) databases through browser-based user interface technologies:
- Build reports with ease through the web-based, ribbon-like InfoAssist tool, whose common look and feel can extend the number of personnel who can generate their own reports.
- Simplify the management of reports by significantly reducing the number of report definitions that are required through the use of parameter-driven reports.
- Deliver data to users in many different formats, including directly into spreadsheets, in boardroom-quality PDF format, or viewed from the browser in HTML.
- Leverage advanced reporting functions, such as matrix reporting, ranking, color coding, drill-down, and font customization, to enhance the visualization of DB2 data.

DB2 Web Query offers features to import Query/400 definitions and enhance their look and functions. By using it, you can add OLAP-like slicing and dicing to the reports or view reports in disconnected mode for users on the go. This IBM Redbooks® publication provides a broad understanding of what can be done with the DB2 Web Query product. This publication is a companion to DB2 Web Query Tutorials, SG24-8378, which has a group of self-explanatory tutorials to help you get up to speed quickly.

Oracle on LinuxONE

2017-05-11 · O'Reilly Data Engineering Books

book

by Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

Cloud Computing Linux Oracle data data-engineering oracle-database-solutions

Abstract: Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® LinuxONE. The enterprise-grade Linux on LinuxONE solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from LinuxONE. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1, and we cover a successful consolidation project from sizing to migration, along with performance management topics and high availability. Finally, we end with a chapter about surrounding Oracle with open source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

IBM z Systems Qualified DWDM Ciena 6500 Packet-Optical Platform Release 10.21

2017-05-03 · O'Reilly Data Engineering Books

book

by Andrew Crimmins , Octavian Lascu , Pasquale PJ Catalano

data data-engineering

This IBM® Redpaper™ publication is one in a series that describes IBM z Systems® qualified dense wavelength division multiplexing (DWDM) vendor products for IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) solutions with Server Time Protocol (STP). The protocols that are described in this paper are used for IBM-supported solutions that require cross-site connectivity of a multisite Parallel Sysplex or remote copy technologies, which can include GDPS and non-GDPS applications. GDPS qualification testing is conducted at the IBM Vendor Solutions Connectivity (VSC) Lab in Poughkeepsie, NY. IBM and Ciena completed qualification testing of the Ciena 6500 Packet-Optical platform. This paper describes the applicable environments, protocols, and topologies that are qualified for and supported by z Systems for connecting through the Ciena 6500 Packet-Optical platform hardware and software, release level 10.21. This paper is intended for anyone who wants to learn more about Ciena 6500 Packet-Optical release level 10.21. This document is not meant to determine qualified products. To ensure that the planned products to be implemented are qualified, registered users can consult IBM Resource Link® for current information about qualified DWDM vendor products. For more information about IBM Redbooks® publications for z Systems qualified DWDM vendor products, see the IBM Redbooks website.

IBM Geographically Dispersed Resiliency for IBM Power Systems

2017-05-02 · O'Reilly Data Engineering Books

book

by Dino Quintero , Bunphot Chuprasertsuk , Fabio Martins , Bernhard Buehler , Matthew W Radford , Shawn Bodily , Maria-Katharina Esser , Anthony Steel , Bing He

data data-engineering

Abstract: This IBM® Redbooks® publication introduces and provides a broad understanding of the new IBM Geographically Dispersed Resiliency for IBM Power Systems™ solution. The IBM Geographically Dispersed Resiliency for Power Systems solution is a set of software components that together provide a disaster recovery (DR) mechanism for virtual machines (VMs) running on IBM POWER7® processor-based servers or later. This document describes various components, subsystems, and tasks that are associated with the IBM Geographically Dispersed Resiliency for Power Systems solution. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability (HA) and DR solutions and support on IBM Power Systems servers.

IBM zPDT Guide and Reference: System z Personal Development Tool

2017-05-02 · O'Reilly Data Engineering Books

book

by Bill Ogden

Linux data data-engineering

Abstract: This IBM® Redbooks® publication provides both introductory information and technical details about the IBM System z® Personal Development Tool (IBM zPDT®), which produces a small System z environment suitable for application development. zPDT is a PC Linux application. When zPDT is installed (on Linux), normal System z operating systems (such as IBM z/OS®) can be run on it. zPDT provides the basic System z architecture and emulated IBM 3390 disk drives, 3270 interfaces, OSA interfaces, and so on. The systems that are discussed in this document are complex. They have elements of Linux (for the underlying PC machine), IBM z/Architecture® (for the core zPDT elements), System z I/O functions (for emulated I/O devices), z/OS (the most common System z operating system), and various applications and subsystems under z/OS. The reader is assumed to be familiar with general concepts and terminology of System z hardware and software elements, and with basic PC Linux characteristics. This book provides the primary documentation for zPDT.

IBM GDPS Family: An introduction to Concepts and Capabilities

2017-04-17 · O'Reilly Data Engineering Books

book

by Sim Schindel , David Clitherow , Marie-France Narbey , John Thompson (EY)

data data-engineering

Abstract: This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for both immediate and future business requirements. Tables provide easy-to-use summaries and comparisons of the offerings, and the additional planning and implementation services available from IBM are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you do read all the chapters, be aware that some information is intentionally repeated.

Implementing the IBM Storwize V7000 and IBM Spectrum Virtualize V7.8

2017-04-10 · O'Reilly Data Engineering Books

book

by Jon Tate , Frank Enders , Catarina Castro , Giulio Fiscella , Dharmesh Kamdar , Paulo Tomiyoshi Takeda

Marketing data data-engineering

Abstract: Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces the IBM Storwize® V7000 solution powered by IBM Spectrum Virtualize™, which is an innovative storage offering that delivers essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a compact, modular design that is offered at a competitive, midrange price. The IBM Storwize V7000 solution incorporates some of the top IBM technologies that are typically found only in enterprise-class storage systems, raising the standard for storage efficiency in midrange disk systems. This cutting-edge storage system extends the comprehensive storage portfolio from IBM and can help change the way organizations address the ongoing information explosion. This IBM Redbooks® publication introduces the features and functions of the IBM Storwize V7000 and IBM Spectrum Virtualize V7.8 system through several examples. This book is aimed at pre-sales and post-sales technical support staff, marketing staff, and storage administrators. It helps you understand the architecture of the Storwize V7000, how to implement it, and how to take advantage of its industry-leading functions and features.

DS8000 Copy Services

2017-03-15 · O'Reilly Data Engineering Books

book

by Lukasz Drózda , Warren Stanley , Roland Wolf , Lisa Gundy , Alcides Bertazi , Axel Westphal , Michael Frankenberg , Bert Dufrasne , Cay-Uwe Kulzer

data data-engineering

Abstract: This IBM® Redbooks® publication helps you plan, install, tailor, configure, and manage Copy Services on the IBM DS8000® operating in an IBM z Systems® or Open Systems environment. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8000 Copy Services.

Understanding Metadata

2017-03-15 · O'Reilly Data Engineering Books

book

by Scott Gidley , Federico Castanedo

Big Data Data Governance Data Lake Informatica Teradata Trifacta data data-engineering metadata

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data (where it comes from, where it’s located, and how it’s being used) so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:
- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology

talk-data.com
