talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.

Filtering by: data

Sessions & talks

Showing 626–650 of 3406 · Newest first

IBM Power Systems Infrastructure I/O for SAP Applications

This IBM® Redpaper publication describes practical experiences of running SAP workloads that take advantage of IBM Power Systems I/O capabilities. With IBM POWER® processor-based servers, you have the flexibility to fit new applications and workloads seamlessly into a single data center, and even to consolidate them onto a single server. This publication highlights all viable options and describes the pros and cons of each one, so that you can select the correct option for a specific data center. The target audience of this book is architects, IT specialists, and systems administrators who deploy SAP workloads and who spend much time and effort managing, provisioning, and monitoring SAP software systems and landscapes on IBM Power Systems servers.

MOS Study Guide for Microsoft Access Expert Exam MO-500

Advance your everyday proficiency with Access 2019, and earn the credential that proves it! Demonstrate your expertise with Microsoft Access! Designed to help you practice and prepare for Microsoft Office Specialist (MOS): Access 2019 certification, this official Study Guide delivers:

- In-depth preparation for each MOS objective
- Detailed procedures to help build the skills measured by the exam
- Hands-on tasks to practice what you've learned
- Practice files and sample solutions

Sharpen the skills measured by these objectives:

- Create and manage databases
- Build tables
- Create queries
- Create forms
- Create reports

About MOS: A Microsoft Office Specialist (MOS) certification validates your proficiency with Microsoft Office programs, demonstrating that you can meet globally recognized performance standards. Hands-on experience with the technology is required to successfully pass Microsoft Certification exams.

Geographical Modeling

The modeling of cities and territories has progressed greatly in the last 20 years. This is firstly due to geographic information systems, followed by the availability of large amounts of georeferenced data – both on the Internet and through the use of connected objects. In addition, the rise in performance of computational methods for the simulation and exploration of dynamic models has facilitated advancement. Geographical Modeling presents previously unpublished information on the main advances achieved by these new approaches. Each of the six chapters provides a bibliographic review and precisely describes the methods used, highlighting their advantages and discussing their interpretations, all illustrated by many examples. The book also explains with clarity the theoretical foundations of geographical analysis, the delicate operations of model selection, and the applications of fractals and scaling laws. These applications include gaining knowledge of the morphology of cities and the organization of urban transport, and finding new methods of building and exploring simulation models and visualizations of data and results.

IBM FlashSystem 9200R Rack Solution Product Guide

The FlashSystem 9200 combines the performance of flash and end-to-end Non-Volatile Memory Express (NVMe) with the reliability and innovation of IBM® FlashCore technology, the ultra-low latency of Storage Class Memory (SCM), the rich features of IBM Spectrum® Virtualize and AI-driven predictive storage management, and proactive support by Storage Insights. All of these features are included in a powerful, blazing-fast, enterprise-class 2U all-flash array.

Building a Unified Data Infrastructure

The vast majority of businesses today already have a documented data strategy. But only a third of these forward-thinking companies have evolved into data-driven organizations or even begun to move toward a data culture. Most have yet to treat data as a business asset, much less use data and analytics to compete in the marketplace. What’s the solution? This insightful report demonstrates the importance of creating a holistic data infrastructure approach. You’ll learn how data virtualization (DV), master data management (MDM), and metadata-management capabilities can help your organization meet business objectives. Chief data officers, enterprise architects, analytics leaders, and line-of-business executives will understand the benefits of combining these capabilities into a unified data platform.

- Explore three separate business contexts that depend on data: operations, analytics, and governance
- Learn a pragmatic and holistic approach to building a unified data infrastructure
- Understand the critical capabilities of this approach, including the ability to work with existing technology
- Apply six best practices for combining data management capabilities

Streaming Integration

Data is being generated at an unrelenting pace, and data storage capacity can’t keep up. Enterprises must modernize the way they use and manage data by collecting, processing, and analyzing it in real time—in other words, streaming. This practical report explains everything organizations need to know to begin their streaming integration journey and make the most of their data. Authors Steve Wilkes and Alok Pareek detail the key attributes and components of an enterprise-grade streaming integration platform, along with stream processing and analysis techniques that will help companies reap immediate value from their data and solve their most pressing business challenges.

- Learn how to collect and handle large volumes of data at scale
- See how streams move data between threads, processes, servers, and data centers
- Get your data in the form you need and analyze it in real time
- Dive into the pros and cons of data targets such as databases, Hadoop, and cloud services for specific use cases
- Ensure your streaming integration infrastructure scales, is secure, works 24/7, and can handle failure
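The real-time analysis this report describes often reduces to windowed aggregation over an unbounded stream. The following minimal Python sketch is not from the book; the function name and sample data are illustrative. It computes a tumbling-window average over an ordered stream of timestamped readings:

```python
def tumbling_window(events, window_size):
    """Group an ordered stream of (timestamp, value) events into
    fixed-size (tumbling) windows and emit (window_start, average)
    per window. Assumes events arrive in timestamp order."""
    window, window_end = [], None
    for ts, value in events:
        if window_end is None:
            window_end = ts + window_size
        if ts >= window_end:
            # Close the current window and start the next one.
            yield window_end - window_size, sum(window) / len(window)
            window, window_end = [], window_end + window_size
        window.append(value)
    if window:
        yield window_end - window_size, sum(window) / len(window)

# Simulated sensor stream: (timestamp_seconds, reading)
stream = [(0, 10.0), (1, 12.0), (4, 11.0), (5, 20.0), (9, 22.0)]
averages = list(tumbling_window(stream, window_size=5))
# averages == [(0, 11.0), (5, 21.0)]
```

An enterprise-grade streaming platform layers out-of-order handling, checkpointing, and delivery guarantees on top of this basic pattern.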

The Evolving Role of the Data Engineer

Companies working to become data driven often view data scientists as heroes, but that overlooks the vital role that data engineers play in the process. While data scientists focus on finding new insights from datasets, data engineers deal with preparation—obtaining, cleaning, and creating enhanced versions of the data an organization needs. In this report, Andy Oram examines how the role of data engineer has quickly evolved. DBAs, software engineers, developers, and students will explore the responsibilities of modern data engineers and the skills and tools necessary to do the job. You’ll learn how to deal with software engineering concepts such as rapid and continuous development, automation and orchestration, modularity, and traceability. Decision makers considering a move to the cloud will also benefit from the in-depth discussion this report provides. This report covers:

- Major tasks of data engineers today
- The different levels of structure in data and ways to maximize its value
- Capabilities of third-party cloud options
- Tools for ingestion, transfer, and enrichment
- Using containers and VMs to run the tools
- Software engineering development
- Automation and orchestration of data engineering
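The preparation work described above (obtaining, cleaning, and enriching data) can be sketched in a few lines of Python. The field names and validation rules here are hypothetical, not taken from the report:

```python
def clean_records(raw_records):
    """Normalize field names, coerce types, and drop records that fail
    basic validation -- a typical data engineering preparation step."""
    cleaned = []
    for rec in raw_records:
        # Normalize keys: strip whitespace, lowercase.
        normalized = {k.strip().lower(): v for k, v in rec.items()}
        try:
            normalized["amount"] = float(normalized["amount"])
        except (KeyError, ValueError, TypeError):
            continue  # in a real pipeline, quarantine instead of dropping
        # Enrich with a normalized country code, defaulting when absent.
        normalized["country"] = normalized.get("country", "unknown").strip().upper()
        cleaned.append(normalized)
    return cleaned

raw = [
    {" Amount ": "19.99", "Country": " us "},
    {"amount": "not-a-number"},          # dropped: unparseable amount
    {"AMOUNT": "5", "country": "de"},
]
result = clean_records(raw)
# result == [{"amount": 19.99, "country": "US"}, {"amount": 5.0, "country": "DE"}]
```

In practice this logic would run inside an orchestrated, testable pipeline rather than a single function, which is exactly the software engineering discipline the report argues data engineers now need.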

IBM Spectrum Virtualize HyperSwap SAN Implementation and Design Best Practices

In this IBM® Redpaper publication, we outline IBM Spectrum Virtualize HyperSwap® SAN implementation and design best practices for optimum resiliency of the SAN Volume Controller (SVC) cluster, and we explain why each recommendation is made. It provides IBM Spectrum® Virtualize HyperSwap and Stretched Cluster configuration details. Note: In this book, for brevity, we use HyperSwap to refer to both HyperSwap and Stretched Cluster. The product documentation details the minimum requirements; however, it does not describe the design of the storage area network (SAN) in detail, nor does it describe the recommended way to implement those requirements on a SAN. This paper is SAN vendor-neutral wherever possible. Any mention of a specific SAN switch vendor, or of terms used by a specific switch vendor, is made only where relevant to a specific context, and does not imply an endorsement of that vendor. Note: Some of the figures in this document might not depict redundant fabrics or storage configurations. This was done for simplicity, and any recommendations made for fabric design assume that there are two redundant fabrics.

IBM Spectrum LSF Suite: Installation Best Practices Guide

This IBM® Redpaper publication describes IBM Spectrum® LSF® Suite installation best practices, application checks for workload management, and high availability configurations, by using theoretical knowledge and hands-on exercises. These findings are documented by way of sample scenarios. This publication addresses topics for sellers, IT architects, IT specialists, and anyone who wants to implement and manage a high-performing workload management solution with LSF. Moreover, this guide provides documentation to transfer how-to skills to the technical teams, and solution guidance to the sales team. This publication complements documentation that is available at IBM Knowledge Center, and aligns with educational materials that are provided by IBM Systems.

IBM DS8000 Encryption for data at rest, Transparent Cloud Tiering, and Endpoint Security (DS8000 Release 9.0)

IBM® experts recognize the need for data protection, both from hardware and software failures and from the physical relocation of hardware, theft, and retasking of existing hardware. The IBM DS8000® supports encryption-capable hard disk drives (HDDs) and flash drives. These Full Disk Encryption (FDE) drive sets are used with key management services that are provided by IBM Security Key Lifecycle Manager software or Gemalto SafeNet KeySecure to allow encryption for data at rest. Use of encryption technology involves several considerations that are critical for you to understand to maintain the security and accessibility of encrypted data. Failure to follow the requirements that are described in this IBM Redpaper can result in an encryption deadlock. Starting with Release 8.5 code, the DS8000 also supports Transparent Cloud Tiering (TCT) data object encryption. With TCT encryption, data is encrypted before it is transmitted to the cloud. The data remains encrypted in cloud storage and is decrypted after it is transmitted back to the IBM DS8000. Starting with DS8000 Release 9.0, the DS8900F provides Fibre Channel Endpoint Security when communicating with an IBM z15™, which supports link authentication and the encryption of data that is in-flight. For more information, see IBM Fibre Channel Endpoint Security for IBM DS8900F and IBM Z, SG24-8455. This edition focuses on IBM Security Key Lifecycle Manager Version 3.0.1.3 or later, which enables support for the Key Management Interoperability Protocol (KMIP) with the DS8000 Release 9.0 code or later, and an updated DS GUI for encryption functions.

SAP HANA on IBM Power Systems Architectural Summary

This IBM® Redpaper publication delivers SAP HANA architectural concepts for successful implementation on IBM Power Systems servers. This publication addresses topics for sellers, IT architects, IT specialists, and anyone who wants to understand how to take advantage of running SAP HANA workloads on Power Systems servers. Moreover, this guide provides documentation to transfer how-to skills to the technical teams, and it provides solution guidance to the sales team. This publication complements documentation that is available at IBM Knowledge Center, and it aligns with educational materials that are provided by IBM Systems.

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases

This IBM Redpaper™ publication introduces the IBM Spectrum Scale immutability function. It shows how to set it up and presents different ways to manage immutable and append-only files. This publication also provides guidance for implementing IT security aspects in an IBM Spectrum Scale cluster by addressing regulatory requirements. It also describes two typical use cases for managing immutable files. One use case involves applications that manage file immutability; the other presents a solution to automatically set files to immutable within an IBM Spectrum Scale immutable fileset.

Block Storage Migration in Open Environments

Companies need to migrate data not only when technology needs to be replaced, but also for consolidation, load balancing, and disaster recovery (DR). Data migration is a critical operation, and this book explains the phases and steps to ensure a smooth migration. Topics range from planning and preparation to execution and validation. The book explains, from a generic standpoint, the appliance-based, storage-based, and host-based techniques that can be used to accomplish the migration. Each method is explained through practical migration scenarios and for various operating systems. This publication addresses the aspects of data migration efforts while focusing on fixed block storage systems in open environments with the IBM® FlashSystem 9100 as the target system. Therefore, the book also emphasizes various migration techniques using the Spectrum Virtualize built-in functions. This document targets storage administrators, storage network administrators, system designers, architects, and IT professionals who design, administer, or plan data migrations in large data centers. The aim is to ensure that you are aware of the current thinking, methods, and products that IBM can make available to you. These items are provided to ensure a data migration process that is as efficient and problem-free as possible. The material presented in this book was developed with versions of the referenced products as of February 2020.

IBM Spectrum Protect Plus Practical Guidance for Deployment, Configuration, and Usage

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention, and reuse for virtual machines, databases, and applications in hybrid multicloud environments. IBM Knowledge Center for IBM Spectrum® Protect Plus provides extensive documentation for installation, deployment, and usage. In addition, IBM Spectrum Protect Plus Blueprint (https://ibm.biz/IBMSpectrumProtectPlusBlueprints) provides guidance about how to build and size an IBM Spectrum Protect Plus solution. The goal of this IBM Redpaper publication is to summarize and complement the available information by providing useful hints and tips based on the authors' practical experience in installing and supporting IBM Spectrum Protect Plus in actual customer environments. Over time, our aim is to compile a set of best practices that cover all aspects of the product, from planning and installation to tuning, maintenance, and troubleshooting.

Building an Anonymization Pipeline

How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner. Luk Arbuckle and Khaled El Emam from Privacy Analytics explore end-to-end solutions for anonymizing device and IoT data, based on collection models and use cases that address real business needs. These examples come from some of the most demanding data environments, such as healthcare, using approaches that have withstood the test of time.

- Create anonymization solutions diverse enough to cover a spectrum of use cases
- Match your solutions to the data you use, the people you share it with, and your analysis goals
- Build anonymization pipelines around various data collection models to cover different business needs
- Generate an anonymized version of original data or use an analytics platform to generate anonymized outputs
- Examine the ethical issues around the use of anonymized data
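One common building block of such pipelines is pseudonymization: replacing direct identifiers with stable tokens before data leaves the collection point. The Python sketch below is illustrative only; the key handling and field names are assumptions, and real anonymization involves much more, such as generalization and re-identification risk measurement:

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would live in a key management system.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(record, direct_identifiers):
    """Replace direct identifiers with a keyed hash (HMAC-SHA256), so the
    same input always maps to the same token without exposing the value."""
    out = dict(record)
    for field in direct_identifiers:
        if field in out:
            mac = hmac.new(SECRET_KEY, str(out[field]).encode(), hashlib.sha256)
            out[field] = mac.hexdigest()[:16]  # shortened token for readability
    return out

patient = {"patient_id": "P-1001", "email": "a@example.com", "heart_rate": 72}
anon = pseudonymize(patient, ["patient_id", "email"])
```

Because the mapping is deterministic per key, analysts can still join and count by token; rotating or destroying SECRET_KEY severs the link back to identities.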

IBM z15 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™. It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity and occupies an industry-standard footprint. It is offered as a single air-cooled 19-inch frame called the z15 T02, or as a multi-frame system (1 to 4 19-inch frames) called the z15 T01. Both z15 models excel at the following tasks:

- Using hybrid multicloud integration services
- Securing and protecting data with encryption everywhere
- Providing resilience that is key to zero downtime
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Blending open source and IBM Z technologies

This book explains how this system uses innovations and traditional Z strengths to satisfy the growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

Protecting Data Privacy Beyond the Trusted System of Record

To help you safeguard your sensitive data and provide ease of auditability and control, IBM introduced a new capability for IBM Z® called IBM Data Privacy Passports. It can help minimize the risk and impact of data loss and privacy breaches when collecting and storing sensitive data. Data Privacy Passports can manage how data is shared securely through a central control of user access. Data Privacy Passports can protect data wherever it goes. Security policies are kept and honored whenever the data is accessed. Future data access may be revoked remotely via Data Privacy Passports, long after data leaves the system of record, and sensitive data may even be made unusable simply by destroying its encryption key. Data Privacy Passports is designed to help reduce the time that is spent by staff to protect data and ensure privacy throughout its lifecycle via a central point of control. This IBM Redguide presents a business view of Data Privacy Passports, including how data privacy and protection concerns are addressed. We also explore how value is gained through various business model examples.

IBM Spectrum Scale CSI Driver for Container Persistent Storage

IBM® Spectrum Scale is a proven, scalable, high-performance data and file management solution. It provides world-class storage management with extreme scalability, flash-accelerated performance, and automatic policy-based storage with tiers from flash through disk to tape. It also supports various protocols, such as NFS, SMB, Object, HDFS, and iSCSI. Containers can leverage its performance, information lifecycle management (ILM), scalability, and multisite data management to gain the same flexibility for storage that they have at run time.

Container adoption is increasing in all industries, and containers sprawl across multiple nodes in a cluster. Effective management of containers is necessary because their numbers will likely far exceed the number of virtual machines today. Data management is of ultimate importance, and it is often forgotten because the first workloads to be containerized are ephemeral. For data management, many drivers with different specifications were once available; the Container Storage Interface (CSI) specification was then created and is now adopted by all major container orchestration systems.

Although other container orchestration systems exist, Kubernetes became the standard framework for container management. It is a very flexible open source platform that is used as the base for most cloud providers' and software companies' container orchestration systems. Red Hat OpenShift is one of the most reliable enterprise-grade container orchestration systems based on Kubernetes, designed and optimized to easily deploy web applications and services. OpenShift enables developers to focus on the code, while the platform takes care of the complex IT operations and processes.
This IBM Redbooks® publication describes how the CSI Driver for IBM file storage enables IBM Spectrum® Scale to be used as persistent storage for stateful applications running in Kubernetes clusters. Through the Container Storage Interface Driver for IBM file storage, Kubernetes persistent volumes (PVs) can be provisioned from IBM Spectrum Scale. Therefore, the containers can be used with stateful microservices, such as database applications (MongoDB, PostgreSQL, and so on).

Cassandra: The Definitive Guide, 3rd Edition

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition—updated for Cassandra 4.0—provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s nonrelational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.

- Understand Cassandra’s distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
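Cassandra's decentralized structure rests on a token ring: each partition key hashes to a position on the ring, and the replicas for a key are the next nodes walking clockwise. The dependency-free Python sketch below illustrates the idea only; Cassandra itself uses the Murmur3 partitioner and virtual nodes, so SHA-1 and one token per node are simplifications:

```python
import hashlib
from bisect import bisect_right

def token(key):
    """Map a partition key to a position on the ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

def build_ring(nodes):
    """Place each node at its token position, sorted around the ring."""
    return sorted((token(node), node) for node in nodes)

def replicas(ring, key, rf=3):
    """Walk clockwise from the key's token, collecting rf nodes
    (assumes rf <= number of nodes, one token per node)."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
owners = replicas(ring, "user:42", rf=3)
```

With a replication factor of 3, three distinct nodes own each key, which is what lets the cluster stay available through node failures without any coordinator master.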

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 4

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to:

- Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today
- Build a 24x7x365 enterprise-class private cloud with Red Hat OpenShift Container Platform by using the new open source Container Storage Interface (CSI) drivers
- Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security

IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment, with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment.

SAP HANA Data Management and Performance on IBM Power Systems

This IBM® Redpaper publication provides information and concepts about how to take advantage of SAP HANA and IBM Power Systems features to manage data and performance efficiently. The target audience of this book includes architects, IT specialists, and systems administrators who deploy SAP HANA and manage data and SAP system performance.

Modern Big Data Architectures

Provides an up-to-date analysis of big data and multi-agent systems. The term Big Data refers to cases where data sets are too large or too complex for traditional data-processing software. With the spread of new concepts such as Edge Computing and the Internet of Things, the production, processing, and consumption of this data become more and more distributed. As a result, applications increasingly require multiple agents that can work together. A multi-agent system (MAS) is a self-organized computer system that comprises multiple intelligent agents interacting to solve problems that are beyond the capacities of individual agents. Modern Big Data Architectures examines modern concepts and architectures for Big Data processing and analytics. This unique, up-to-date volume provides a joint analysis of big data and multi-agent systems, with emphasis on distributed, intelligent processing of very large data sets. Each chapter contains practical examples and detailed solutions suitable for a wide variety of applications. The author, an internationally recognized expert in Big Data and distributed Artificial Intelligence, demonstrates how base concepts such as agent, actor, and microservice have reached a point of convergence, enabling next-generation systems to be built by incorporating the best aspects of the field. This book:

- Illustrates how data sets are produced and how they can be utilized in various areas of industry and science
- Explains how to apply common computational models and state-of-the-art architectures to process Big Data tasks
- Discusses current and emerging Big Data applications of Artificial Intelligence

Modern Big Data Architectures: A Multi-Agent Systems Perspective is a timely and important resource for data science professionals and students involved in Big Data analytics, machine learning, and artificial intelligence.

Open Source Data Pipelines for Intelligent Applications

For decades, businesses have used information about their customers to make critical decisions on what to stock in inventory, which items to recommend to customers, and when to run promotions. But the advent of big data early in this century changed the game considerably. The key to achieving a competitive advantage today is the ability to process and store ever-increasing amounts of information that affect those decisions. In this report, solutions specialists from Red Hat provide an architectural guide to help you navigate the modern data analytics ecosystem. You’ll learn how the industry has evolved and examine current approaches to storage. That includes a deep dive into the anatomy of a portable data platform architecture, along with several aspects of running data pipelines and intelligent applications with Kubernetes.

- Explore the history of open source data processing and the evolution of container scheduling
- Get a concise overview of intelligent applications
- Learn how to use storage with Kubernetes to produce effective intelligent applications
- Understand how to structure applications on Kubernetes in your platform architecture
- Delve into example pipeline architectures for deploying intelligent applications on Kubernetes

SAP HANA Platform Migration

This IBM® Redpaper publication provides SAP HANA platform migration information and details for successful planning for migration to IBM Power Systems servers. This publication addresses topics for sellers, IT architects, IT specialists, and anyone who wants to migrate and manage SAP workloads on Power Systems servers. Moreover, this guide provides documentation to transfer how-to skills to the technical teams, and it provides solution guidance to the sales team. This publication complements documentation that is available at IBM Knowledge Center, and it aligns with educational materials that are provided by IBM Systems.