talk-data.com talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 Oreilly Visit website ↗

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 476–500 of 3432 · Newest first

Search within this event →
The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

Build efficient and scalable batch and real-time data ingestion pipelines, DevOps continuous integration and deployment pipelines, and advanced analytics solutions on the Azure Data Platform. This book teaches you to design and implement robust data engineering solutions using Data Factory, Databricks, Synapse Analytics, Snowflake, Azure SQL database, Stream Analytics, Cosmos database, and Data Lake Storage Gen2. You will learn how to engineer your use of these Azure Data Platform components for optimal performance and scalability. You will also learn to design self-service capabilities to maintain and drive the pipelines and your workloads. The approach in this book is to guide you through a hands-on, scenario-based learning process that will empower you to promote digital innovation best practices while you work through your organization’s projects, challenges, and needs. The clear examples enable you to use this book as a reference and guide for building data engineering solutions in Azure. After reading this book, you will have a far stronger skill set and confidence level in getting hands on with the Azure Data Platform. What You Will Learn Build dynamic, parameterized ELT data ingestion orchestration pipelines in Azure Data Factory Create data ingestion pipelines that integrate control tables for self-service ELT Implement a reusable logging framework that can be applied to multiple pipelines Integrate Azure Data Factory pipelines with a variety of Azure data sources and tools Transform data with Mapping Data Flows in Azure Data Factory Apply Azure DevOps continuous integration and deployment practices to your Azure Data Factory pipelines and development SQL databases Design and implement real-time streaming and advanced analytics solutions using Databricks, Stream Analytics, and Synapse Analytics Get started with a variety of Azure data services through hands-on examples Who This Book Is For Data engineers and data architects who are interested in learning architectural and engineering best practices around ELT and ETL on the Azure Data Platform, those who are creating complex Azure data engineering projects and are searching for patterns of success, and aspiring cloud and data professionals involved in data engineering, data governance, continuous integration and deployment of DevOps practices, and advanced analytics who want a full understanding of the many different tools and technologies that Azure Data Platform provides

IBM GDPS Family: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

Data Modeling for Azure Data Services

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs. What this Book will help me do Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases. Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool. Master how to design and model a Data Lake using Azure Storage efficiently. Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB. Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows. Author(s) None Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, None crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, None ensures that the learners gain not just knowledge but applied competence. Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.

SQL Server on Kubernetes: Designing and Building a Modern Data Platform

Build a modern data platform by deploying SQL Server in Kubernetes. Modern application deployment needs to be fast and consistent to keep up with business objectives and Kubernetes is quickly becoming the standard for deploying container-based applications, fast. This book introduces Kubernetes and its core concepts. Then it shows you how to build and interact with a Kubernetes cluster. Next, it goes deep into deploying and operationalizing SQL Server in Kubernetes, both on premises and in cloud environments such as the Azure Cloud. You will begin with container-based application fundamentals and then go into an architectural overview of a Kubernetes container and how it manages application state. Then you will learn the hands-on skill of building a production-ready cluster. With your cluster up and running, you will learn how to interact with your cluster and perform common administrative tasks. Once you can admin the cluster, you will learn how to deploy applications and SQL Server in Kubernetes. You will learn about high-availability options, and about using Azure Arc-enabled Data Services. By the end of this book, you will know how to set up a Kubernetes cluster, manage a cluster, deploy applications and databases, and keep everything up and running. What You Will Learn Understand Kubernetes architecture and cluster components Deploy your applications into Kubernetes clusters Manage your containers programmatically through API objects and controllers Deploy and operationalize SQL Server in Kubernetes Implement high-availability SQL Server scenarios on Kubernetes using Azure Arc-enabled Data Services Make use of Kubernetes deployments for Big Data Clusters Who This Book Is For DBAs and IT architects who are ready to begin planning their next-generation data platform and want to understand what it takes to run SQL Server in a container in Kubernetes. SQL Server on Kubernetes is an excellent choice for those who want to understand the big picture of why Kubernetes is the next-generation deployment method for SQL Server but also want to understand the internals, or the how, of deploying SQL Server in Kubernetes. When finished with this book, you will have the vision and skills to successfully architect, build and maintain a modern data platform deploying SQL Server on Kubernetes.

Designing Big Data Platforms

DESIGNING BIG DATA PLATFORMS Provides expert guidance and valuable insights on getting the most out of Big Data systems An array of tools are currently available for managing and processing data—some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems. This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more. Author Yusuf Aytas, a software engineer with a vast amount of big data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security. This real-world manual for Big Data technologies: Provides up-to-date coverage of the tools currently used in Big Data processing and management Offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems Highlights and explains how data is processed at scale Includes an introduction to the foundation of a modern data platform Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well researchers and students in computer science and related fields.

Identity in Modern Applications

Mapping a person, place, or thing to a software resource in a verifiable manner is the basis of identity. Confirming that identity is a complex process, particularly when the identity mapping has to be verified genuine and authentic. Everything on the internet that houses private information is tied to identity and identity management. In this report, author Lee Atchison shows C-suite execs, engineering execs, architects, and others involved in building software applications the modern identity management techniques available to safeguard that simple access point. You'll learn how and why these techniques constantly need to keep up with modern application development, and you'll understand the growing sophistication of the people who safely interact or maliciously tamper with them. Explore the complex process of mapping a person, place, or thing to a software resource in a verifiable manner Get examples of real-world authentication, including methods and best practices for working with application credentials Understand the differences between single-factor and multifactor authentication Learn why every authentication method has flaws, including today's state-of-the-art processes Explore authorization, the process for granting users access to specific resources, and how it differs from authentication Understand trust relationships using trust systems to create more secure applications and systems

Amazon Redshift Cookbook

Dive into the world of Amazon Redshift with this comprehensive cookbook, packed with practical recipes to build, optimize, and manage modern data warehousing solutions. From understanding Redshift's architecture to implementing advanced data warehousing techniques, this book provides actionable guidance to harness the power of Amazon Redshift effectively. What this Book will help me do Master the architecture and core concepts of Amazon Redshift to architect scalable data warehouses. Optimize data pipelines and automate ETL processes for seamless data ingestion and management. Leverage advanced features like concurrency scaling and Redshift Spectrum for enhanced analytics. Apply best practices for security and cost optimization in Redshift projects. Gain expertise in scaling data warehouse solutions to accommodate large-scale analytics needs. Author(s) Shruti Worlikar, None Arumugam, and None Patel are seasoned experts in data warehousing and analytics with extensive experience using Amazon Redshift. Their backgrounds in implementing scalable data solutions make their insights practical and grounded. Through their collaborative writing, they aim to make complex topics approachable to learners of various skill levels. Who is it for? This book is tailored for professionals such as data warehouse developers, data engineers, and data analysts looking to master Amazon Redshift. It suits intermediate to advanced practitioners with a basic understanding of data warehousing and cloud technologies. Readers seeking to optimize Redshift for cost, performance, and security will find this guide invaluable.

Learning PHP, MySQL & JavaScript, 6th Edition

Build interactive, data-driven websites with the potent combination of open source technologies and web standards, even if you have only basic HTML knowledge. With the latest edition of this popular hands-on guide, you'll tackle dynamic web programming using the most recent versions of today's core technologies: PHP, MySQL, JavaScript, CSS, HTML5, jQuery, and the powerful React library. Web designers will learn how to use these technologies together while picking up valuable web programming practices along the way, including how to optimize websites for mobile devices. You'll put everything together to build a fully functional social networking site suitable for both desktop and mobile browsers. Explore MySQL from database structure to complex queries Use the MySQL PDO extension, PHP's improved MySQL interface Create dynamic PHP web pages that tailor themselves to the user Manage cookies and sessions and maintain a high level of security Enhance JavaScript with the React library Use Ajax calls for background browser-server communication Style your web pages by acquiring CSS skills Implement HTML5 features, including geolocation, audio, video, and the canvas element Reformat your websites into mobile web apps

IBM FlashSystem 9200 and 9100 Best Practices and Performance Guidelines

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM FlashSystem® 9100. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. It explains how you can optimize disk performance with the IBM System Storage® Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem, SAN Volume Controller and Storwize® administrators and technicians. Understanding his book requires advanced knowledge of these environments. Important, IBM FlashSystem 9200: On 11th February 2020 IBM announced the arrival of the IBM FlashSystem 9200 to the family. This book was written specifically for IBM FlashSystem 9100, however most of the general principles will apply to the IBM FlashSystem 9200. If you are in any doubt as to their applicability to the FlashSystem 9200 then you should work with your local IBM representative. This book will be updated to include FlashSystem 9200 in due course.

IBM FlashSystem and VMware Implementation and Best Practices Guide

This IBM® Redbooks® publication details the configuration and best practices for using IBM's FlashSystem family of storage products within a VMware environment. This book was published in 2021 and specifically addresses Spectrum Virtualize Version 8.4 with VMware vSphere Version 7.0. Topics illustrate planning, configuring, operations, and preferred practices that include integration of FlashSystem storage systems with the VMware vCloud suite of applications: - vSphere Web Client (VWC) - vStorage APIs for Storage Awareness (VASA) - vStorage APIs for Array Integration (VAAI) - Site Recovery Manager (SRM) - vSphere Metro Storage Cluster (vMSC) This book is intended for presales consulting engineers, sales engineers, and IBM clients who want to deploy IBM FlashSystem® storage systems in virtualized data centers that are based on VMware vSphere.

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration. No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well. Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with few input variables is important for practically every type of business. This book helps you understand data science and the advanced algorithms use to analyze data, and terms such as data mining, machine learning, and text mining. Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented. T-SQL is supported in SQL Server,Azure SQL Database, and in Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you already are familiar with, then this is the book for you. What You Will Learn Describe distribution of variables with statistical measures Find associations between pairs of variables Evaluate the quality of the data you are analyzing Perform time-series analysis on your data Forecast values of a continuous variable Perform market-basket analysis to predict customer purchasing patterns Predict target variable outcomes from one or more input variables Categorize passages of text by extracting and analyzing keywords Who This Book Is For Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science. For readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms such as SQL Server and Azure SQL Database. Also for readers who want to improve their querying by learning new and original optimization techniques.

Best Practices Guide for Databases on IBM FlashSystem

The purpose of this IBM® Redpaper® document is to provide best practice guidelines to design and implement IBM FlashSystem® storage for database workloads. The recommended settings and values are based on lab testing, proof of concept (PoC) and experience drawn from customer implementations. Suggestions that are presented in this document are applicable to most production database environments to increase performance of I/O and availability. However, more considerations might be required while designing, configuring, and implementing storage for extreme transactional, analytical, and database cluster environments. Customers are migrating database storage to IBM FlashSystem largely due to low latency performance of the IBM FlashSystem family of Storage. Using IBM FlashSystem, IBM customers are able to achieve low latency for queries and transactions from milliseconds to microseconds, realize a multi-fold increase in application level transactions per second, increase CPU efficiency and reduce database licensing costs. Recent additions of data reduction technologies to IBM FlashSystem further increase overall TCO benefits. All IBM FlashSystem models now offer compression, which can reduce database storage by 40 - 80% depending on database software. In addition to best practices that are described in this document, the IBM FlashSystem Worldwide Solutions Engineering Team can further assist customers with performing analysis of current database workloads for IBM FlashSystem benefits, perform PoCs at our labs, and help with implementation.

IBM TS4500 R7 Tape Library Guide

The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and better integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve investments in IBM tape library products. Now, you can achieve a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 11 petabytes (PB) of uncompressed data in a single frame library or scale up to 2 PB per square foot to over 350 PB. The TS4500 offers the following benefits: High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to another 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for IBM TS1160 while also supporting TS1155, TS1150, and TS1140 tape drive: The TS1160 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1160 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The TS1160 Tape Drive Model 60E delivers a dual 10 Gb or 25 Gb Ethernet host attachment interface that is optimized for cloud-based and hyperscale environments. The TS1160 Tape Drive Model 60F delivers a native data rate of 400 MBps, the same load/ready, locate speeds, and access times as the TS1155, and includes dual-port 16 Gb Fibre Channel support. Support of the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology. Support of LTO 8 Type M cartridge (m8): The LTO Program introduced a new capability with LTO-8 drives. The ability of the LTO-8 drive to write 9 TB on a brand new LTO-7 cartridge instead of 6 TB as specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), command-line interface (CLI), and REST over SCSI (RoS) to obtain status information about library components. October 2020 - Added support for the 3592 model 60S tape drive that provides a dual-port 12 Gb SAS (Serial Attached SCSI) interface for host attachment.

Data Lakes For Dummies

Take a dive into data lakes “Data lakes” is the latest buzz word in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer the question: “What exactly is a data lake and do I need one for my business?” Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can’t) achieve, and what you need to do to create the lake that best suits your particular needs. With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you’ve got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for to interpreting what you’ve stored. Understand and build data lake architecture Store, clean, and synchronize new and existing data Compare the best data lake vendors Structure raw data and produce usable analytics Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible—and make sure your business isn’t left standing on the shore.

IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power Enterprise Pools 2.0

This IBM® Redbooks® publication is a guide to IBM Power Private Cloud with Shared Utility Capacity featuring Power Enterprise Pools 2.0 (also known as PEP 2.0). This technology allows multiple servers in an to share base processor and memory resources, and draw upon pre-paid credits when the base is exceeded. Previously, the Shared Utility feature supported IBM Power System E950 (9040-MR9) and IBM Power System E980 (9080-M9S). It was extended in August 2020 to include the Scale-out Power Systems announced on July 14th 2020 and received dedicated processor support later in the year. The IBM Power System S922 (9009-22G), and IBM Power System S924 (9009-42G) servers which use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems are now supported. The previous Scale-out models: IBM Power System S922 (9009-22A), and IBM Power System S924 (9009-42A) servers cannot be added to an Enterprise Pool. The goal of this book is to provide an overview of the environment and guidance for planning a deployment. The paper also covers how to configure PEP 2.0. There are also chapters on migrating from PEP 1.0 to PEP 2.0 and various use cases. This publication is for professionals who want to acquire a better understanding of IBM Power Private Cloud, and Shared Utility. The intended audience includes: Clients Sales and marketing professionals Technical support professionals IBM Business Partners This book expands the set of Power Systems documentation by providing a desktop reference which offers a detailed technical description of IBM Power Private Cloud, and Shared Utility.

Self-Sovereign Identity

In a world of changing privacy regulations, identity theft, and online anonymity, identity is a precious and complex concept. Self-Sovereign Identity (SSI) is a set of technologies that move control of digital identity from third party “identity providers” directly to individuals, and it promises to be one of the most important trends for the coming decades. Now in Self-Sovereign Identity, privacy and personal data experts Drummond Reed and Alex Preukschat lay out a roadmap for a future of personal sovereignty powered by the Blockchain and cryptography. Cutting through the technical jargon with dozens of practical use cases from experts across all major industries, it presents a clear and compelling argument for why SSI is a paradigm shift, and shows how you can be ready to be prepared for it. About the Technology Trust on the internet is at an all-time low. Large corporations and institutions control our personal data because we’ve never had a simple, safe, strong way to prove who we are online. Self-sovereign identity (SSI) changes all that. About the Book In Self-Sovereign Identity: Decentralized digital identity and verifiable credentials, you’ll learn how SSI empowers us to receive digitally-signed credentials, store them in private wallets, and securely prove our online identities. It combines a clear, jargon-free introduction to this blockchain-inspired paradigm shift with interesting essays written by its leading practitioners. Whether for property transfer, ebanking, frictionless travel, or personalized services, the SSI model for digital trust will reshape our collective future. What's Inside The architecture of SSI software and services The technical, legal, and governance concepts behind SSI How SSI affects global business industry-by-industry Emerging standards for SSI About the Reader For technology and business readers. No prior SSI, cryptography, or blockchain experience required. About the Authors Drummond Reed is the Chief Trust Officer at Evernym, a technology leader in SSI. Alex Preukschat is the co-founder of SSIMeetup.org and AlianzaBlockchain.org. Quotes This book is a comprehensive roadmap to the most crucial fix for today’s broken Internet. - Brian Behlendorf, GM for Blockchain, Healthcare and Identity at the Linux Foundation If trusted relationships over the Internet are important to you or your business, this book is for you. - John Jordan, Executive Director, Trust over IP Foundation Decentralized identity represents not only a wide range of trust-enabling technologies, but also a paradigm shift in our increasingly digital-first world. - Rouven Heck, Executive Director, Decentralized Identity Foundation

Implementation Guide for IBM Elastic Storage System 3000

This IBM® Redbooks publication introduces and describes the IBM Elastic Storage® Server 3000 (ESS 3000) as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General Parallel File System (IBM GPFS). IBM Elastic Storage System 3000 is an all-Flash array platform. This storage platform uses NVMe-attached drives in ESS 3000 to provide significant performance improvements as compared to SAS-attached flash drives. This book provides a technical overview of the ESS 3000 solution and helps you to plan the installation of the environment. We also explain the use cases where we believe it fits best. Our goal is to position this book as the starting point document for customers that would use ESS 3000 as part of their IBM Spectrum Scale setups. This book is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with ESS 3000.

Data Fabric as Modern Data Architecture

Data fabric is a hot concept in data management today. By encompassing the data ecosystem your company already has in place, this architectural design pattern provides your staff with one reliable place to go for data. In this report, author Alice LaPlante shows CIOs, CDOs, and CAOs how data fabric enables their users to spend more time analyzing than wrangling data. The best way to thrive during this intense period of digital transformation is through data. But after roaring through 2019, progress on getting the most out of data investments has lost steam. Only 38% of companies now say they've created a data-driven organization. This report describes how a data fabric can help you reach the all-important goal of data democratization. Learn how data fabric handles data prep, data delivery, and serves as a data catalog Use data fabric to handle data variety, a top challenge for many organizations Learn how data fabric spans any environment to support data for users and use cases from any source Examine data fabric's capabilities including data and metadata management, data quality, integration, analytics, visualization, and governance Get five pieces of advice for getting started with data fabric

Storage Multi-tenancy for Red Hat OpenShift Container Platform with IBM Storage

With IBM® Spectrum Virtualize and the Object-Based Access Control, you can implement multi-tenancy and secure storage usage in a Red Hat OpenShift environment. This IBM Redpaper® publication shows you how to secure the storage usage from the Openshift user to the IBM Spectrum® Virtualize array. You see how to restrict storage usage in a Red Hat Openshift Container Platform to avoid the over-consumption of storage by one or more user. These uses cases can be expanded to the use of this control to provide assistance with billing.

IBM Fibre Channel Endpoint Security for IBM DS8900F and IBM Z

This IBM® Redbooks® publication will help you install, configure, and use the new IBM Fibre Channel Endpoint Security function. The focus of this publication is about securing the connection between an IBM DS8900F and the IBM z15™. The solution is delivered with two levels of link security supported: support for link authentication on Fibre Channel links and support for link encryption of data in flight (which also includes link authentication). This solution is targeted for clients needing to adhere to Payment Card Industry (PCI) or other emerging data security standards, and those who are seeking to reduce or eliminate insider threats regarding unauthorized access to data.

97 Things Every Data Engineer Should Know

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include: The Importance of Data Lineage - Julien Le Dem Data Security for Data Engineers - Katharine Jarmul The Two Types of Data Engineering and Data Engineers - Jesse Anderson Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy The End of ETL as We Know It - Paul Singman Building a Career as a Data Engineer - Vijay Kiran Modern Metadata for the Modern Data Stack - Prukalpa Sankar Your Data Tests Failed! Now What? - Sam Bail

Machine Learning for Oracle Database Professionals: Deploying Model-Driven Applications and Automation Pipelines

Database developers and administrators will use this book to learn how to deploy machine learning models in Oracle Database and in Oracle’s Autonomous Database cloud offering. The book covers the technologies that make up the Oracle Machine Learning (OML) platform, including OML4SQL, OML Notebooks, OML4R, and OML4Py. The book focuses on Oracle Machine Learning as part of the Oracle Autonomous Database collaborative environment. Also covered are advanced topics such as delivery and automation pipelines. Throughout the book you will find practical details and hand-on examples showing you how to implement machine learning and automate deployment of machine learning. Discussion around the examples helps you gain a conceptual understanding of machine learning. Important concepts discussed include the methods involved, the algorithms to choose from, and mechanisms for process and deployment. Seasoned database professionals looking to make the leap into machine learning as a growth path will find much to like in this book as it helps you step up and use your current knowledge of Oracle Database to transition into providing machine learning solutions. What You Will Learn Use the Oracle Machine Learning (OML) Notebooks for data visualization and machine learning model building and evaluation Understand Oracle offerings for machine learning Develop machine learning with Oracle database using the built-in machine learning packages Develop and deploy machine learning models using OML4SQL and OML4R Leverage the Oracle Autonomous Database and its collaborative environment for Oracle Machine Learning Develop and deploy machine learning projects in Oracle Autonomous Database Build an automated pipeline that can detect and handle changes in data/model performance Who This Book Is For Database developers and administrators who want to learn about machine learning, developers who want to build models and applications using Oracle Database’s built-in machine learning feature set, and administrators tasked with supporting applications on Oracle Database that make use of the Oracle Machine Learning feature set

Azure Data Factory by Example: Practical Implementation for Data Engineers

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well-suited to data engineers embracing their first ETL/ELT toolset as it is to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again. What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases

This IBM Redpaper™ publication introduces the IBM Spectrum Scale immutability function. It shows how to set it up and presents different ways for managing immutable and append-only files. This publication also provides guidance for implementing IT security aspects in an IBM Spectrum Scale cluster by addressing regulatory requirements. It also describes two typical use cases for managing immutable files. One use case involves applications that manage file immutability; the other use case presents a solution to automatically set files to immutable within a IBM Spectrum Scale immutable fileset.

IBM Spectrum Archive Enterprise Edition V1.3.1.2: Installation and Configuration Guide

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum® Archive Enterprise Edition (EE) Version 1.3.1.2 for the IBM TS4500, IBM TS3500, IBM TS4300, and IBM TS3310 tape libraries. IBM Spectrum Archive Enterprise Edition enables the use of the LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale based environment. It helps encourage the use of tape as a critical tier in the storage environment. This is the ninth edition of IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on a physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 8, 7, 6, and 5 tape drives in IBM® TS3310, TS3500, TS4300, and TS4500 tape libraries. In addition, IBM TS1160, TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM customers, IBM Business Partners, IBM specialist sales representatives, and technical specialists.