talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked: 3377

Collection of O'Reilly books on Data Engineering.

Filtering by: data-engineering

Sessions & talks

Showing 426–450 of 3377 · Newest first

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale

This IBM® Redpaper publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum® Scale and Cloudera Data Platform (CDP) Private Cloud Base for performing in-place Cloudera Hadoop or Cloudera Spark-based analytics. It also covers the benefits of the integrated solution and gives guidance about the types of deployment models and considerations during the implementation of these models. An August 2021 update added CES protocol support in Hadoop environments.

Cloud Native Integration with Apache Camel: Building Agile and Scalable Integrations for Kubernetes Platforms

Address the most common integration challenges by understanding the ins and outs of the available choices, with practical examples of how to create cloud-native applications using Apache Camel. Camel is the main tool, but the book also covers complementary tools and plugins that make development and testing easier, such as Quarkus, and tools for more specific use cases, such as Apache Kafka and Keycloak. You will learn to connect with databases, create REST APIs, transform data, connect with message-oriented middleware (MOM), secure your services, and test using Camel. You will also learn software architecture patterns for integration and how to leverage container platforms such as Kubernetes. This book is suitable for those who are eager to learn an integration tool that fits the Kubernetes world and who want to explore the integration challenges that can be solved using containers. What You Will Learn: how to solve integration challenges; the basics of Quarkus, as it is the foundation for the application; a comprehensive view of Apache Camel; how to deploy an application in Kubernetes; and good practices. Who This Book Is For: Java developers looking to learn Apache Camel; Apache Camel developers looking to learn more about Kubernetes deployments; software architects looking to study integration patterns for Kubernetes-based systems; and system administrators (operations teams) looking to get a better understanding of how technologies are integrated.

Developing Modern Applications with a Converged Database

Single-purpose databases were designed to address specific problems and use cases. Given this narrow focus, there are inherent tradeoffs required when trying to accommodate multiple datatypes or workloads in your enterprise environment. The result is data fragmentation that spills over into application development, IT operations, data security, system scalability, and availability. In this report, author Alice LaPlante explains why developing modern, data-driven applications may be easier and more synergistic when using a converged database. Senior developers, architects, and technical decision-makers will learn cloud-native application development techniques for working with both structured and unstructured data. You'll discover ways to run transactional and analytical workloads on a single, unified data platform. This report covers: benefits and challenges of using a converged database to develop data-driven applications; how to use one platform to work with both structured and unstructured data, including JSON, XML, text and files, spatial and graph, blockchain, IoT, time series, and relational data; modern development practices on a converged database, including API-driven development, containers, microservices, and event streaming; and use case examples including online food delivery, real-time fraud detection, and marketing based on real-time analytics and geospatial targeting.

Cyber Resiliency Solution using IBM Spectrum Virtualize

This document describes the Safeguarded Copy cyber resiliency and logical air gap solution for IBM FlashSystem and SAN Volume Controller. It showcases the configuration and end-to-end architecture for setting up the logical air-gap solution for cyber resiliency by using the Safeguarded Copy feature in IBM FlashSystem and IBM SAN Volume Controller storage. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM FlashSystem or IBM SAN Volume Controller storage devices are supported and entitled, and where the issues are specific to a blueprint implementation.

Data Modeling with SAP BW/4HANA 2.0: Implementing Agile Data Models Using Modern Modeling Concepts

Gain practical guidance for implementing data models on the SAP BW/4HANA platform using modern modeling concepts. You will walk through various modeling scenarios, such as exposing HANA tables and views through BW/4HANA, creating virtual and hybrid data models, and integrating SAP and non-SAP data into a single data model. Data Modeling with SAP BW/4HANA 2.0 gives you the skills you need to use the new SAP BW/4HANA features and objects, covers modern modeling concepts, and equips you with the practical knowledge of how to use the best of the HANA and BW/4HANA worlds. What You Will Learn: discover the new modeling features in SAP BW/4HANA; combine SAP HANA and SAP BW/4HANA artifacts; leverage virtualization when designing and building data models; build hybrid data models combining InfoObject, OpenODS, and field-based approaches; and integrate SAP and non-SAP data into a single model. Who This Book Is For: BI consultants, architects, developers, and analysts working in the SAP BW/4HANA environment.

Data Engineering on Azure

Build a data platform to the industry-leading standards set by Microsoft's own infrastructure. In Data Engineering on Azure you will learn how to: pick the right Azure services for different data scenarios; manage data inventory; implement production-quality data modeling, analytics, and machine learning workloads; handle data governance; use DevOps to increase reliability; ingest, store, and distribute data; and apply best practices for compliance and access control. Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft's own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring an engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning. About the Technology: Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify. About the Book: In Data Engineering on Azure you'll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you'll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms. What's Inside: data inventory and data governance; assuring data quality, compliance, and distribution; building automated pipelines to increase reliability; ingesting, storing, and distributing data; and production-quality data modeling, analytics, and machine learning. About the Reader: For data engineers familiar with cloud computing and DevOps. About the Author: Vlad Riscutia is a software architect at Microsoft. Quotes: "A definitive and complete guide on data engineering, with clear and easy-to-reproduce examples." - Kelum Prabath Senanayake, Echoworx. "An all-in-one Azure book, covering all a solutions architect or engineer needs to think about." - Albert Nogués, Danone. "A meaningful journey through the Azure ecosystem. You'll be building pipelines and joining components quickly!" - Todd Cook, Appen. "A gateway into the world of Azure for machine learning and DevOps engineers." - Krzysztof Kamyczek, Luxoft.
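
To make the ingestion step concrete, here is a minimal, hypothetical sketch of landing a raw file in Azure Blob Storage with the azure-storage-blob and azure-identity Python SDKs; the account URL, container name, and blob path are illustrative placeholders, not examples from the book.

```python
# A minimal sketch of landing a raw file in Azure Blob Storage.
# Account URL, container, and blob names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient


def upload_raw_file(local_path: str) -> None:
    # Authenticate with the ambient Azure identity (CLI login, managed identity, ...).
    service = BlobServiceClient(
        account_url="https://examplelake.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client("raw-ingest")
    blob_name = f"sales/2021/{local_path.rsplit('/', 1)[-1]}"
    with open(local_path, "rb") as data:
        # Overwrite any previous landing of the same file.
        container.upload_blob(name=blob_name, data=data, overwrite=True)


if __name__ == "__main__":
    upload_raw_file("daily_sales.csv")
```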

Developing Modern Database Applications with PostgreSQL

In "Developing Modern Database Applications with PostgreSQL", you will master the art of building database applications with the highly available and scalable PostgreSQL. Walk through a series of real-world projects that fully explore both the developmental and administrative aspects of PostgreSQL, all tied together through the example of a banking application. What this Book will help me do Set up high-availability PostgreSQL clusters using modern best practices. Monitor and tune database performance to handle enterprise-level workloads seamlessly. Automate testing and implement test-driven development strategies for robust applications. Leverage PostgreSQL along with DevOps pipelines to deploy applications on cloud platforms. Develop APIs and geospatial databases using popular tools like PostgREST and PostGIS. Author(s) The authors of this book, None Le and None Diaz, are experienced professionals in database technologies and software development. With a passion for PostgreSQL and its applications in modern computing, they bring a wealth of expertise and a practical approach to this book. Their methods focus on real-world applicability, ensuring that readers gain hands-on skills and practical knowledge. Who is it for? This book is perfect for database developers, administrators, and architects who want to advance their expertise in PostgreSQL. It is also suitable for software engineers and IT professionals aiming to tackle end-to-end database development projects. A basic knowledge of PostgreSQL and Linux will help you dive into the hands-on projects easily. If you're looking to take your PostgreSQL skills to the next level, this book is for you.

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize Version 8.4

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces the IBM FlashSystem® solution that is powered by IBM Spectrum® Virtualize V8.4. This innovative storage offering delivers essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a compact, modular design that is offered at a competitive, midrange price. The solution incorporates some of the top IBM technologies that are typically found only in enterprise-class storage systems, which raises the standard for storage efficiency in midrange disk systems. This cutting-edge storage system extends the comprehensive storage portfolio from IBM and can help change the way organizations address the ongoing information explosion. This IBM Redbooks® publication introduces the features and functions of an IBM Spectrum Virtualize V8.4 system through several examples. This book is aimed at pre-sales and post-sales technical support and marketing and storage administrators. It helps you understand the architecture, how to implement it, and how to take advantage of its industry-leading functions and features.

IBM GDPS Family: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

Data Modeling for Azure Data Services

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs. What this book will help me do: understand and apply normalization, dimensional modeling, and data vault modeling for relational databases; learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool; master how to design and model a data lake using Azure Storage efficiently; gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB; and develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows. Author(s): Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, the author crafts detailed learning material to help professionals level up their database design and Azure deployment skills, and is dedicated to explaining complex topics with clarity and approachable language so that learners gain not just knowledge but applied competence. Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.
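
For the NoSQL modeling part of that scope, a common pattern is to denormalize related rows into one document; the sketch below upserts such a document with the azure-cosmos Python SDK, using a placeholder endpoint, key, database, container, and field names.

```python
# A minimal sketch of upserting a denormalized document into Azure Cosmos DB.
# Endpoint, key, database, container, and fields are illustrative assumptions.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://example-account.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("retail").get_container_client("orders")

# One document per order, with line items embedded rather than joined.
container.upsert_item(
    {
        "id": "order-1001",
        "customerId": "cust-42",  # partition key in this hypothetical design
        "lines": [
            {"sku": "ABC-1", "qty": 2, "price": 9.99},
            {"sku": "XYZ-9", "qty": 1, "price": 24.50},
        ],
    }
)
```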

SQL Server on Kubernetes: Designing and Building a Modern Data Platform

Build a modern data platform by deploying SQL Server in Kubernetes. Modern application deployment needs to be fast and consistent to keep up with business objectives, and Kubernetes is quickly becoming the standard for deploying container-based applications. This book introduces Kubernetes and its core concepts. Then it shows you how to build and interact with a Kubernetes cluster. Next, it goes deep into deploying and operationalizing SQL Server in Kubernetes, both on premises and in cloud environments such as the Azure Cloud. You will begin with container-based application fundamentals and then go into an architectural overview of a Kubernetes cluster and how it manages application state. Then you will learn the hands-on skill of building a production-ready cluster. With your cluster up and running, you will learn how to interact with your cluster and perform common administrative tasks. Once you can administer the cluster, you will learn how to deploy applications and SQL Server in Kubernetes. You will learn about high-availability options, and about using Azure Arc-enabled Data Services. By the end of this book, you will know how to set up a Kubernetes cluster, manage a cluster, deploy applications and databases, and keep everything up and running. What You Will Learn: understand Kubernetes architecture and cluster components; deploy your applications into Kubernetes clusters; manage your containers programmatically through API objects and controllers; deploy and operationalize SQL Server in Kubernetes; implement high-availability SQL Server scenarios on Kubernetes using Azure Arc-enabled Data Services; and make use of Kubernetes deployments for Big Data Clusters. Who This Book Is For: DBAs and IT architects who are ready to begin planning their next-generation data platform and want to understand what it takes to run SQL Server in a container in Kubernetes. SQL Server on Kubernetes is an excellent choice for those who want to understand the big picture of why Kubernetes is the next-generation deployment method for SQL Server but also want to understand the internals, or the how, of deploying SQL Server in Kubernetes. When finished with this book, you will have the vision and skills to successfully architect, build, and maintain a modern data platform deploying SQL Server on Kubernetes.
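
As one possible illustration of the deployment workflow the book covers, this sketch creates a single-replica SQL Server Deployment with the official Kubernetes Python client; the namespace, secret, labels, and image tag are assumptions, and a production setup would add storage, services, and high-availability configuration.

```python
# A minimal sketch: create a one-replica SQL Server Deployment via the
# Kubernetes Python client. Namespace, secret, and image tag are illustrative.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig context

container = client.V1Container(
    name="mssql",
    image="mcr.microsoft.com/mssql/server:2019-latest",
    ports=[client.V1ContainerPort(container_port=1433)],
    env=[
        client.V1EnvVar(name="ACCEPT_EULA", value="Y"),
        client.V1EnvVar(
            name="SA_PASSWORD",
            value_from=client.V1EnvVarSource(
                secret_key_ref=client.V1SecretKeySelector(name="mssql-secret", key="sa-password")
            ),
        ),
    ],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="mssql"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "mssql"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "mssql"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="data-platform", body=deployment)
```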

Designing Big Data Platforms

DESIGNING BIG DATA PLATFORMS provides expert guidance and valuable insights on getting the most out of Big Data systems. An array of tools is currently available for managing and processing data; some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems. This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more. Author Yusuf Aytas, a software engineer with a vast amount of big data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security. This real-world manual for Big Data technologies: provides up-to-date coverage of the tools currently used in Big Data processing and management; offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems; highlights and explains how data is processed at scale; and includes an introduction to the foundation of a modern data platform. Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well as researchers and students in computer science and related fields.

Identity in Modern Applications

Mapping a person, place, or thing to a software resource in a verifiable manner is the basis of identity. Confirming that identity is a complex process, particularly when the identity mapping has to be verified as genuine and authentic. Everything on the internet that houses private information is tied to identity and identity management. In this report, author Lee Atchison shows C-suite execs, engineering execs, architects, and others involved in building software applications the modern identity management techniques available to safeguard that simple access point. You'll learn how and why these techniques constantly need to keep up with modern application development, and you'll understand the growing sophistication of the people who safely interact or maliciously tamper with them. Explore the complex process of mapping a person, place, or thing to a software resource in a verifiable manner; get examples of real-world authentication, including methods and best practices for working with application credentials; understand the differences between single-factor and multifactor authentication; learn why every authentication method has flaws, including today's state-of-the-art processes; explore authorization, the process for granting users access to specific resources, and how it differs from authentication; and understand trust relationships using trust systems to create more secure applications and systems.
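
One building block behind the single-factor authentication discussed above is storing and checking a salted, iterated password hash; the sketch below uses only the Python standard library, and the iteration count and parameters are illustrative rather than a recommendation from the report.

```python
# A minimal sketch of salted, iterated password hashing with constant-time
# verification. Parameter choices here are illustrative only.
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("guess", salt, stored))                         # False
```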

Amazon Redshift Cookbook

Dive into the world of Amazon Redshift with this comprehensive cookbook, packed with practical recipes to build, optimize, and manage modern data warehousing solutions. From understanding Redshift's architecture to implementing advanced data warehousing techniques, this book provides actionable guidance to harness the power of Amazon Redshift effectively. What this book will help me do: master the architecture and core concepts of Amazon Redshift to architect scalable data warehouses; optimize data pipelines and automate ETL processes for seamless data ingestion and management; leverage advanced features like concurrency scaling and Redshift Spectrum for enhanced analytics; apply best practices for security and cost optimization in Redshift projects; and gain expertise in scaling data warehouse solutions to accommodate large-scale analytics needs. Author(s): Shruti Worlikar, Arumugam, and Patel are seasoned experts in data warehousing and analytics with extensive experience using Amazon Redshift. Their backgrounds in implementing scalable data solutions make their insights practical and grounded. Through their collaborative writing, they aim to make complex topics approachable to learners of various skill levels. Who is it for? This book is tailored for professionals such as data warehouse developers, data engineers, and data analysts looking to master Amazon Redshift. It suits intermediate to advanced practitioners with a basic understanding of data warehousing and cloud technologies. Readers seeking to optimize Redshift for cost, performance, and security will find this guide invaluable.
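
For a taste of working with Redshift programmatically, here is a small sketch that runs a query through the Amazon Redshift Data API via boto3; the cluster identifier, database, user, and table are placeholders and not taken from the book's recipes.

```python
# A minimal sketch using the Amazon Redshift Data API via boto3.
# Cluster, database, user, and table names are placeholders.
import time

import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

resp = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT order_date, SUM(amount) FROM sales GROUP BY order_date ORDER BY 1 DESC LIMIT 7;",
)

# The call is asynchronous: poll until the statement finishes, then fetch rows.
statement_id = resp["Id"]
while rsd.describe_statement(Id=statement_id)["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for row in rsd.get_statement_result(Id=statement_id)["Records"]:
    print(row)
```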

Learning PHP, MySQL & JavaScript, 6th Edition

Build interactive, data-driven websites with the potent combination of open source technologies and web standards, even if you have only basic HTML knowledge. With the latest edition of this popular hands-on guide, you'll tackle dynamic web programming using the most recent versions of today's core technologies: PHP, MySQL, JavaScript, CSS, HTML5, jQuery, and the powerful React library. Web designers will learn how to use these technologies together while picking up valuable web programming practices along the way, including how to optimize websites for mobile devices. You'll put everything together to build a fully functional social networking site suitable for both desktop and mobile browsers. Explore MySQL from database structure to complex queries; use the MySQL PDO extension, PHP's improved MySQL interface; create dynamic PHP web pages that tailor themselves to the user; manage cookies and sessions and maintain a high level of security; enhance JavaScript with the React library; use Ajax calls for background browser-server communication; style your web pages by acquiring CSS skills; implement HTML5 features, including geolocation, audio, video, and the canvas element; and reformat your websites into mobile web apps.

IBM FlashSystem 9200 and 9100 Best Practices and Performance Guidelines

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM FlashSystem® 9100. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. It explains how you can optimize disk performance with the IBM System Storage® Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem, SAN Volume Controller, and Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments. Important note on the IBM FlashSystem 9200: on 11 February 2020, IBM announced the arrival of the IBM FlashSystem 9200 to the family. This book was written specifically for the IBM FlashSystem 9100; however, most of the general principles will apply to the IBM FlashSystem 9200. If you are in any doubt as to their applicability to the FlashSystem 9200, you should work with your local IBM representative. This book will be updated to include the FlashSystem 9200 in due course.

IBM FlashSystem and VMware Implementation and Best Practices Guide

This IBM® Redbooks® publication details the configuration and best practices for using IBM's FlashSystem family of storage products within a VMware environment. This book was published in 2021 and specifically addresses Spectrum Virtualize Version 8.4 with VMware vSphere Version 7.0. Topics illustrate planning, configuring, operations, and preferred practices that include integration of FlashSystem storage systems with the VMware vCloud suite of applications: - vSphere Web Client (VWC) - vStorage APIs for Storage Awareness (VASA) - vStorage APIs for Array Integration (VAAI) - Site Recovery Manager (SRM) - vSphere Metro Storage Cluster (vMSC) This book is intended for presales consulting engineers, sales engineers, and IBM clients who want to deploy IBM FlashSystem® storage systems in virtualized data centers that are based on VMware vSphere.

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration. No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well. Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with few input variables is important for practically every type of business. This book helps you understand data science, the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining. Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented. T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you are already familiar with, then this is the book for you. What You Will Learn: describe distributions of variables with statistical measures; find associations between pairs of variables; evaluate the quality of the data you are analyzing; perform time-series analysis on your data; forecast values of a continuous variable; perform market-basket analysis to predict customer purchasing patterns; predict target variable outcomes from one or more input variables; and categorize passages of text by extracting and analyzing keywords. Who This Book Is For: database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science; readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft's various database platforms such as SQL Server and Azure SQL Database; and readers who want to improve their querying by learning new and original optimization techniques.
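
The book computes these measures in T-SQL; purely as a quick reference point in Python (not the book's approach), the short sketch below shows the same descriptive statistics, which can be handy for sanity-checking the results of the equivalent window-function queries.

```python
# Not the book's T-SQL approach: a small Python reference for the descriptive
# statistics it discusses (center, spread, skewness, kurtosis), useful for
# cross-checking the results of equivalent T-SQL queries on the same values.
import statistics

from scipy import stats

values = [12.0, 15.5, 9.8, 22.1, 18.4, 11.7, 30.2, 14.9]

print("mean    ", statistics.mean(values))
print("stdev   ", statistics.stdev(values))   # sample standard deviation
print("skewness", stats.skew(values))         # asymmetry of the distribution
print("kurtosis", stats.kurtosis(values))     # excess kurtosis (normal = 0)
```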

Best Practices Guide for Databases on IBM FlashSystem

The purpose of this IBM® Redpaper® document is to provide best practice guidelines to design and implement IBM FlashSystem® storage for database workloads. The recommended settings and values are based on lab testing, proof of concept (PoC) work, and experience drawn from customer implementations. Suggestions that are presented in this document are applicable to most production database environments to increase performance of I/O and availability. However, more considerations might be required while designing, configuring, and implementing storage for extreme transactional, analytical, and database cluster environments. Customers are migrating database storage to IBM FlashSystem largely due to the low latency performance of the IBM FlashSystem family of storage. Using IBM FlashSystem, IBM customers are able to achieve low latency for queries and transactions, from milliseconds down to microseconds, realize a multi-fold increase in application-level transactions per second, increase CPU efficiency, and reduce database licensing costs. Recent additions of data reduction technologies to IBM FlashSystem further increase overall TCO benefits. All IBM FlashSystem models now offer compression, which can reduce database storage by 40 - 80% depending on database software. In addition to the best practices that are described in this document, the IBM FlashSystem Worldwide Solutions Engineering Team can further assist customers by performing analysis of current database workloads for IBM FlashSystem benefits, performing PoCs at our labs, and helping with implementation.

IBM TS4500 R7 Tape Library Guide

The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and better integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve investments in IBM tape library products. Now, you can achieve a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 11 petabytes (PB) of uncompressed data in a single frame library or scale up to 2 PB per square foot to over 350 PB. The TS4500 offers the following benefits:
- High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to eliminate inactive service space.
- Flexibility to grow: The TS4500 library can grow from the right side and the left side of the first L frame because models can be placed in any active position.
- Increased capacity: The TS4500 can grow from a single L frame up to another 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the TS3500 library can be redeployed in a TS4500.
- Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations.
- Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library.
- Support for the IBM TS1160 tape drive, while also supporting the TS1155, TS1150, and TS1140: The TS1160 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1160 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The TS1160 Tape Drive Model 60E delivers a dual 10 Gb or 25 Gb Ethernet host attachment interface that is optimized for cloud-based and hyperscale environments. The TS1160 Tape Drive Model 60F delivers a native data rate of 400 MBps, the same load/ready and locate speeds and access times as the TS1155, and includes dual-port 16 Gb Fibre Channel support.
- Support for the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology.
- Support for the LTO-8 Type M cartridge (m8): With LTO-8 drives, the LTO Program introduced the ability to write 9 TB on a brand-new LTO-7 cartridge instead of the 6 TB specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge.
- Integrated TS7700 back-end Fibre Channel (FC) switches are available.
- Up to four library-managed encryption (LME) key paths per logical library are available.
This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, the new integrated management console (IMC), the command-line interface (CLI), and REST over SCSI (RoS) for obtaining status information about library components.
October 2020 - Added support for the 3592 model 60S tape drive that provides a dual-port 12 Gb SAS (Serial Attached SCSI) interface for host attachment.

Data Lakes For Dummies

Take a dive into data lakes. "Data lakes" is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: "What exactly is a data lake and do I need one for my business?" Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can't) achieve, and what you need to do to create the lake that best suits your particular needs. With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you've got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you've stored. You'll learn to understand and build data lake architecture; store, clean, and synchronize new and existing data; compare the best data lake vendors; and structure raw data and produce usable analytics. Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn't left standing on the shore.
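
To ground the idea of structuring raw data in a lake, here is a minimal sketch of one common convention, landing records as Parquet files partitioned by date with pandas and pyarrow; the paths and columns are hypothetical, not from the book.

```python
# A minimal sketch of one common data-lake convention: landing raw records as
# Parquet files partitioned by date. Requires pandas with pyarrow installed.
import pandas as pd

df = pd.DataFrame(
    {
        "event_date": ["2021-07-01", "2021-07-01", "2021-07-02"],
        "customer_id": [101, 102, 101],
        "amount": [19.99, 5.00, 42.50],
    }
)

# Writes files under ./lake/raw/orders/event_date=YYYY-MM-DD/part-*.parquet
df.to_parquet("lake/raw/orders", partition_cols=["event_date"])
```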

IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power Enterprise Pools 2.0

This IBM® Redbooks® publication is a guide to IBM Power Private Cloud with Shared Utility Capacity featuring Power Enterprise Pools 2.0 (also known as PEP 2.0). This technology allows multiple servers in an enterprise pool to share base processor and memory resources, and to draw upon pre-paid credits when the base is exceeded. Previously, the Shared Utility feature supported the IBM Power System E950 (9040-MR9) and IBM Power System E980 (9080-M9S). It was extended in August 2020 to include the scale-out Power Systems announced on July 14th, 2020, and received dedicated processor support later in the year. The IBM Power System S922 (9009-22G) and IBM Power System S924 (9009-42G) servers, which use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems, are now supported. The previous scale-out models, the IBM Power System S922 (9009-22A) and IBM Power System S924 (9009-42A) servers, cannot be added to an Enterprise Pool. The goal of this book is to provide an overview of the environment and guidance for planning a deployment. The paper also covers how to configure PEP 2.0, and there are chapters on migrating from PEP 1.0 to PEP 2.0 and on various use cases. This publication is for professionals who want to acquire a better understanding of IBM Power Private Cloud and Shared Utility Capacity. The intended audience includes clients, sales and marketing professionals, technical support professionals, and IBM Business Partners. This book expands the set of Power Systems documentation by providing a desktop reference that offers a detailed technical description of IBM Power Private Cloud and Shared Utility Capacity.

Self-Sovereign Identity

In a world of changing privacy regulations, identity theft, and online anonymity, identity is a precious and complex concept. Self-Sovereign Identity (SSI) is a set of technologies that move control of digital identity from third-party "identity providers" directly to individuals, and it promises to be one of the most important trends for the coming decades. Now in Self-Sovereign Identity, privacy and personal data experts Drummond Reed and Alex Preukschat lay out a roadmap for a future of personal sovereignty powered by the blockchain and cryptography. Cutting through the technical jargon with dozens of practical use cases from experts across all major industries, it presents a clear and compelling argument for why SSI is a paradigm shift, and shows how you can be prepared for it. About the Technology: Trust on the internet is at an all-time low. Large corporations and institutions control our personal data because we've never had a simple, safe, strong way to prove who we are online. Self-sovereign identity (SSI) changes all that. About the Book: In Self-Sovereign Identity: Decentralized digital identity and verifiable credentials, you'll learn how SSI empowers us to receive digitally signed credentials, store them in private wallets, and securely prove our online identities. It combines a clear, jargon-free introduction to this blockchain-inspired paradigm shift with interesting essays written by its leading practitioners. Whether for property transfer, ebanking, frictionless travel, or personalized services, the SSI model for digital trust will reshape our collective future. What's Inside: the architecture of SSI software and services; the technical, legal, and governance concepts behind SSI; how SSI affects global business industry-by-industry; and emerging standards for SSI. About the Reader: For technology and business readers. No prior SSI, cryptography, or blockchain experience required. About the Authors: Drummond Reed is the Chief Trust Officer at Evernym, a technology leader in SSI. Alex Preukschat is the co-founder of SSIMeetup.org and AlianzaBlockchain.org. Quotes: "This book is a comprehensive roadmap to the most crucial fix for today's broken Internet." - Brian Behlendorf, GM for Blockchain, Healthcare and Identity at the Linux Foundation. "If trusted relationships over the Internet are important to you or your business, this book is for you." - John Jordan, Executive Director, Trust over IP Foundation. "Decentralized identity represents not only a wide range of trust-enabling technologies, but also a paradigm shift in our increasingly digital-first world." - Rouven Heck, Executive Director, Decentralized Identity Foundation.
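
The core mechanic behind verifiable credentials can be sketched in a few lines: an issuer signs a credential payload, and anyone holding the issuer's public key can verify it. The example below uses Ed25519 from the Python cryptography library as a conceptual illustration only; it is not a full W3C Verifiable Credentials or SSI wallet implementation, and the DID and claim values are made up.

```python
# A conceptual sketch of sign-and-verify for a credential payload, not a full
# W3C Verifiable Credential or SSI implementation. Subject and claim are made up.
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

issuer_key = Ed25519PrivateKey.generate()

credential = json.dumps(
    {"subject": "did:example:alice", "claim": {"degree": "BSc"}},
    sort_keys=True,
).encode()

signature = issuer_key.sign(credential)

# Anyone holding the issuer's public key can verify the credential offline;
# verify() raises InvalidSignature if the payload was tampered with.
issuer_key.public_key().verify(signature, credential)
print("credential verified")
```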

Implementation Guide for IBM Elastic Storage System 3000

This IBM® Redbooks publication introduces and describes the IBM Elastic Storage® System 3000 (ESS 3000) as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General Parallel File System (IBM GPFS). The IBM Elastic Storage System 3000 is an all-flash array platform that uses NVMe-attached drives to provide significant performance improvements as compared to SAS-attached flash drives. This book provides a technical overview of the ESS 3000 solution and helps you to plan the installation of the environment. We also explain the use cases where we believe it fits best. Our goal is to position this book as the starting-point document for customers that use ESS 3000 as part of their IBM Spectrum Scale setups. This book is targeted toward technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective storage solutions with ESS 3000.

Data Fabric as Modern Data Architecture

Data fabric is a hot concept in data management today. By encompassing the data ecosystem your company already has in place, this architectural design pattern provides your staff with one reliable place to go for data. In this report, author Alice LaPlante shows CIOs, CDOs, and CAOs how data fabric enables their users to spend more time analyzing data than wrangling it. The best way to thrive during this intense period of digital transformation is through data. But after roaring through 2019, progress on getting the most out of data investments has lost steam; only 38% of companies now say they've created a data-driven organization. This report describes how a data fabric can help you reach the all-important goal of data democratization. Learn how data fabric handles data prep and data delivery and serves as a data catalog; use data fabric to handle data variety, a top challenge for many organizations; learn how data fabric spans any environment to support data for users and use cases from any source; examine data fabric's capabilities, including data and metadata management, data quality, integration, analytics, visualization, and governance; and get five pieces of advice for getting started with data fabric.