talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3377

Collection of O'Reilly books on Data Engineering.

Filtering by: data-engineering

Sessions & talks

Showing 401–425 of 3377 · Newest first

IBM Supply Chain Transformation

In the midst of global disruptions, every element of IBM® Supply Chain has been affected. The IBM cognitive supply chain is positioned to win the future by using the exponential technologies that are inherent to its supply chains, together with flexibility, resiliency, and end-to-end visibility. The constant commitment of IBM to building smarter supply chains over the past decade has primed IBM to quickly and effectively navigate these disruptions and course-correct by using cognitive innovation. As a result, IBM Supply Chain teams were able to deliver exceptional outcomes without client disruption. In addition, this widespread impact inspired numerous new solutions that include exponential technologies that better prepare IBM for future disruptions in constantly changing markets.

Innovative SAP SuccessFactors Recruiting: A Guide to Creating Custom Integration and Automation

Get creative and optimize your SAP SuccessFactors Recruiting implementation with this guide, which examines a variety of integration and automation opportunities throughout the recruiting process outside of the standard integrations. Innovative SAP SuccessFactors Recruiting walks you through the end-to-end recruiting process and highlights opportunities to create interfaces and automation at each stage using a variety of methods and tools. After a brief overview of the market demands driving growth in this area and an introduction to OData, Anand Athanur, Mark Ingram, and Michael A. Wellens detail each step in the recruiting process, starting with automating and integrating requisition creation using APIs and middleware. They then explore ways of enhancing candidate attraction and experience for the initial application process. After that, they jump into automation for overall candidate selection and processing, including automation using Robotic Process Automation, Integration Center, the assessment integration framework, custom OData integrations, the background check integration framework, and Business Rules. Additionally, you’ll be shown onboarding optimization techniques using Intelligent Services, as well as hiring into third-party HRIS systems. After finishing this book, you will have a thorough understanding of how to utilize SAP SuccessFactors to recruit the right candidates for every position. What You Will Learn Integrate and automate the requisition creation process in innovative ways outside of SAP documentation Enhance candidate attraction and experience Leverage integration and automation opportunities within the application processing stage Automate hiring into third-party HRIS systems Who This Book Is For Customers, consultants, and third-party vendors wishing to connect their solutions to SAP SuccessFactors Recruiting.
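For a flavor of what a custom OData integration looks like in practice, here is a minimal sketch of a read against an OData v2 endpoint from Python; the host name, entity, field names, and credentials are placeholders rather than documented SuccessFactors values.

    import requests

    # Hypothetical OData v2 request; the host, entity, fields, and credentials are
    # placeholders for illustration, not a documented SuccessFactors configuration.
    BASE_URL = "https://api.example.successfactors.com/odata/v2"

    response = requests.get(
        f"{BASE_URL}/JobRequisition",
        params={
            "$filter": "status eq 'Open'",  # standard OData query options
            "$select": "jobReqId,title",
            "$top": "10",
            "$format": "json",
        },
        auth=("api_user@COMPANY", "secret"),
        timeout=30,
    )
    response.raise_for_status()
    for requisition in response.json()["d"]["results"]:
        print(requisition["jobReqId"], requisition["title"])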

High Performance MySQL, 4th Edition

How can you realize MySQL's full power? With High Performance MySQL, you'll learn advanced techniques for everything from setting service-level objectives to designing schemas, indexes, and queries to tuning your server, operating system, and hardware to achieve your platform's full potential. This guide also teaches database administrators safe and practical ways to scale applications through replication, load balancing, high availability, and failover. Updated to reflect recent advances in cloud- and self-hosted MySQL, InnoDB performance, and new features and tools, this revised edition helps you design a relational data platform that will scale with your business. You'll learn best practices for database security along with hard-earned lessons in both performance and database stability. Dive into MySQL's architecture, including key facts about its storage engines Learn how server configuration works with your hardware and deployment choices Make query performance part of your software delivery process Examine enhancements to MySQL's replication and high availability Compare different MySQL offerings in managed cloud environments Explore MySQL's full stack optimization from application-side configuration to server tuning Turn traditional database management tasks into automated processes
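As a small illustration of making query performance part of the delivery process, the sketch below inspects a query plan and adds a covering index through the mysql-connector-python driver; the connection details and the orders table are assumptions for illustration.

    import mysql.connector

    # Connection parameters and the "orders" table are illustrative assumptions.
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="shop"
    )
    cur = conn.cursor()

    # EXPLAIN shows whether MySQL can resolve the query from an index.
    cur.execute(
        "EXPLAIN SELECT customer_id, SUM(total) FROM orders "
        "WHERE created_at >= '2021-01-01' GROUP BY customer_id"
    )
    for row in cur.fetchall():
        print(row)

    # A composite index covering the filter column and the grouped columns.
    cur.execute(
        "CREATE INDEX idx_orders_created_customer "
        "ON orders (created_at, customer_id, total)"
    )
    cur.close()
    conn.close()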

Storage as a Service Offering Guide

IBM® Storage as a Service (STaaS) extends your hybrid cloud experience with a new flexible consumption model enabled for both your on-premises and hybrid cloud infrastructure needs, giving you the agility, cash flow efficiency, and services of cloud storage with the flexibility to dynamically scale up or down and only pay for what you use beyond the minimal capacity. This IBM Redpaper provides a detailed introduction to the IBM STaaS service. The paper is targeted for data center managers and storage administrators.

IBM HyperSwap and Multi-site HA/DR for IBM FlashSystem A9000 and A9000R

IBM® HyperSwap® is the high availability (HA) solution that provides continuous data availability in case of hardware failure, power failure, connectivity failure, or disasters. The HyperSwap capability is available for IBM FlashSystem® A9000 and IBM FlashSystem A9000R, starting with software version 12.2.1. Version 12.3 introduces a function that combines HyperSwap and Asynchronous replication, which creates a solution that entails HA and Disaster Recovery (DR). One side of the HyperSwap pair has an active async link to the third system, and the other side has a standby link. Known as Multi-site HA/DR, this configuration provides HyperSwap active-active HA while keeping data mirrored to a third copy to ensure two levels of business continuity. This IBM Redpaper™ publication gives a broad understanding of the architecture, design, and implementation of the HyperSwap and Multi-site HA/DR solution. It also discusses and illustrates various use cases pertaining to their use and functionality. This paper is intended for those users who want to deploy solutions that take advantage of HyperSwap and Multi-site HA/DR for FlashSystem A9000 and A9000R.

Expert Oracle Database Architecture: Techniques and Solutions for High Performance and Productivity

Now in its fourth edition and covering Oracle Database 21c, this best-selling book continues to bring you some of the best thinking on how to apply Oracle Database to produce scalable applications that perform well and deliver correct results. Tom Kyte and Darl Kuhn share a simple philosophy: "you can treat Oracle as a black box and just stick data into it, or you can understand how it works and exploit it as a powerful computing environment." If you choose the latter, then you’ll find that there are few information management problems that you cannot solve quickly and elegantly. This fully revised fourth edition covers the developments and new features up to Oracle Database 21c. Up-to-date features are covered for tables, indexes, data types, sequences, partitioning, data loading, temporary tables, and more. All the examples are demonstrated using modern techniques and are executed in container and pluggable databases. The book’s proof-by-example approach encourages you to let evidence be your guide. Try something. See the result. Understand why the result is what it is. Apply your newfound knowledge with confidence. The book covers features by explaining how each one works, how to implement software using it, and the common pitfalls associated with it. Don’t treat Oracle Database as a black box. Get this book. Dive deeply into Oracle Database’s most powerful features that many do not invest the time to learn about. Set yourself apart from your competition and turbo-charge your career. What You Will Learn Identify and effectively resolve application performance issues and bottlenecks Architect systems to leverage the full power and feature set of Oracle’s database engine Configure a database to maximize the use of memory structures and background processes Understand internal locking and latching technology and how it impacts your system Proactively recommend best practices around performance for table and index structures Take advantage of advanced features such as table partitioning and parallel execution Who This Book Is For Oracle developers and Oracle DBAs. If you’re a developer and want a stronger understanding of Oracle features and architecture that will enable your applications to scale regardless of the workload, this book is for you. If you’re a DBA and want to intelligently work with developers to design applications that effectively leverage Oracle technology, then look no further.
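To make the partitioning and parallel-execution themes concrete, here is a minimal sketch using the python-oracledb driver; the connection string, the sales table, and the parallel degree are assumptions for illustration only.

    import oracledb

    # Connection details and the "sales" table are illustrative assumptions.
    conn = oracledb.connect(user="app", password="secret", dsn="localhost/orclpdb1")
    cur = conn.cursor()

    # Interval partitioning: Oracle creates a new monthly partition automatically.
    cur.execute("""
        CREATE TABLE sales (
            sale_id   NUMBER,
            sale_date DATE,
            amount    NUMBER
        )
        PARTITION BY RANGE (sale_date)
        INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
        (PARTITION p0 VALUES LESS THAN (DATE '2021-01-01'))""")

    # Partition pruning limits the scan to partitions matching the date filter;
    # the PARALLEL hint requests parallel execution of the aggregation.
    cur.execute("""
        SELECT /*+ PARALLEL(4) */ SUM(amount)
        FROM sales
        WHERE sale_date >= DATE '2021-06-01'""")
    print(cur.fetchone())
    conn.close()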

Kafka: The Definitive Guide, 2nd Edition

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes. Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. You'll examine: Best practices for deploying and configuring Kafka Kafka producers and consumers for writing and reading messages Patterns and use-case requirements to ensure reliable data delivery Best practices for building data pipelines and applications with Kafka How to perform monitoring, tuning, and maintenance tasks with Kafka in production The most critical metrics among Kafka's operational measurements Kafka's delivery capabilities for stream processing systems
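The produce/consume loop at the heart of the book can be sketched in a few lines; this example uses the kafka-python client, and the broker address, topic, and consumer group are assumptions for illustration.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Broker address, topic, and group id are illustrative assumptions.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",  # wait for all in-sync replicas for stronger delivery guarantees
    )
    producer.send("user-activity", {"user": "alice", "action": "login"})
    producer.flush()

    consumer = KafkaConsumer(
        "user-activity",
        bootstrap_servers="localhost:9092",
        group_id="activity-readers",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)
        break  # read a single message for the demo, then stop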

Essential PySpark for Scalable Data Analytics

Dive into the world of scalable data processing with 'Essential PySpark for Scalable Data Analytics'. This book is a comprehensive guide that helps beginners understand and utilize PySpark to process, analyze, and draw insights from large datasets effectively. With hands-on tutorials and clear explanations, you will gain the confidence to tackle big data analytics challenges. What this Book will help me do Understand and apply the distributed computing paradigm for big data. Learn to perform scalable data ingestion, cleansing, and preparation using PySpark. Create and utilize data lakes and the Lakehouse paradigm for efficient data storage and access. Develop and deploy machine learning models with scalability in mind. Master real-time analytics pipelines and create impactful data visualizations. Author(s) Nudurupati is an experienced data engineer and educator, specializing in distributed systems and big data technologies. With years of practical experience in the field, the author brings a clear and approachable teaching style to technical topics. Passionate about empowering readers, the author has designed this book to be both practical and inspirational for aspiring data practitioners. Who is it for? This book is ideal for data professionals including data scientists, engineers, and analysts looking to scale their data analytics processes. It assumes familiarity with basic data science concepts and Python, as well as some experience with SQL-like data analysis. This is particularly suitable for individuals aiming to expand their knowledge in distributed computing and PySpark to handle big data challenges. Achieving scalable and efficient data solutions is at the core of this guide.
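A minimal sketch of the ingest-cleanse-aggregate pattern the book teaches, written with PySpark; the input path and column names are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scalable-analytics-sketch").getOrCreate()

    # Hypothetical input path and columns; replace with your own dataset.
    events = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/data/raw/events.csv"))

    # Cleansing and preparation: drop incomplete rows, normalize a column.
    cleaned = (events
               .dropna(subset=["user_id", "event_type"])
               .withColumn("event_type", F.lower(F.col("event_type"))))

    # The aggregation runs in parallel across the cluster's partitions.
    daily_counts = (cleaned
                    .groupBy("event_date", "event_type")
                    .count()
                    .orderBy("event_date"))
    daily_counts.show(10)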

IBM DS8000 and IBM Z Synergy: DS8000 Release 9.2 and z/OS 2.5

IBM® Z has a close and unique relationship to its storage. Over the years, improvements to the Z processors and storage software, the disk storage systems, and their communication architecture consistently reinforced this synergy. This IBM Redpaper publication summarizes and highlights the various aspects, advanced functions, and technologies that are often pioneered by IBM, and that make the IBM Z® and the IBM DS8000 products an ideal combination. This paper is intended for users who have some familiarity with IBM Z and the IBM DS8000® series and want a condensed but comprehensive overview of the synergy items up to the IBM z15™ server with z/OS v2.5 and the IBM DS8900 Release 9.2 firmware.

Beginning Hibernate 6: Java Persistence from Beginner to Pro

Get started with Hibernate, an open source Java persistence layer, and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition includes the new Hibernate 6.0 framework, which covers new configuration, new object-relational mapping changes, and enhanced integration with Spring, Spring Boot, Quarkus, and other Java frameworks. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you’ll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner—not a dry, theoretical, hard-to-read fashion. What You'll Learn Build enterprise Java-based transaction-type applications that access complex data with Hibernate Work with Hibernate 6 using a present-day build process Integrate into the persistence life cycle Search and query with the new version of Hibernate Keep track of versioned data with Hibernate Envers Who This Book Is For Programmers experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate.

Optimize Video Streaming Delivery

Media content today is increasingly streamed video, and this trend will only grow as the speed of consumer internet and video quality improve. Traditional video streaming platforms, such as Netflix and Hulu, now account for only a portion of this content as more and more live events are streamed over the internet. And consumer-generated content on video-based social networks such as Twitch and TikTok is now more accessible and gaining popularity. This report focuses on the current state of video delivery, including the challenges content providers face and the various solutions they're pursuing. The findings in this report are based on a recent survey conducted by Edgecast, a content delivery network (CDN) that helps companies accelerate and deliver static and dynamic content to end users around the world. You'll explore: The current state of video streaming, how it works, and how streams are delivered Responses from a survey of CDN users that produce video streams How content providers are addressing recent video streaming challenges How the information in this report can help you identify KPIs

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Data Engineering with Apache Spark, Delta Lake, and Lakehouse is a comprehensive guide packed with practical knowledge for building robust and scalable data pipelines. Throughout this book, you will explore the core concepts and applications of Apache Spark and Delta Lake, and learn how to design and implement efficient data engineering workflows using real-world examples. What this Book will help me do Master the core concepts and components of Apache Spark and Delta Lake. Create scalable and secure data pipelines for efficient data processing. Learn best practices and patterns for building enterprise-grade data lakes. Discover how to operationalize data models into production-ready pipelines. Gain insights into deploying and monitoring data pipelines effectively. Author(s) Kukreja is a seasoned data engineer with over a decade of experience working with big data platforms. He specializes in implementing efficient and scalable data solutions to meet the demands of modern analytics and data science. Writing with clarity and a practical approach, he aims to provide actionable insights that professionals can apply to their projects. Who is it for? This book is tailored for aspiring data engineers and data analysts who wish to delve deeper into building scalable data platforms. It is suitable for those with basic knowledge of Python, Spark, and SQL who are seeking to learn Delta Lake and advanced data engineering concepts. Readers should be eager to develop practical skills for tackling real-world data engineering challenges.
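The Delta Lake layer that the book builds on can be exercised with a few lines of PySpark; the session configuration assumes the delta-spark package and its Spark jars are available, and the paths and data are illustrative.

    from pyspark.sql import SparkSession

    # Assumes the delta-spark package is on the classpath; paths are illustrative.
    spark = (SparkSession.builder
             .appName("delta-lakehouse-sketch")
             .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
             .config("spark.sql.catalog.spark_catalog",
                     "org.apache.spark.sql.delta.catalog.DeltaCatalog")
             .getOrCreate())

    df = spark.createDataFrame([(1, "bronze"), (2, "silver")], ["id", "layer"])

    # Each write to a Delta table is an ACID transaction and creates a new version.
    df.write.format("delta").mode("overwrite").save("/data/lakehouse/layers")

    # Read the current state, or time-travel back to an earlier version.
    latest = spark.read.format("delta").load("/data/lakehouse/layers")
    version_zero = (spark.read.format("delta")
                    .option("versionAsOf", 0)
                    .load("/data/lakehouse/layers"))
    latest.show()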

Enhanced Cyber Resilience Threat Detection with IBM FlashSystem Safeguarded Copy and IBM QRadar

The focus of this document is to demonstrate an early threat detection by using IBM® QRadar® and the Safeguarded Copy feature that is available as part of IBM FlashSystem® and IBM SAN Volume Controller. Such early detection protects and quickly recovers the data if a cyberattack occurs. This document describes integrating IBM FlashSystem audit logs with IBM QRadar, and the configuration steps for IBM FlashSystem and IBM QRadar. It also explains how to use the IBM QRadar device support module (DSM) editor to normalize events and assign an IBM QRadar identifier (QID) map to the events. After the IBM QRadar configuration, we review configuring Safeguarded Copy on the application volumes by using volume groups and applying Safeguarded backup policies on the volume group. Finally, we demonstrate the use of the orchestration software IBM Copy Services Manager to start recovery and restore operations for data restoration on online volumes, and to start a backup of data volumes.

IBM Spectrum Protect Plus Protecting Database Applications

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and application backups in hybrid multicloud environments. This IBM Redpaper publication focuses on protecting database applications. IBM Spectrum® Protect Plus supports backup, restore, and data reuse for multiple databases, such as Oracle, IBM Db2®, MongoDB, Microsoft Exchange, and Microsoft SQL Server. Although other IBM Spectrum Protect Plus features focus on virtual environments, the database and application support of IBM Spectrum Protect Plus includes databases on virtual and physical servers.

IBM FlashSystem Best Practices and Performance Guidelines

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM FlashSystem products. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. It explains how you can optimize disk performance with the IBM System Storage® Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem, SAN Volume Controller (SVC), and IBM Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments.

Storage Systems

Storage Systems: Organization, Performance, Coding, Reliability and Their Data Processing was motivated by the 1988 Redundant Array of Inexpensive/Independent Disks proposal to replace large form factor mainframe disks with an array of commodity disks. Disk loads are balanced by striping data into strips—with one strip per disk—and storage reliability is enhanced via replication or erasure coding, which at best dedicates k strips per stripe to tolerate k disk failures. Flash memories have resulted in a paradigm shift with Solid State Drives (SSDs) replacing Hard Disk Drives (HDDs) for high performance applications. RAID and Flash have resulted in the emergence of new storage companies, namely EMC, NetApp, SanDisk, and Pure Storage, and a multibillion-dollar storage market. Key new conferences and publications are reviewed in this book. The goal of the book is to expose students, researchers, and IT professionals to the more important developments in storage systems, while covering the evolution of storage technologies, traditional and novel databases, and novel sources of data. We describe several prototypes: FAWN at CMU, RAMCloud at Stanford, and Lightstore at MIT; Oracle's Exadata, AWS' Aurora, Alibaba's PolarDB, Fungible Data Center; and the author's paper designs for cloud storage, namely heterogeneous disk arrays and hierarchical RAID. Surveys storage technologies and lists sources of data: measurements, text, audio, images, and video Familiarizes with paradigms to improve performance: caching, prefetching, log-structured file systems, and merge-trees (LSMs) Describes RAID organizations and analyzes their performance and reliability Conserves storage via data compression, deduplication, compaction, and secures data via encryption Specifies implications of storage technologies on performance and power consumption Exemplifies database parallelism for big data, analytics, deep learning via multicore CPUs, GPUs, FPGAs, and ASICs, e.g., Google's Tensor Processing Units
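The single-parity erasure coding mentioned above (the RAID-5 style organization) reduces to byte-wise XOR; the toy sketch below computes a parity strip for one stripe and rebuilds a lost strip from the survivors.

    def xor_parity(strips):
        """Byte-wise XOR of all strips; used both to compute and to rebuild parity."""
        parity = bytearray(len(strips[0]))
        for strip in strips:
            for i, byte in enumerate(strip):
                parity[i] ^= byte
        return bytes(parity)

    # One stripe of three data strips (one strip per disk) plus one parity strip.
    data_strips = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_parity(data_strips)

    # Simulate losing disk 1 and rebuild its strip from the survivors plus parity.
    survivors = [data_strips[0], data_strips[2], parity]
    rebuilt = xor_parity(survivors)
    assert rebuilt == data_strips[1]
    print("rebuilt strip:", rebuilt)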

Snowflake Security: Securing Your Snowflake Data Cloud

This book is your complete guide to Snowflake security, covering account security, authentication, data access control, logging and monitoring, and more. It will help you make sure that you are using the security controls in the right way, are on top of access control, and are making the most of the security features in Snowflake. Snowflake is the fastest growing cloud data warehouse in the world, and having the right methodology to protect the data is important both to data engineers and security teams. It allows for faster data enablement for organizations, as well as reducing security risks, meeting compliance requirements, and solving data privacy challenges. There are currently tens of thousands of people who are either data engineers/data ops in Snowflake-using organizations, or security people in such organizations. This book provides guidance when you want to apply certain capabilities, such as data masking, row-level security, column-level security, tackling role hierarchy, building monitoring dashboards, etc., to your organizations. What You Will Learn Implement security best practices for Snowflake Set up user provisioning, MFA, OAuth, and SSO Set up a Snowflake security model Design roles architecture Use advanced access control such as row-based security and dynamic masking Audit and monitor your Snowflake Data Cloud Who This Book Is For Data engineers, data privacy professionals, and security teams either with security knowledge (preferably some data security knowledge) or with data engineering knowledge; in other words, either “Snowflake people” or “data people” who want to get security right, or “security people” who want to make sure that Snowflake gets handled right in terms of security
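Capabilities such as dynamic masking and row-level security are configured in SQL; the sketch below issues them through the snowflake-connector-python driver, with the account details, roles, and table names as placeholder assumptions.

    import snowflake.connector

    # Account, credentials, roles, and object names are illustrative placeholders.
    conn = snowflake.connector.connect(
        account="my_account", user="SECURITY_ADMIN", password="secret",
        warehouse="ADMIN_WH", database="HR", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Dynamic data masking: only the HR_ADMIN role sees the real email address.
    cur.execute("""
        CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING)
        RETURNS STRING ->
          CASE WHEN CURRENT_ROLE() IN ('HR_ADMIN') THEN val
               ELSE '*** MASKED ***' END""")
    cur.execute(
        "ALTER TABLE employees MODIFY COLUMN email SET MASKING POLICY email_mask")

    # Row-level security: a toy policy that exposes a department's rows to a
    # role of the same name (a real policy would use a mapping table).
    cur.execute("""
        CREATE ROW ACCESS POLICY IF NOT EXISTS dept_rows AS (dept STRING)
        RETURNS BOOLEAN ->
          CURRENT_ROLE() = 'HR_ADMIN' OR dept = CURRENT_ROLE()""")
    cur.execute(
        "ALTER TABLE employees ADD ROW ACCESS POLICY dept_rows ON (department)")
    conn.close()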

IBM FlashSystem 5000 and 5200 for Mid-Market

The IBM® FlashSystem 5015, 5035, and 5200 help you meet the challenges of rapid data growth while staying within limited IT budgets. These systems allow you to quickly consolidate, simplify, and optimize your IT infrastructure with an efficient, highly flexible, yet easy-to-use storage system with powerful virtualization features. This IBM Redpaper™ publication is intended for mid-market clients.

Fabric Resiliency and Best Practices for IBM c-type Products

This IBM Redpaper publication describes best practices for deploying and using advanced Cisco NX-OS features to identify, monitor, and protect Fibre Channel (FC) Storage Area Networks (SANs) from problematic devices and media behavior. The paper focuses on the IBM c-type SAN switches with firmware Cisco MDS NX-OS Release 8.4(2a).

Azure Databricks Cookbook

Azure Databricks is a robust analytics platform that leverages Apache Spark and seamlessly integrates with Azure services. In the Azure Databricks Cookbook, you'll find hands-on recipes to ingest data, build modern data pipelines, and perform real-time analytics while learning to optimize and secure your solutions. What this Book will help me do Design advanced data workflows integrating Azure Synapse, Cosmos DB, and streaming sources with Databricks. Gain proficiency in using Delta Tables and Spark for efficient data storage and analysis. Learn to create, deploy, and manage real-time dashboards with Databricks SQL. Master CI/CD pipelines for automating deployments of Databricks solutions. Understand security best practices for restricting access and monitoring Azure Databricks. Author(s) Raj and Jaiswal are experienced professionals in the field of big data and analytics. They are well-versed in implementing Azure Databricks solutions for real-world problems. Their collaborative writing approach ensures clarity and practical focus. Who is it for? This book is tailored for data engineers, scientists, and big data professionals who want to apply Azure Databricks and Apache Spark to their analytics workflows. A basic familiarity with Spark and Azure is recommended to make the best use of the recipes provided. If you're looking to scale and optimize your analytics pipelines, this book is for you.
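A typical recipe pattern reads raw data from Azure storage, curates it, and lands a Delta table that Databricks SQL can serve; in this sketch the storage account, container, and table names are placeholders, and the session resolution assumes a PySpark environment such as a Databricks notebook.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # In a Databricks notebook this returns the provided session; the storage
    # account, container, and table names are placeholders.
    spark = SparkSession.builder.getOrCreate()

    raw = (spark.read
           .format("json")
           .load("abfss://landing@mystorageacct.dfs.core.windows.net/clickstream/"))

    curated = (raw
               .withColumn("event_ts", F.to_timestamp("event_time"))
               .filter(F.col("user_id").isNotNull()))

    # Persist as a Delta table so Databricks SQL dashboards can query it directly.
    (curated.write
     .format("delta")
     .mode("append")
     .saveAsTable("analytics.clickstream_curated"))

    spark.sql("""
        SELECT date_trunc('hour', event_ts) AS hour, count(*) AS events
        FROM analytics.clickstream_curated
        GROUP BY hour
        ORDER BY hour""").show()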

Securing Data on Threat Detection by Using IBM Spectrum Scale and IBM QRadar: An Enhanced Cyber Resiliency Solution

Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management (SIEM) software for deep inspection, detection, and prioritization of threats has become a necessity for any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, when combined with the log analysis, deep inspection, and detection of threats that are provided by IBM QRadar®, help reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale File Audit Logging can be integrated with IBM QRadar. Using IBM QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data that is stored on IBM Spectrum Scale. When the threats are identified, you can quickly act on them to mitigate or reduce the impact of incidents. We further demonstrate how the threat detection by IBM QRadar can proactively trigger data snapshots or a cyber resiliency workflow in IBM Spectrum Scale to protect the data during a threat. This third edition adds the section "Ransomware threat detection", which describes a ransomware attack scenario within an environment that leverages the IBM Spectrum Scale File Audit Logging integration with IBM QRadar. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. This paper assumes a basic understanding of IBM Spectrum Scale and IBM QRadar and their administration.

PostGIS in Action, Third Edition

In PostGIS in Action, Third Edition you will learn: An introduction to spatial databases Geometry, geography, raster, and topology spatial types, functions, and queries Applying PostGIS to real-world problems Extending PostGIS to web and desktop applications Querying data from external sources using PostgreSQL Foreign Data Wrappers Optimizing queries for maximum speed Simplifying geometries for greater efficiency PostGIS in Action, Third Edition teaches readers of all levels to write spatial queries for PostgreSQL. You’ll start by exploring vector-, raster-, and topology-based GIS before quickly progressing to analyzing, viewing, and mapping data. This fully updated third edition covers key changes in PostGIS 3.1 and PostgreSQL 13, including parallelization support, partitioned tables, and new JSON functions that help in creating web mapping applications. About the Technology PostGIS is a spatial database extender for PostgreSQL. It offers the features and firepower you need to take on nearly any geodata task. PostGIS lets you create location-aware queries with a few lines of SQL code, then build the backend for mapping, raster analysis, or routing application with minimal effort. About the Book PostGIS in Action, Third Edition shows you how to solve real-world geodata problems. You’ll go beyond basic mapping, and explore custom functions for your applications. Inside this fully updated edition, you’ll find coverage of new PostGIS features such as PostGIS Window functions, parallelization of queries, and outputting data for applications using JSON and Vector Tile functions. What's Inside Fully revised for PostGIS version 3.1 and PostgreSQL 13 Optimize queries for maximum speed Simplify geometries for greater efficiency Extend PostGIS to web and desktop applications About the Reader For readers familiar with relational databases and basic SQL. No prior geodata or GIS experience required. About the Authors Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee. Quotes The best introduction I’ve seen for engineers who want to get ramped up quickly and build advanced GIS applications. - Ikechukwu Okonkwo, Orum.io A wealth of information that showcases how powerful PostGIS is. - Luis Moux-Dominguez, EMO An extraordinary book for the world of GIS. Truly learned a lot! - DeUndre’ Rushon, DigiDiscover LLC Gives you insight into how best to provide map services for a wide audience. - Marcus Brown, Enel Green Power
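A representative spatial query, run from Python with psycopg2: it finds everything within one kilometre of a point, using geography casts so distances come back in metres. The connection settings and the places table are assumptions for illustration.

    import psycopg2

    # Connection settings and the "places" table are illustrative assumptions.
    conn = psycopg2.connect(host="localhost", dbname="gisdb",
                            user="gis", password="secret")
    cur = conn.cursor()

    # ST_DWithin on geography uses metres; ST_MakePoint takes (longitude, latitude).
    point = (-71.06, 42.36)
    cur.execute("""
        SELECT name,
               ST_Distance(geom::geography,
                           ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography) AS metres
        FROM places
        WHERE ST_DWithin(geom::geography,
                         ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                         1000)
        ORDER BY metres""", point + point)

    for name, metres in cur.fetchall():
        print(f"{name}: {metres:.0f} m")
    cur.close()
    conn.close()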

Learning MySQL, 2nd Edition

Get a comprehensive overview on how to set up and design an effective database with MySQL. This thoroughly updated edition covers MySQL's latest version, including its most important aspects. Whether you're deploying an environment, troubleshooting an issue, or engaging in disaster recovery, this practical guide provides the insights and tools necessary to take full advantage of this powerful RDBMS. Authors Vinicius Grippa and Sergey Kuzmichev from Percona show developers and DBAs methods for minimizing costs and maximizing availability and performance. You'll learn how to perform basic and advanced querying, monitoring and troubleshooting, database management and security, backup and recovery, and tuning for improved efficiency. This edition includes new chapters on high availability, load balancing, and using MySQL in the cloud. Get started with MySQL and learn how to use it in production Deploy MySQL databases on bare metal, on virtual machines, and in the cloud Design database infrastructures Code highly efficient queries Monitor and troubleshoot MySQL databases Execute efficient backup and restore operations Optimize database costs in the cloud Understand database concepts, especially those pertaining to MySQL
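For the backup-and-restore material, a common pattern is a consistent logical dump driven from a script; the sketch below shells out to mysqldump, with the host and database as placeholders and the password assumed to come from an option file rather than the command line.

    import subprocess
    from datetime import datetime

    # Host and database are placeholders; the password is read from an option
    # file (for example ~/.my.cnf) so it never appears on the command line.
    dump_file = f"shop-{datetime.now():%Y%m%d}.sql"
    with open(dump_file, "w") as out:
        subprocess.run(
            ["mysqldump",
             "--host", "db.example.com",
             "--user", "backup",
             "--single-transaction",   # consistent InnoDB snapshot without locking
             "--routines", "--triggers",
             "shop"],
            stdout=out,
            check=True,
        )
    print("backup written to", dump_file)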

Foundations of Data Intensive Applications

PEEK “UNDER THE HOOD” OF BIG DATA ANALYTICS The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance. The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within. Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system. Ideal for data scientists, data architects, dev-ops engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to: Identify the foundations of large-scale, distributed data processing systems Make major software design decisions that optimize performance Diagnose performance problems and distributed operation issues Understand state-of-the-art research in big data Explain and use the major big data frameworks and understand what underpins them Use big data analytics in the real world to solve practical problems
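The core idea behind most of these systems is partitioned, data-parallel execution followed by a merge of partial results; the toy sketch below imitates that pattern on one machine with Python's multiprocessing, where a real framework would place the partitions on different nodes and shuffle the intermediate results.

    from collections import Counter
    from multiprocessing import Pool

    def local_count(partition):
        """Map phase: each worker counts words in its own partition of the data."""
        counts = Counter()
        for line in partition:
            counts.update(line.split())
        return counts

    # Toy dataset split into one partition per worker.
    lines = ["big data systems", "data parallel systems", "big big data"]
    partitions = [lines[0:1], lines[1:2], lines[2:3]]

    if __name__ == "__main__":
        with Pool(processes=3) as pool:
            partials = pool.map(local_count, partitions)
        # Reduce phase: merge the per-partition counts into the global result.
        total = sum(partials, Counter())
        print(total.most_common(3))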

IBM DS8900F Product Guide Release 9.2

This IBM® Redbooks Product Guide provides an overview of the features and functions that are available with the IBM DS8900F models that run microcode Release 9.2 (Bundle 89.20 / Licensed Machine Code 7.9.20). As of August 2021, the DS8900F with DS8000 Release 9.2 is the latest addition. The DS8900F is exclusively an all-flash system, and it is offered in three classes: the IBM DS8980F Analytic Class, which offers the best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), Business Intelligence, and Machine Learning; the IBM DS8950F Agility Class, which is efficiently designed to consolidate all your mission-critical workloads for IBM Z, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution; and the IBM DS8910F Flexibility Class, which delivers significant performance for midrange organizations that are looking to meet storage challenges with advanced functionality delivered as a single-rack solution.