O'Reilly Data Engineering Books

Practical MongoDB Aggregations

2024-03-01 O'Reilly Amazon

book

Paul Done

data data-engineering nosql-databases MongoDB Big Data IoT

Dive into the capabilities of the MongoDB aggregation framework with this official guide, "Practical MongoDB Aggregations". You'll learn how to design and optimize efficient aggregation pipelines for MongoDB 7.0, empowering you to handle complex data analysis and processing tasks directly within the database. What this Book will help me do: Gain expertise in crafting advanced MongoDB aggregation pipelines for custom data workflows. Learn to perform time series analysis for financial datasets and IoT applications. Discover optimization techniques for working with sharded clusters and large datasets. Master array manipulation and other specific operations essential for MongoDB data models. Build pipelines that ensure data security and distribution while maintaining performance. Author(s): Paul Done, a recognized expert in MongoDB, brings his extensive experience in database technologies to this book. With years of practice in helping companies leverage MongoDB for big data solutions, Paul shares his deep knowledge in an accessible and logical manner. His approach to writing is hands-on, focusing on practical insights and clear explanations. Who is it for? This book is tailored for intermediate-level developers, database architects, data analysts, engineers, and scientists who use MongoDB. If you are familiar with MongoDB and looking to expand your understanding specifically around its aggregation capabilities, this guide is for you. Whether you're analyzing time series data or need to optimize pipelines for performance, you'll find actionable tips and examples here to suit your needs.

Learn T-SQL Querying - Second Edition

2024-02-29 O'Reilly Amazon

book

Pam Lahoud , Pedro Lopes

data data-engineering SQL Azure Microsoft SQL Server

Troubleshoot query performance issues, identify anti-patterns in your code, and write efficient T-SQL queries with this guide for T-SQL developers. Key Features: A definitive guide to mastering the techniques of writing efficient T-SQL code; Learn query optimization fundamentals, query analysis, and how query structure impacts performance; Discover insightful solutions to detect, analyze, and tune query performance issues; Purchase of the print or Kindle book includes a free PDF eBook. Book Description: Data professionals seeking to excel in Transact-SQL for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. Learn T-SQL Querying, Second Edition, focuses on indexing queries and crafting elegant T-SQL code, enabling data professionals to gain mastery in modern SQL Server versions (2022) and Azure SQL Database. The book covers new topics like logical statement processing flow, data access using indexes, and best practices for tuning T-SQL queries. Starting with query processing fundamentals, the book lays a foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning to analyze execution plans for insights into current performance and scalability. Using dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. The book covers indexing and delves into SQL Server’s built-in tools to expedite resolution of T-SQL query performance and scalability issues. Hands-on examples will guide you to avoid UDF pitfalls and understand features like predicate SARGability, Query Store, and Query Tuning Assistant. 
By the end of this book, you’ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and avoid pitfalls. What you will learn: Identify opportunities to write well-formed T-SQL statements; Familiarize yourself with the Cardinality Estimator for query optimization; Create efficient indexes for your existing workloads; Implement best practices for T-SQL querying; Explore Query Execution Dynamic Management Views; Utilize the latest performance optimization features in SQL Server 2017, 2019, and 2022; Safeguard query performance during upgrades to newer versions of SQL Server. Who this book is for: This book is for database administrators, database developers, data analysts, data scientists and T-SQL practitioners who want to master the art of writing efficient T-SQL code and troubleshooting query performance issues through practical examples. A basic understanding of T-SQL syntax, writing queries in SQL Server, and using the SQL Server Management Studio tool will be helpful to get started.

Azure Data Factory Cookbook - Second Edition

2024-02-28 O'Reilly Amazon

book

Xenia Ireton , Tonya Chernyshova , Dmitry Foshin , Dmitry Anoshin

data data-engineering storage-repositories data-lake Analytics Azure

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects. What this Book will help me do: Master building and managing data pipelines using Azure Data Factory's latest versions and features. Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows. Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables. Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently. Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines. Author(s): Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences. Who is it for? This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

Big Data Computing

2024-02-27 O'Reilly Amazon

book

Bishwajeet Kumar Pandey , Tanvir Habib Sardar

data data-engineering Hadoop apache-hive Big Data Hive

This book primarily aims to provide an in-depth understanding of recent advances in big data computing technologies, methodologies, and applications, along with introductory details of big data computing models such as Apache Hadoop, MapReduce, Hive, Pig, Mahout, in-memory storage systems, NoSQL databases, and big data streaming services.

IBM FlashSystem and VMware Implementation and Best Practices Guide

2024-02-27 O'Reilly Amazon

book

Jordan Fincher , Duane Bolland , David Green , Vasfi Gucer , Nezih Boyacioglu , Ibrahim Alade Rufai , Leandro Torolho , Warren Hawkins

data data-engineering IBM API VMware

This IBM® Redbooks® publication details the configuration and best practices for using the IBM FlashSystem® family of storage products within a VMware environment. The first version of this book was published in 2021 and specifically addressed IBM Spectrum® Virtualize Version 8.4 with VMware vSphere 7.0. This second version of this book includes all the enhancements that are available with IBM Spectrum Virtualize 8.5. Topics illustrate planning, configuring, operations, and preferred practices that include integration of IBM FlashSystem storage systems with the VMware vCloud suite of applications: VMware vSphere Web Client (vWC); vSphere Storage APIs - Storage Awareness (VASA); vSphere Storage APIs - Array Integration (VAAI); VMware Site Recovery Manager (SRM); VMware vSphere Metro Storage Cluster (vMSC); and Embedded VASA Provider for VMware vSphere Virtual Volumes (vVols). This book is intended for presales consulting engineers, sales engineers, and IBM clients who want to deploy IBM FlashSystem storage systems in virtualized data centers that are based on VMware vSphere. Note: There is a newer version of this book: "IBM Storage Virtualize and VMware: Integrations, Implementation and Best Practices, SG24-8549". This book addresses IBM Storage Virtualize Version 8.6 with VMware vSphere 8. The new IBM Storage plugin for vSphere is covered in this book.

IBM TS7700 Release 5.3 Guide

2024-02-27 O'Reilly Amazon

book

Shinsuke Ueyama , Aderson Pacini , Dave Brettell , Yuki Asakura , Nao Takemura , Lourie Goodall , Alberto Barajas Ortiz , Erina Tatsumi , Nielson ’Nino’ de Carvalho , Chen Zhu , Larry Coyne , Taisei Takai , Tomoaki Ogino , Michael Scott , Kousei Kawamura , Derek Erdmann , Trinidad Armando Rangel Ruiz , Shinya Ohri , Nobuhiko Furuya , Joe Hew , Rin Fujiwara , Ramón A. Minjares Campos , Stefan Neff , Tony Makepeace , Takahiro Tsuda

data data-engineering IBM Cloud Computing Cloud Storage S3

This IBM Redbooks® publication covers IBM TS7700 R5.3. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 25 years of experience, the R5.3 release includes many features that enable improved performance, usability, and security. Highlights include the IBM TS7700 Advanced Object Store, an all flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.3. The R5.3 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000 Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.3 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. TS7700 provides tape virtualization for the IBM Z® environment. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. 
New and existing capabilities of the TS7700 5.3 release include the following highlights: Support for IBM TS1160 Tape Drives and JE/JM media; Eight-way Grid Cloud, which consists of up to three generations of TS7700; Synchronous and asynchronous replication of virtual tape and TCT objects; Grid access to all logical volume and object data independent of where it resides; An all flash TS7770 option for improved performance; Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier; Full AES256 encryption for data that is in-flight and at-rest; Tight integration with IBM Z and DFSMS policy management; DS8000 Object Store with AES256 in-flight encryption and compression; Regulatory compliance through Logical WORM and LWORM Retention support; Cloud Storage Tier support for archive, logical volume versions, and disaster recovery; Optional integration with physical tape; 16 Gb IBM FICON® throughput that exceeds 4 GBps per TS7700 cluster; Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR) support; IBM Z hosts view up to 3,968 3490 devices per TS7700 grid; TS7770 Cache On Demand feature that uses capacity-based licensing; TS7770 support of SSD within the VED server. The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM Power9® technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM DS8000 Copy Services: Updated for IBM DS8000 Release 9.1

2024-02-18 O'Reilly Amazon

book

Connie Riggins , Lisa Martinez , Mark Wells , Bertrand Dufrasne , Tony Eriksson , Michael Frankenberg , Suellen Ricardo Fida

data data-engineering IBM

This IBM® Redbooks® publication helps you plan, install, configure, and manage Copy Services on the IBM DS8000® operating in an IBM Z® or Open Systems environment. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8000 Copy Services. This edition is an update for the DS8900 Release 9.1. Note that the Safeguarded Copy feature is covered in IBM DS8000 Safeguarded Copy, REDP-5506.

IBM Z Functional Matrix

2024-02-14 O'Reilly Amazon

book

Ewerson Palacio , Bill White , Octavian Lascu

data data-engineering IBM

This IBM® Redpaper publication can help you quickly understand the features, functions, and connectivity options that are available with the IBM z16™, IBM z15®, and IBM z14®. The intention of this publication is to compare the standard and optional features for various IBM Z® configurations.

IBM and CMTG Cyber Resiliency: Building an Automated, VMware Aware Safeguarded Copy Solution to Provide Data Resilience

2024-02-09 O'Reilly Amazon

book

Stephen Doney , Barry Whyte , Neil Morris

data data-engineering IBM Cloud Computing Data Management VMware

This IBM Blueprint outlines how CMTG and IBM have partnered to provide cyber resilient services to their clients. CMTG is one of Australia's leading private cloud providers based in Perth, Western Australia. The solution is based on IBM Storage FlashSystem, IBM Safeguarded Copy and IBM Storage Copy Data Management. The target audience for this Blueprint is IBM Storage technical specialists and storage admins.

Deciphering Data Architectures

2024-02-07 O'Reilly Amazon

book

James Serra

data data-engineering storage-repositories data-lake Big Data Data Lake

Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of these architectures to help data professionals understand the pros and cons of each. James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, as well as how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs. With this book, you'll: Gain a working understanding of several data architectures; Learn the strengths and weaknesses of each approach; Distinguish data architecture theory from reality; Pick the best architecture for your use case; Understand the differences between data warehouses and data lakes; Learn common data architecture concepts to help you build better solutions; Explore the historical evolution and characteristics of data architectures; Learn essentials of running an architecture design session, team organization, and project success factors. Free from product discussions, this book will serve as a timeless resource for years to come.

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.6

2024-02-07 O'Reilly Amazon

book

James Whitaker , Bill Scales , Barry Whyte

data data-engineering IBM Cloud Computing Cyber Security

IBM® Storage Virtualize based storage systems are secure storage platforms that implement various security-related features, in terms of system-level access controls and data-level security features. This document outlines the available security features and options of IBM Storage Virtualize based storage systems. It is not intended as a "how to" or best practice document. Instead, it is a checklist of features that can be reviewed by a user security team to aid in the definition of a policy to be followed when implementing IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storage Virtualize for Public Cloud. IBM Storage Virtualize features the following levels of security to protect against threats and to keep the attack surface as small as possible: The first line of defense is to offer strict verification features that stop unauthorized users from using login interfaces and gaining access to the system and its configuration. The second line of defense is to offer least privilege features that restrict the environment and limit any effect if a malicious actor does access the system configuration. The third line of defense is to run in a minimal, locked down, mode to prevent damage spreading to the kernel and rest of the operating system. The fourth line of defense is to protect the data at rest that is stored on the system from theft, loss, or corruption (malicious or accidental). The topics that are discussed in this paper can be broadly split into two categories: System security: This type of security encompasses the first three lines of defense that prevent unauthorized access to the system, protect the logical configuration of the storage system, and restrict what actions users can perform. It also ensures visibility and reporting of system level events that can be used by a Security Information and Event Management (SIEM) solution, such as IBM QRadar®. Data security: This type of security encompasses the fourth line of defense. 
It protects the data that is stored on the system against theft, loss, or attack. These data security features include Encryption of Data At Rest (EDAR) or IBM Safeguarded Copy (SGC). This document is correct as of IBM Storage Virtualize 8.6.

Data Science and Machine Learning Applications in Subsurface Engineering

2024-02-06 O'Reilly Amazon

book

Daniel Asante Otchere

data ai-ml machine-learning AI/ML Data Science

This book provides comprehensive research and explores the different applications of data science and machine learning in subsurface engineering.

Mastering MongoDB 7.0 - Fourth Edition

2024-02-01 O'Reilly Amazon

book

Malak Abu Hammad , Elie Hannouch , Leandro Domingues , Marko Aleksendrić , Arek Borucki , Rachelle Palmer , Rajesh Nair

data data-engineering nosql-databases MongoDB Data Management Cyber Security

Discover the many capabilities of MongoDB 7.0 with this comprehensive guide designed to take your database skills to new heights. By exploring advanced features like aggregation pipelines, role-based security, and MongoDB Atlas, you will gain in-depth expertise in modern data management. This book empowers you to create secure, high-performance database applications. What this Book will help me do: Understand and implement advanced MongoDB queries for detailed data analysis. Apply optimized indexing techniques to maximize query performance. Leverage MongoDB Atlas for robust monitoring, efficient backups, and advanced integrations. Develop secure applications with role-based access control, auditing, and encryption. Create scalable and innovative solutions using the latest features in MongoDB 7.0. Author(s): Marko Aleksendrić, Arek Borucki, and their co-authors are accomplished experts in database engineering and MongoDB development. They bring collective experience in teaching and practical application of MongoDB solutions across various industries. Their goal is to simplify complex topics, making them approachable and actionable for developers worldwide. Who is it for? This book is written for developers, software engineers, and database administrators with experience in MongoDB who want to deepen their expertise. An understanding of basic database operations and queries is recommended. If you are looking to master advanced concepts and create secure, optimized, and scalable applications, this is the book for you.

Data Engineering with Scala and Spark

2024-01-31 O'Reilly Amazon

book

Rupam Bhattacharjee , Eric Tome , David Radford

software-development programming-languages jvm-languages Scala API CI/CD

Data Engineering with Scala and Spark guides you through building robust data pipelines that process massive datasets efficiently. You will learn practical techniques leveraging Scala and Spark with a hands-on approach to mastering data engineering tasks including ingestion, transformation, and orchestration. What this Book will help me do: Set up a data pipeline development environment using Scala; Utilize Spark APIs like DataFrame and Dataset for effective data processing; Implement CI/CD and testing strategies for pipeline maintainability; Optimize pipeline performance through tuning techniques; Apply data profiling and quality enforcement using tools like Deequ. Author(s): Eric Tome, Rupam Bhattacharjee, and David Radford bring decades of combined experience in data engineering and distributed systems. Their work spans cutting-edge data processing solutions using Scala and Spark. They aim to help professionals excel in building reliable, scalable pipelines. Who is it for? This book is tailored for working data engineers familiar with data workflow processes who desire to enhance their expertise in Scala and Spark. If you aspire to build scalable, high-performance data solutions or transition raw data into strategic assets, this book is ideal.

IBM Storage Fusion Multicloud Object Gateway

2024-01-31 O'Reilly Amazon

book

Eyal Abraham , Shawn Houston

data data-engineering IBM Cloud Computing

This Redpaper provides an overview of IBM Storage Fusion Multicloud Object Gateway (MCG) and can be used as a quick reference guide for the most common use cases. The intended audience is cloud and application administrators, as well as other technical staff members who wish to learn how MCG works, how to set it up, and how to use a Backing Store or Namespace Store, as well as object caching.

Take Control of iOS & iPadOS Privacy and Security, 4th Edition

2024-01-29 O'Reilly Amazon

book

Glenn Fleishman

data data-engineering data-security-privacy data security & privacy Marketing Cyber Security

Master networking, privacy, and security for iOS and iPadOS! Version 4.2, updated January 29, 2024 Ensuring that your iPhone or iPad’s data remains secure and in your control and that your private data remains private isn’t a battle—if you know what boxes to check and how to configure iOS and iPadOS to your advantage. Take Control of iOS & iPadOS Privacy and Security takes you into the intricacies of Apple’s choices when it comes to networking, data sharing, and encryption—and protecting your personal safety. Substantially updated to cover dozens of changes and new features in iOS 17 and iPadOS 17! Your iPhone and iPad have become the center of your digital identity, and it’s easy to lose track of all the ways in which Apple and other parties access your data legitimately—or without your full knowledge and consent. While Apple nearly always errs on the side of disclosure and permission, many other firms don’t. This book comprehensively explains how to configure iOS 17, iPadOS 17, and iCloud-based services to best protect your privacy with messaging, email, browsing, and much more. The book also shows you how to ensure your devices and data are secure from intrusion from attackers of all types. You’ll get practical strategies and configuration advice to protect yourself against psychological and physical threats, including restrictions on your freedom and safety. For instance, you can now screen images that may contain nude images, while Apple has further enhanced Lockdown Mode to block potential attacks by governments, including your own. Take Control of iOS & iPadOS Privacy and Security covers how to configure the hundreds of privacy and data sharing settings Apple offers in iOS and iPadOS, and which it mediates for third-party apps. 
Safari now has umpteen different strategies built in by Apple to protect your web surfing habits, personal data, and identity, and new features in Safari, Mail, and Messages that block tracking of your movement across sites, actions on ads, and even when you open and view an email message. In addition to privacy and security, this book also teaches you everything you need to know about networking, whether you’re using 3G, 4G LTE, or 5G cellular, Wi-Fi or Bluetooth, or combinations of all of them; as well as about AirDrop, AirPlay, Airplane Mode, Personal Hotspot, and tethering. You’ll learn how to:

Twiddle 5G settings to ensure the best network speeds on your iPhone or iPad. Master the options for a Personal Hotspot for yourself and in a Family Sharing group. Set up a device securely from the moment you power up a new or newly restored iPhone or iPad. Manage Apple’s built-in second-factor verification code generator for extra-secure website and app logins. Create groups of passwords and passkeys you can share securely with other iPhone, iPad, and Mac users. Decide whether Advanced Data Protection in iCloud, an enhanced encryption option that makes nearly all your iCloud data impossible for even Apple to view, makes sense for you. Use passkeys, a high-security but easy-to-use website login system with industry-wide support. Block unknown (and unwanted) callers, iMessage senders, and phone calls, now including FaceTime. Protect your email by using Hide My Email, an iCloud+ tool to generate an address Apple manages and relays messages through for you, now including email used with Apple Pay transactions. Use Safari’s blocking techniques and learn how to review websites’ attempts to track you, including the latest improvements in iOS 17 and iPadOS 17. Use Communication Safety, a way to alert your children about sensitive images, but now also a tool to keep unsolicited and unwanted images of private parts from appearing on your devices. Understand why Apple might ask for your iPhone, iPad, or Mac password when you log in on a new device using two-factor authentication. Keep yourself safe when en route to a destination by creating a Check In partner who will be alerted if you don’t reach your intended end point or don’t respond within a period of time. Dig into Private Browsing’s several new features in iOS 17/iPadOS 17, designed to let you leave no trace of your identity or actions behind, while protecting your iPhone or iPad from prying eyes, too. Manage data usage across two phone SIMs (or eSIMs) at home and while traveling. 
Use a hardware encryption key to strongly protect your Apple ID account. Share a Wi-Fi password with nearby contacts and via a QR Code. Differentiate between encrypted data sessions and end-to-end encryption. Stream music and video to other devices with AirPlay 2. Use iCloud+’s Private Relay, a privacy-protecting browsing service that keeps your habits and locations from prying marketing eyes. Deter brute-force cracking by relying on an Accessories timeout for devices physically being plugged in that use USB and other standards. Configure Bluetooth devices. Enjoy enhanced AirDrop options that let you tap two iPhones to transfer files and continue file transfers over the internet when you move out of range. Protect Apple ID account and iCloud data from unwanted access at a regular level and via the new Safety Check, designed to let you review or sever digital connections with people you know who may wish you harm.

Building Information Modeling

2024-01-24 O'Reilly Amazon

book

Marie Bagieu , Régine Teulier

data data-engineering data-models GIS

This book presents how Building Information Modeling (BIM) and the use of shared representation of built assets facilitate design, construction and operation processes (ISO 19650). The modeling of public works data disrupts the art of construction. Written by both academics and engineers who are heavily involved in the French research project Modélisation des INformations INteropérables pour les INfrastructures Durables (MINnD) as well as in international standardization projects, this book presents the challenges of BIM from theoretical and practical perspectives. It provides knowledge for evolving in an ecosystem of federated models and common data environments, which are the basis of the platforms and data spaces. BIM makes it possible to handle interoperability very concretely, using open standards, which lead to openBIM. The use of a platform allows for the merging of business software and for approaches such as a Geographic Information System (GIS) to be added to the processes. In organizations, BIM meets the life cycles of structures and circular economy. It is not only a technique that reshapes cooperation and trades around a digital twin but can also disrupt organizations and business models.

IBM SAN Volume Controller Model SV3 Product Guide (for IBM Storage Virtualize V8.6)

2024-01-17 O'Reilly Amazon

book

Vasfi Gucer , Jon Herd , Hartmut Lonzer

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller Cloud Computing

This IBM® Redpaper® Product Guide describes the IBM SAN Volume Controller model SV3 solution, which is a next-generation IBM SAN Volume Controller. Built with IBM Storage Virtualize software and part of the IBM Storage family, IBM SAN Volume Controller is an enterprise-class storage system. It helps organizations achieve better data economics by supporting the large-scale workloads that are critical to success. Data centers often contain a mix of storage systems. This situation can arise as a result of company mergers or as a deliberate acquisition strategy. Regardless of how they arise, mixed configurations add complexity to the data center. Different systems have different data services, which makes it difficult to move data from one to another without updating automation. Different user interfaces increase the need for training and can make errors more likely. Different approaches to hybrid cloud complicate modernization strategies. Also, many different systems mean more silos of capacity, which can lead to inefficiency. To simplify the data center and to improve flexibility and efficiency in deploying storage, enterprises of all types and sizes turn to IBM SAN Volume Controller, which is built with IBM Spectrum Virtualize software. This software simplifies infrastructure and eliminates differences in management, function, and even hybrid cloud support. IBM SAN Volume Controller introduces a common approach to storage management, function, replication, and hybrid cloud that is independent of storage type. It is the key to modernizing and revitalizing your storage, yet it remains easy to understand. 
IBM SAN Volume Controller provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples: Data reduction and deduplication; Dynamic tiering; Thin-provisioning; Snapshots; Cloning; Replication and data copy services; Data-at-rest encryption; Cyber resilience; Transparent Cloud Tiering; IBM HyperSwap® including three-site replication for high availability (HA). This Redpaper applies to IBM Storage Virtualize V8.6.

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

2024-01-08 O'Reilly Amazon

book

Anna Bailliekova , Henrietta Dombrovskaya , Boris Novikov

data data-engineering relational-databases postgresql SQL

Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view, and know how to think like the database optimizer. The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers. This second edition includes new examples using Postgres 15 and the newest version of the PostgresAir database. It includes additional details and clarifications about advanced topics, and covers configuration parameters in greater depth. Finally, it makes use of advancements in NORM, using automatically generated functions. 
What You Will Learn: Identify optimization goals in OLTP and OLAP systems. Read and understand PostgreSQL execution plans. Distinguish between short queries and long queries. Choose the right optimization technique for each query type. Identify indexes that will improve query performance. Optimize full table scans. Avoid the pitfalls of object-relational mapping systems. Optimize the entire application rather than just database queries. Who This Book Is For: IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words “database developer” or “database administrator” or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database.

Mastering MongoDB 7.0 - Fourth Edition

2024-01-05 O'Reilly Amazon

book

Malak Abu Hammad , Elie Hannouch , Leandro Domingues , Marko Aleksendrić , Arek Borucki , Rachelle Palmer , Rajesh Nair

data data-engineering nosql-databases MongoDB Data Management NoSQL

Mastering MongoDB 7.0 is your in-depth resource for learning MongoDB 7.0, the powerful NoSQL database designed for developers. Gain expertise in database architecture, data management, and modern features like MongoDB Atlas. By reading this book, you'll acquire the essential skills needed for building efficient, scalable, and secure applications. What this Book will help me do Develop expert-level skills in crafting advanced queries and managing complex data tasks in MongoDB. Learn to design efficient schemas and optimize indexing to maximize database performance. Integrate applications seamlessly with MongoDB Atlas, mastering its monitoring and backup tools. Implement robust security with RBAC, auditing strategies, and comprehensive encryption. Explore the latest MongoDB 7.0 features, including Atlas Vector Search, for modern applications. Author(s) Marko Aleksendrić, Arek Borucki, and co-authors are recognized MongoDB experts with years of hands-on experience. They bring together their expertise to deliver a practical guide filled with real-world insights that help developers advance their MongoDB skills. Their collaborative writing ensures comprehensive coverage of MongoDB 7.0 tools and techniques. Who is it for? This book is written for software developers, database administrators, and engineers who have intermediate knowledge of MongoDB and want to extend their expertise. Whether you are developing scalable applications, managing data systems, or ensuring database security, this book offers advanced guidance for achieving your professional goals with MongoDB.

Data Observability for Data Engineering

2023-12-29 O'Reilly Amazon

book

Michele Pinto , Sammy El Khammal

data data-engineering Analytics Data Engineering Data Quality Python

"Data Observability for Data Engineering" introduces you to the foundational concepts of observing and validating data pipeline health. With real-world projects and Python code examples, you'll gain hands-on experience in improving data quality and minimizing risks, enabling you to implement strategies that ensure accuracy and reliability in your data systems. What this Book will help me do Master data observability techniques to monitor and validate data pipelines effectively. Learn to collect and analyze meaningful metrics to gauge and improve data quality. Develop skills in Python programming specific to applying data concepts such as observable data state. Address scalability challenges using state-of-the-art observability frameworks and practices. Enhance your ability to manage and optimize data workflows ensuring seamless operation from start to end. Author(s) Authors Michele Pinto and Sammy El Khammal bring a wealth of experience in data engineering and observing scalable data systems. Pinto specializes in constructing robust analytics platforms while Khammal offers insights into integrating software observability into massive pipelines. Their collaborative writing style ensures readers find both practical advice and theoretical foundations. Who is it for? This book is geared toward data engineers, architects, and scientists who seek to confidently handle pipeline challenges. Whether you're addressing specific issues or wish to introduce proactive measures in your team, this guide meets the needs of those ready to leverage observability as a key practice.

Handbook of Geospatial Artificial Intelligence

2023-12-29 O'Reilly Amazon

book

Wenwen Li , Yingjie Hu , Song Gao

data data-engineering location-data geographic-information-system-gis geographic information system (gis) AI/ML

Geospatial Artificial Intelligence (GeoAI) is the integration of geospatial studies and AI using machine learning and deep learning technologies. This comprehensive handbook explains and discusses the key fundamental concepts, methods, models, and technologies of GeoAI, along with recent advances, research tools, and applications in different fields.

Redis Stack for Application Modernization

2023-12-29 O'Reilly Amazon

book

Mirko Ortensi , Luigi Fugaro

data data-engineering nosql-databases Redis DataViz Java

In "Redis Stack for Application Modernization," you will explore how the Redis Stack extends traditional Redis capabilities, allowing you to innovate in building real-time, scalable, multi-model applications. Through practical examples and hands-on sessions, this book equips you with skills to manage, implement, and optimize data flows and database features. What this Book will help me do Learn how to use Redis Stack for handling real-time data with JSON, hash, and other document types. Discover modern techniques for performing vector similarity searches and hybrid workflows. Become proficient in integrating Redis Stack with programming languages like Java, Python, and Node.js. Gain skills to configure Redis Stack server for scalability, security, and high availability. Master RedisInsight for data visualization, analysis, and efficient database management. Author(s) Luigi Fugaro and Mirko Ortensi are experienced software professionals with deep expertise in database systems and application architecture. They bring years of experience working with Redis and developing real-world applications. Their hands-on approach to teaching and real-world examples make this book a valuable resource for professionals in the field. Who is it for? This book is ideal for database administrators, developers, and architects looking to leverage Redis Stack for real-time multi-model applications. It requires a basic understanding of Redis and any programming language such as Python or Java. If you wish to modernize your applications and efficiently manage databases, this book is for you.

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

2023-12-27 O'Reilly Amazon

book

Abhishek Mishra , Anjani Kumar , Sanjeev Kumar

data data-engineering storage-repositories data-warehouse AWS Azure

Design and architect new generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts, and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of data warehousing standard practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. And they demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines. What You Will Learn: Understand the core concepts underlying modern data warehouses. Design and build cloud-native data warehouses. Gain a practical approach to architecting and building data warehouses on Azure and AWS. Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse. Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall. Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications. Who This Book Is For: Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS.

What is New in DFSMSrmm

2023-12-18 O'Reilly Amazon

book

Larry Coyne , Michael Scott , Ryan Bouchard , Samuel Smith , Karim Walji , Parker Mathewson

data data-engineering IBM

DFSMSrmm is an IBM z/OS feature that is a fully functioning tape management system to manage your removable media. In the last decade, many enhancements were made to DFSMSrmm. This IBM Redbooks publication is intended to help you configure and use the newer functions and features that are now available. Discussion of the new features is included along with use cases. Hints and tips for various common DFSMSrmm problems, along with useful configuration and reporting JCL, are also included. This publication is intended as a supplement to DFSMSrmm Primer, SG24-5983, which is still the recommended starting point for any users new to DFSMSrmm.
