
Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.


Sessions & talks

Showing 201–225 of 3406 · Newest first

Learn PostgreSQL - Second Edition

Learn PostgreSQL, a comprehensive guide to mastering PostgreSQL 16, takes readers on a journey from the fundamentals to advanced concepts, such as replication and database optimization. With hands-on exercises and practical examples, this book provides all you need to confidently use, manage, and build secure and scalable databases.

What this Book will help me do:
- Master the essentials of PostgreSQL 16, including advanced SQL features and performance tuning.
- Understand database replication methods and manage a scalable architecture.
- Enhance database security through roles, schemas, and strict privilege management.
- Learn how to personalize your experience with custom extensions and functions.
- Acquire practical skills in backup, restoration, and disaster recovery planning.

Author(s): Luca Ferrari and Enrico Pirozzi are experienced database engineers and PostgreSQL enthusiasts with years of experience using and teaching PostgreSQL technology. They specialize in creating learning content that is practical and focused on real-world situations. Their writing emphasizes clarity and systematically equips readers with professional skills.

Who is it for? This book is perfect for database professionals, software developers, and system administrators looking to develop their PostgreSQL expertise. Beginners with an interest in databases will also find this book highly approachable. Ideal for readers seeking to improve their database scalability and robustness. If you aim to hone practical PostgreSQL skills, this guide is essential.
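
As a taste of the privilege-management topics the book covers, here is a minimal sketch using psycopg2 against a hypothetical local PostgreSQL server; the role, schema, and credentials are illustrative, not from the book:

```python
# Illustrative sketch: least-privilege setup on PostgreSQL, via psycopg2.
# Assumes a local server and superuser credentials; all names are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=postgres password=secret host=localhost")
conn.autocommit = True
cur = conn.cursor()

# Create a read-only role confined to one schema.
cur.execute("CREATE ROLE reporting LOGIN PASSWORD 'reporting_pw'")
cur.execute("GRANT USAGE ON SCHEMA public TO reporting")
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting")
# Make the grant apply to tables created later as well.
cur.execute("ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO reporting")

cur.close()
conn.close()
```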

Procedural Programming with PostgreSQL PL/pgSQL: Design Complex Database-Centric Applications with PL/pgSQL

Learn the fundamentals of PL/pgSQL, the procedural programming language of PostgreSQL, the most robust open source relational database. This book provides practical insights into developing database code objects such as functions and procedures, with a focus on effectively handling strings, numbers, and arrays to achieve desired outcomes, as well as transaction management. The unique approach to handling triggers in PostgreSQL ensures that both functionality and performance are maintained without compromise. You'll gain proficiency in writing inline/anonymous server-side code within its limitations, along with learning essential debugging and profiling techniques. Additionally, the book delves into statistical analysis of PL/pgSQL code and offers valuable knowledge on managing exceptions while writing code blocks. Finally, you'll explore the installation and configuration of extensions to enhance the performance of stored procedures and functions.

What You'll Learn:
- Understand PL/pgSQL concepts
- Learn to debug, profile, and optimize PL/pgSQL code
- Study linting PL/pgSQL code
- Review transaction management within PL/pgSQL code
- Work with developer-friendly features like operators, casts, and aggregators

Who Is This Book For: App developers, database migration consultants, and database administrators.
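
For flavor, a hedged sketch of creating and calling a small PL/pgSQL function with exception handling from Python; the function and connection details are invented for illustration:

```python
# Sketch: define and invoke a PL/pgSQL function with exception handling.
# Assumes a reachable PostgreSQL server; identifiers are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")
cur = conn.cursor()

cur.execute("""
CREATE OR REPLACE FUNCTION safe_divide(a numeric, b numeric)
RETURNS numeric
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN a / b;
EXCEPTION
    WHEN division_by_zero THEN
        RETURN NULL;  -- swallow the error and signal "no result"
END;
$$
""")

cur.execute("SELECT safe_divide(10, 0)")
print(cur.fetchone())  # (None,)
conn.commit()
```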

Designing a Modern Application Data Stack

Today's massive datasets represent an unprecedented opportunity for organizations to build data-intensive applications. With this report, product leads, architects, and others who deal with applications and application development will explore why a cloud data platform is a great fit for data-intensive applications. You'll learn how to carefully consider scalability, data processing, and application distribution when making data app design decisions. Cloud data platforms are the modern infrastructure choice for data applications, as they offer improved scalability, elasticity, and cost efficiency. With a better understanding of data-intensive application architectures on cloud-based data platforms and the best practices outlined in this report, application teams can take full advantage of advances in data processing and app distribution to accelerate development, deployment, and adoption cycles.

With this insightful report, you will:
- Learn why a modern cloud data platform is essential for building data-intensive applications
- Explore how scalability, data processing, and distribution models are key for today's data apps
- Implement best practices to improve application scalability and simplify data processing for efficiency gains
- Modernize application distribution plans to meet the needs of app providers and consumers

About the authors: Adam Morton works with Intelligen Group, a Snowflake pure-play data and analytics consultancy. Kevin McGinley is technical director of the Snowflake customer acceleration team. Brad Culberson is a data platform architect specializing in data applications at Snowflake.

Cyber Resiliency with IBM Storage Sentinel and IBM Storage Safeguarded Copy

IBM Storage Sentinel is a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems, designed to help organizations enhance ransomware detection and incident recovery. IBM Storage Sentinel automates the creation of immutable backup copies of your data, then uses machine learning to detect signs of possible corruption and generate forensic reports that help you quickly diagnose and identify the source of the attack. Because IBM Storage Sentinel can intelligently isolate infected backups, your organization can identify the most recent verified and validated backup copies, greatly accelerating your time to recovery. This IBM Redbooks publication explains how to implement a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems using IBM Storage Sentinel and IBM Storage Safeguarded Copy. The target audience of this document is cybersecurity and storage specialists.

Delta Lake: Up and Running

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights.

You'll learn how to:
- Use modern data management and data engineering techniques
- Understand how ACID transactions bring reliability to data lakes at scale
- Run streaming and batch jobs against your data lake concurrently
- Execute update, delete, and merge commands against your data lake
- Use time travel to roll back and examine previous data versions
- Build a streaming data quality pipeline following the medallion architecture
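
A minimal, hedged sketch of Delta Lake's time travel from Python, using the deltalake (delta-rs) package; the table path and sample data are hypothetical:

```python
# Sketch: write two versions of a Delta table, then read the first one back.
# Assumes `pip install deltalake pandas`; the path and data are hypothetical.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/orders_delta"
write_deltalake(path, pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 15.5]}))
write_deltalake(path, pd.DataFrame({"order_id": [3], "amount": [7.25]}), mode="append")

# Time travel: version 0 has two rows, the latest version has three.
print(DeltaTable(path, version=0).to_pandas())
print(DeltaTable(path).to_pandas())
```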

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.5.3

IBM® Storage Virtualize based storage systems are secure storage platforms that implement various security-related features, in terms of system-level access controls and data-level security features. This document outlines the available security features and options of IBM Storage Virtualize based storage systems. It is not intended as a "how to" or best practice document. Instead, it is a checklist of features that can be reviewed by a user security team to aid in the definition of a policy to be followed when implementing IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storage Virtualize for Public Cloud.

IBM Storage Virtualize features the following levels of security to protect against threats and to keep the attack surface as small as possible:
- The first line of defense is to offer strict verification features that stop unauthorized users from using login interfaces and gaining access to the system and its configuration.
- The second line of defense is to offer least-privilege features that restrict the environment and limit any effect if a malicious actor does access the system configuration.
- The third line of defense is to run in a minimal, locked-down mode to prevent damage spreading to the kernel and the rest of the operating system.
- The fourth line of defense is to protect the data at rest that is stored on the system from theft, loss, or corruption (malicious or accidental).

The topics that are discussed in this paper can be broadly split into two categories:
- System security: This type of security encompasses the first three lines of defense that prevent unauthorized access to the system, protect the logical configuration of the storage system, and restrict what actions users can perform. It also ensures visibility and reporting of system-level events that can be used by a Security Information and Event Management (SIEM) solution, such as IBM QRadar®.
- Data security: This type of security encompasses the fourth line of defense. It protects the data that is stored on the system against theft, loss, or attack. These data security features include Encryption of Data At Rest (EDAR) or IBM Safeguarded Copy (SGC).

This document is correct as of IBM Storage Virtualize 8.5.3.

Amazon Redshift: The Definitive Guide

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort.

This book helps you:
- Build a cloud data strategy around Amazon Redshift as a foundational data warehouse
- Get started with Amazon Redshift with simple-to-use data models and design best practices
- Understand how and when to use Redshift Serverless and Redshift provisioned clusters
- Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options
- Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing
- Learn best practices for security, monitoring, resilience, and disaster recovery
- Leverage Amazon Redshift integration with other AWS services to unlock additional value
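
For flavor, a hedged sketch of querying Redshift from Python with the redshift_connector package; the cluster endpoint, credentials, and table are placeholders (the venue table is from the standard TICKIT sample data):

```python
# Sketch: run a query against Amazon Redshift via redshift_connector.
# Assumes `pip install redshift_connector`; endpoint and credentials are hypothetical.
import redshift_connector

conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="my_password",
)
cur = conn.cursor()
cur.execute("SELECT venuename, venuecity FROM venue LIMIT 5")
for row in cur.fetchall():
    print(row)
conn.close()
```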

Geospatial Analysis with SQL

"Geospatial Analysis with SQL" is a practical guide that teaches you how to use SQL for geospatial data analysis. With direct, actionable guidance, you will learn to explore and analyze data using geospatial techniques without needing additional programming. This book equips you with the knowledge to solve location-based queries and perform advanced geospatial operations. What this Book will help me do Master the fundamentals of geospatial analysis and learn the importance of location-based data. Develop skills in creating and manipulating spatial database objects in SQL. Gain proficiency in using tools such as PostGIS and QGIS for geospatial data analysis. Learn techniques to visualize spatial data effectively and communicate results. Perform both single-layer and multi-layer spatial analysis for complex real-world scenarios. Author(s) Bonny P. McClain, the author of "Geospatial Analysis with SQL", brings extensive experience as a spatial data analyst and GIS expert. Bonny specializes in helping practitioners make data-driven insights through geospatial techniques. With a passion for teaching, Bonny's goal is to make complex concepts accessible and practical for analysts and developers alike. Who is it for? This book is ideal for GIS analysts, data analysts, and data scientists who have a basic understanding of SQL and geospatial concepts and want to expand their analytical capabilities. Readers looking to perform professional-grade geospatial analysis using SQL will find this book especially valuable. It caters to professionals wishing to use their SQL skills to understand and work with spatial datasets effectively.

IBM SAN Volume Controller Best Practices and Performance Guidelines

This IBM® Redbooks® publication describes several of the preferred practices and the performance gains that can be achieved by implementing the IBM SAN Volume Controller powered by IBM Spectrum® Virtualize V8.4. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, Remote Copy services, and hosts. Then, it provides performance guidelines for IBM SAN Volume Controller, back-end storage, and applications. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting IBM SAN Volume Controller. This book is intended for experienced storage, SAN, and IBM SAN Volume Controller administrators and technicians. Understanding this book requires advanced knowledge of the IBM SAN Volume Controller, IBM FlashSystem, and SAN environments.

IBM SAN Volume Controller Best Practices and Performance Guidelines for IBM Spectrum Virtualize Version 8.4.2

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM SAN Volume Controller powered by IBM Spectrum® Virtualize Version 8.4.2. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, Remote Copy services and hosts. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments.

Practical Implementation of a Data Lake: Translating Customer Expectations into Tangible Technical Goals

This book explains how to implement a data lake strategy, covering the technical and business challenges architects commonly face. It also illustrates how and why client requirements should drive architectural decisions. Drawing upon a specific case from his own experience, author Nayanjyoti Paul begins with the consideration from which all subsequent decisions should flow: what does your customer need? He also describes the importance of identifying key stakeholders and the key points to focus on when starting a new project. Next, he takes you through the business and technical requirement-gathering process, and how to translate customer expectations into tangible technical goals. From there, you'll gain insight into the security model that will allow you to establish security and legal guardrails, as well as different aspects of security from the end user's perspective. You'll learn which organizational roles need to be onboarded into the data lake, their responsibilities, the services they need access to, and how the hierarchy of escalations should work. Subsequent chapters explore how to divide your data lakes into zones, organize data for security and access, manage data sensitivity, and techniques used for data obfuscation. Audit and logging capabilities in the data lake are also covered before a deep dive into designing data lakes to handle multiple file formats and access patterns. The book concludes by focusing on production operationalization and solutions to implement a production setup. After completing this book, you will understand how to implement a data lake and the best practices to employ while doing so, and you will be armed with practical tips to solve business problems.

What You Will Learn:
- Understand the challenges associated with implementing a data lake
- Explore the architectural patterns and processes used to design a new data lake
- Design and implement data lake capabilities
- Associate business requirements with technical deliverables to drive success

Who This Book Is For: Data scientists and architects, machine learning engineers, and software engineers.

Data Engineering and Data Science

Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the "one-stop shop" for the concepts and applications of data science and engineering for data scientists across many industries. The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to be working across the spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, but any individual data engineer doesn't need to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. In this volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles.

Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

Database-Driven Web Development: Learn to Operate at a Professional Level with PERL and MySQL

This book will teach you the essential knowledge required to be a successful and productive web developer with the ability to produce cutting-edge websites utilizing a database. This updated edition starts with the fundamentals of web development before delving into Perl and MySQL concepts such as script and database modeling, script-driven database interactions, content generation from a database, and information delivery from the server to the browser and vice versa. The only skills required to get the most from this book are basic knowledge of how the Internet works and a novice skill level with Perl and MySQL. The rest is intuitively presented code that most people can quickly and easily understand and employ. An extensive selection of practical, fully functional programming constructs in six different programming languages will give you the knowledge and tools required to create eye-catching, capable, and functionally impressive database-driven websites. Author Thomas Valentine has taken the concepts presented in the first edition of this book to new heights, offering in-depth discussions of each area of functionality required to develop fully formed database-driven web applications. He has expanded on the examples presented in the first edition and has included some very interesting and useful programming techniques for your consideration. Upon completing this book, you'll have gained the benefit of the author's decades' worth of experience and will be able to apply your new knowledge and skills to your own projects.

What You Will Learn:
- Install, configure, and use a trio of software packages (Apache Web Server, MySQL Database Server, and Perl)
- Create an effective web development workstation with databases in mind
- Use the Perl scripting language and MySQL databases effectively
- Maximize the Apache Web Server

Who This Book Is For: Those who already know web development basics and web developers who want to master database-driven web development. The skills required to understand the concepts put forth in this book are a working knowledge of Perl and basic MySQL.

The Unrealized Opportunities with Real-Time Data

The amount of data generated from various processes and platforms has increased exponentially in the past decade, and the challenges of filtering useful data out of streams of raw data have become even greater. Meanwhile, deriving useful insights from that data has become even more important. In this incisive report, Federico Castanedo examines the challenges companies face when acting on data at rest, as well as the benefits you unlock when acting on data as it's generated. Data engineers, enterprise architects, CTOs, and CIOs will explore the tools, processes, and mindset your company needs to process streaming data in real time. Learn how to make quick data-driven decisions to gain an edge on competitors.

This report helps you:
- Explore gaps in today's real-time data architectures, including the limitations of real-time analytics to act on data immediately
- Examine use cases that can't be served efficiently with real-time analytics
- Understand how stream processing engines work with real-time data
- Learn how distributed data processing architectures, stream processing, streaming analytics, and event-based architectures relate to real-time data
- Understand how to transition from traditional batch processing environments to stream processing

Federico Castanedo is an academic director and adjunct professor at IE University in Spain. A data science and AI leader, he has extensive experience in academia, industry, and startups.

Learning and Operating Presto

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to operate Presto at your organization to derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production.

With this book, you will:
- Learn how to install and configure Presto
- Use Presto with business intelligence tools
- Understand how to connect Presto to a variety of data sources
- Extend Presto for real-time business insight
- Learn how to apply best practices and tuning
- Get troubleshooting tips for logs, error messages, and more
- Explore Presto's architectural concepts and usage patterns
- Understand Presto security and administration
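
A hedged sketch of querying a Presto coordinator from Python with the presto-python-client package; the coordinator address, catalog, schema, and table are assumptions:

```python
# Sketch: query a Presto coordinator from Python.
# Assumes `pip install presto-python-client` and a coordinator on localhost:8080;
# catalog, schema, and table names are hypothetical.
import prestodb

conn = prestodb.dbapi.connect(
    host="localhost",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("SELECT order_id, total FROM orders LIMIT 5")
for row in cur.fetchall():
    print(row)
```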

Kafka Connect

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you flow data between your existing systems and Kafka to process data in real time. With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.

- Learn Kafka Connect's capabilities, main concepts, and terminology
- Design data and event streaming pipelines that use Kafka Connect
- Configure and operate Kafka Connect environments at scale
- Deploy secured and highly available Kafka Connect clusters
- Build sink and source connectors and single message transforms and converters
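
To ground the idea, here is a hedged sketch of registering a source connector through Kafka Connect's REST API from Python; the worker URL, connector name, file path, and topic are assumptions:

```python
# Sketch: create a FileStreamSource connector via the Kafka Connect REST API.
# Assumes a Connect worker on localhost:8083 and `pip install requests`;
# the connector name, file path, and topic are hypothetical.
import requests

connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",   # lines of this file become Kafka records
        "topic": "demo-topic",
    },
}
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())  # the worker echoes back the created connector config
```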

Building Real-Time Analytics Systems

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly. Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service.

You will:
- Learn common architectures for real-time analytics
- Discover how event processing differs from real-time analytics
- Ingest event data from Apache Kafka into Apache Pinot
- Combine event streams with OLTP data using Debezium and Kafka Streams
- Write real-time queries against event data stored in Apache Pinot
- Build a real-time dashboard and order tracking app
- Learn how Uber, Stripe, and Just Eat use real-time analytics
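
As a flavor of the Pinot queries the tutorials build toward, a hedged sketch that posts SQL to a Pinot broker's REST endpoint; the broker address, table, and columns are assumptions:

```python
# Sketch: run a SQL query against an Apache Pinot broker over HTTP.
# Assumes a broker on localhost:8099 and `pip install requests`;
# the table and columns are hypothetical.
import requests

query = "SELECT status, COUNT(*) FROM orders GROUP BY status LIMIT 10"
resp = requests.post("http://localhost:8099/query/sql", json={"sql": query})
resp.raise_for_status()
result = resp.json()
print(result["resultTable"]["rows"])  # grouped counts, one row per status
```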

Practical MongoDB Aggregations

Practical MongoDB Aggregations serves as the definitive guide to mastering aggregation pipelines within MongoDB 7.0. Officially endorsed by MongoDB, Inc., this book provides streamlined strategies and practical examples to help you achieve complex data manipulation and analytical tasks, ultimately enhancing your database operation proficiency.

What this Book will help me do:
- Understand the architecture of the MongoDB aggregation framework to build scalable pipelines.
- Design and implement optimized aggregation pipelines for high performance.
- Learn practical techniques for processing large datasets efficiently using sharding.
- Apply data processing directly within MongoDB to minimize external workflows.
- Master handling arrays and securing data through well-designed pipelines.

Author(s): Paul Done is an experienced software engineer with in-depth expertise in MongoDB and database systems. With years of professional experience managing and optimizing databases, Paul draws from real-world scenarios to devise effective strategies for learning MongoDB's advanced features. His approachable and instructional writing style empowers developers, engineers, and analysts to reach their full potential.

Who is it for? This book is perfect for developers, database architects, and data engineers who have a foundational understanding of MongoDB and are looking to deepen their practical skills in using aggregation pipelines. Professionals who want to perform efficient data processing and gain insights into MongoDB's advanced features will find this guide invaluable. If you wish to streamline analytical tasks, optimize performance, and work efficiently with MongoDB's latest functionalities, this book is tailored for you.
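
A hedged sketch of a small aggregation pipeline with pymongo; the database, collection, and fields are invented for illustration:

```python
# Sketch: a three-stage aggregation pipeline in pymongo.
# Assumes a local mongod and `pip install pymongo`; identifiers are hypothetical.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

pipeline = [
    {"$match": {"status": "shipped"}},                # filter documents
    {"$group": {"_id": "$customer_id",                # aggregate per customer
                "total": {"$sum": "$amount"},
                "orders": {"$sum": 1}}},
    {"$sort": {"total": -1}},                         # biggest spenders first
]
for doc in db.orders.aggregate(pipeline):
    print(doc)
```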

Leveling Up with SQL: Advanced Techniques for Transforming Data into Insights

Learn to write SQL queries to select and analyze data, and improve your ability to manipulate data. This book will help you take your existing skills to the next level. Author Mark Simon kicks things off with a quick review of basic SQL knowledge, followed by a demonstration of how efficient SQL databases are designed and how to extract just the right data from them. You'll then learn about each individual table's structure and how to work with the relationships between tables. As you progress through the book, you will learn more sophisticated techniques, such as using common table expressions and subqueries, analyzing your data using aggregate and windowing functions, and saving queries in the form of views and other methods. This book employs an accessible approach, working through a realistic sample so you can learn concepts as they arise, whether to improve parts of the database or to work with the data itself. After completing this book, you will have a more thorough understanding of database structure and how to use advanced techniques to extract, manage, and analyze data.

What You Will Learn:
- Gain a stronger understanding of database design principles, especially individual tables
- Understand the relationships between tables
- Utilize techniques such as views, subqueries, common table expressions, and windowing functions

Who Is This Book For: SQL database users who want to improve their knowledge and techniques.
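
To illustrate the kind of technique covered, a hedged sketch combining a common table expression with a window function, using Python's built-in sqlite3 module and made-up sample data:

```python
# Sketch: a CTE feeding a window function, on an in-memory SQLite database.
# Data and schema are invented for illustration; requires SQLite 3.25+ for windows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 100), ('north', 250), ('south', 80), ('south', 300);
""")

rows = conn.execute("""
    WITH regional AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT region, total,
           RANK() OVER (ORDER BY total DESC) AS sales_rank
    FROM regional
""").fetchall()
print(rows)  # [('south', 380.0, 1), ('north', 350.0, 2)]
```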

IBM Storage as a Service Offering Guide

IBM® Storage as a Service (STaaS) extends your hybrid cloud experience with a new flexible consumption model enabled for both your on-premises and hybrid cloud infrastructure needs, giving you the agility, cash flow efficiency, and services of cloud storage with the flexibility to dynamically scale up or down and only pay for what you use beyond the minimum capacity. This IBM Redpaper provides a detailed introduction to the IBM STaaS service. The paper is targeted at data center managers and storage administrators.

IBM Power E1050: Technical Overview and Introduction

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power E1050 server (9043-MRX), which uses the latest IBM Power10 processor-based technology and supports IBM AIX® and Linux operating systems (OSs). The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in this system, such as:
- The latest IBM Power10 processor design, including the dual-chip module (DCM) packaging, which is available in various configurations from 12 to 24 cores per socket.
- Support of up to 16 TB of memory.
- Native Peripheral Component Interconnect Express (PCIe) 5th generation (Gen5) connectivity from the processor socket to deliver higher performance and bandwidth for connected adapters.
- Open Memory Interface (OMI) connected Differential Dual Inline Memory Module (DDIMM) memory cards delivering increased performance, resiliency, and security over industry-standard memory technologies, including transparent memory encryption.
- Enhanced internal storage performance with the use of native PCIe-connected Non-volatile Memory Express (NVMe) devices in up to 10 internal storage slots to deliver up to 64 TB of high-performance, low-latency storage in a single 4-socket system.
- Consumption-based pricing in the Power Private Cloud with Shared Utility Capacity commercial model to allow customers to consume resources more flexibly and efficiently, including AIX, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server, and Red Hat OpenShift Container Platform workloads.

This publication is for professionals who want to acquire a better understanding of IBM Power products. The intended audience includes:
- IBM Power customers
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper expands the set of IBM Power documentation by providing a desktop reference that offers a detailed technical description of the Power E1050 Midrange server model. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM Power E1080 Technical Overview and Introduction

This IBM® Redpaper® publication provides a broad understanding of the new architecture of the IBM Power® E1080 (also known as the Power E1080) server, which supports IBM AIX®, IBM i, and selected distributions of Linux operating systems. The objective of this paper is to introduce the Power E1080, the most powerful and scalable server of the IBM Power portfolio, and its offerings and relevant functions:
- Designed to support up to four system nodes and up to 240 IBM Power10™ processor cores. The Power E1080 can be initially ordered with a single system node or a two system node configuration, which provides up to 60 Power10 processor cores with a single node configuration or up to 120 Power10 processor cores with a two system node configuration. Support for three and four system node configurations is to be added on December 10, 2021, which provides support for up to 240 Power10 processor cores with a fully combined four system node server.
- Designed to support up to 64 TB of memory. The Power E1080 can be initially ordered with a total memory capacity of up to 8 TB. Support is to be added on December 10, 2021 for up to 64 TB in a fully combined four system node server.
- Designed to support up to 32 Peripheral Component Interconnect® (PCIe) Gen 5 slots in a fully combined four system node server and up to 192 PCIe Gen 3 slots with expansion I/O drawers. The Power E1080 initially supports a maximum of two system nodes, and therefore up to 16 PCIe Gen 5 slots and up to 96 PCIe Gen 3 slots with expansion I/O drawers. Support is to be added on December 10, 2021 for up to 192 PCIe Gen 3 slots with expansion I/O drawers.
- Up to over 4,000 directly attached serial-attached SCSI (SAS) disks or solid-state drives (SSDs).
- Up to 1,000 virtual machines (VMs) with logical partitions (LPARs) per system.
- A system control unit, providing a redundant system master Flexible Service Processor (FSP).
- Support for the IBM Power System Private Cloud Solution with Dynamic Capacity.

This publication is for professionals who want to acquire a better understanding of Power servers. The intended audience includes the following roles:
- Customers
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Serverless Machine Learning with Amazon Redshift ML

Serverless Machine Learning with Amazon Redshift ML provides a hands-on guide to using Amazon Redshift Serverless and Redshift ML for building and deploying machine learning models. Through SQL-focused examples and practical walkthroughs, you will learn efficient techniques for cloud data analytics and serverless machine learning.

What this Book will help me do:
- Grasp the workflow of building machine learning models with Redshift ML using SQL.
- Learn to handle supervised learning tasks like classification and regression.
- Apply unsupervised learning techniques, such as K-means clustering, in Redshift ML.
- Develop time-series forecasting models within Amazon Redshift.
- Understand how to operationalize machine learning in serverless cloud architecture.

Author(s): Debu Panda, Phil Bates, Bhanu Pittampally, and Sumeet Joshi are seasoned professionals in cloud computing and machine learning technologies. They combine deep technical knowledge with teaching expertise to guide learners through mastering Amazon Redshift ML. Their collaborative approach ensures that the content is accessible, engaging, and practically applicable.

Who is it for? This book is perfect for data scientists, machine learning engineers, and database administrators using or intending to use Amazon Redshift. It's tailored for professionals with basic knowledge of machine learning and SQL who aim to enhance their efficiency and specialize in serverless machine learning within cloud architectures.
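
A hedged sketch of the SQL-first workflow Redshift ML uses, driven from Python; the endpoint, table, model name, IAM role, and S3 bucket are placeholders:

```python
# Sketch: train and use a Redshift ML model entirely in SQL, driven from Python.
# Assumes `pip install redshift_connector`; all identifiers are hypothetical.
# Redshift ML delegates the actual training to Amazon SageMaker behind the scenes.
import redshift_connector

conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev", user="awsuser", password="my_password",
)
cur = conn.cursor()

cur.execute("""
    CREATE MODEL customer_churn
    FROM (SELECT age, monthly_spend, churned FROM customer_activity)
    TARGET churned
    FUNCTION predict_churn
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
    SETTINGS (S3_BUCKET 'example-redshift-ml-bucket')
""")

# Once training completes, the model is callable as a SQL function.
cur.execute("SELECT predict_churn(age, monthly_spend) FROM new_customers LIMIT 5")
print(cur.fetchall())
```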

Building a Fast Universal Data Access Platform

Your company relies on data to succeed: data that traditionally comes from a business's transactional processes, pulled from the transaction systems through an extract-transform-load (ETL) process into a warehouse for reporting purposes. But this data flow is no longer sufficient given the growth of the internet of things (IoT), web commerce, and cybersecurity. How can your company keep up with today's increasing magnitude of data and insights? Organizations that can no longer rely on data generated by business processes are looking outside their workflow for information on customer behavior, retail patterns, and industry trends. In this report, author Christopher Gardner examines the challenges of building a framework that provides universal access to data.

You will:
- Learn the advantages and challenges of universal data access, including data diversity, data volume, and the speed of analytic operations
- Discover how to build a framework for data diversity and universal access
- Learn common methods for improving database and performance SLAs
- Examine the organizational requirements that a fast universal data access platform must meet
- Explore a case study that demonstrates how components work together to form a multiaccess, high-volume, high-performance interface

About the author: Christopher Gardner is the campus Tableau application administrator at the University of Michigan, controlling security, updates, and performance maintenance.