O'Reilly Data Engineering Books

Practical Lakehouse Architecture

2024-07-31 O'Reilly Amazon

book

Gaurav Ashok Thalpati

data data-engineering storage-repositories data-lake AI/ML BI

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures. Practical Lakehouse Architecture shows you how to: understand key lakehouse concepts and features like transaction support, time travel, and schema evolution; understand the differences between traditional and lakehouse data architectures; differentiate between various file formats and table formats; design lakehouse architecture layers for storage, compute, metadata management, and data consumption; implement data governance and data security within the platform; evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case; make critical design decisions and address practical challenges to build a future-ready data platform; and start your lakehouse implementation journey and migrate data from existing systems to the lakehouse.

Database Design and Modeling with PostgreSQL and MySQL

2024-07-26 O'Reilly Amazon

book

Alkin Tezuysal , Ibrar Ahmed

data data-engineering relational-databases postgresql Data Modelling MySQL

Discover how to design and optimize modern databases efficiently using PostgreSQL and MySQL. This book guides you through database design for scalability and performance, covering data modeling, query optimization, and real-world application integration. What this Book will help me do Build efficient and scalable relational database schemas for real-world applications. Master data modeling with normalization and denormalization techniques. Understand query optimization strategies for better database performance. Learn database strategies such as sharding, replication, and backup management. Integrate relational databases with applications and explore future database trends. Author(s) Alkin Tezuysal and Ibrar Ahmed are seasoned database professionals with decades of experience. Alkin specializes in database scalability and performance, while Ibrar brings expertise in database systems and development. Together, they bring a hands-on approach, providing clear and insightful guidance for database professionals. Who is it for? This book is oriented towards software developers, database administrators, and IT professionals looking to enhance their knowledge in database design using PostgreSQL and MySQL. Beginners in database design will find its structured approach accessible. Advanced professionals will appreciate its depth on cutting-edge topics and practical optimizations.

Big Data on Kubernetes

2024-07-19 O'Reilly Amazon

book

Neylson Crepalde

data data-engineering streaming-messaging Kafka Airflow BI

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges. What this Book will help me do Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios. Author(s) Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture. Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.

Information Modeling and Relational Databases, 3rd Edition

2024-07-09 O'Reilly Amazon

book

Tony Morgan , Terry Halpin

data data-engineering relational-databases NoSQL RDBMS SQL

Information Modeling and Relational Databases, Third Edition, provides an introduction to ORM (Object-Role Modeling) and much more. In fact, it is the only book to go beyond introductory coverage and provide all of the in-depth instruction you need to transform knowledge from domain experts into a sound database design. This book is intended for anyone with a stake in the accuracy and efficacy of databases: systems analysts, information modelers, database designers and administrators, and programmers. Dr. Terry Halpin and Dr. Tony Morgan, pioneers in the development of ORM, blend conceptual information with practical instruction that will let you begin using ORM effectively as soon as possible. The all-new Third Edition includes coverage of advances and improvements in ORM and UML, nominalization, relational mapping, SQL, XML, data interchange, NoSQL databases, ontological modeling, and post-relational databases. Supported by examples, exercises, and useful background information, the authors’ step-by-step approach teaches you to develop a natural-language-based ORM model, and then, where needed, abstract ER and UML models from it. This book will quickly make you proficient in the modeling technique that is proving vital to the development of accurate and efficient databases that best meet real business objectives. "This book is an excellent introduction to both information modeling in ORM and relational databases. The book is very clearly written in a step-by-step manner and contains an abundance of well-chosen examples illuminating practice and theory in information modeling. I strongly recommend this book to anyone interested in conceptual modeling and databases." — Dr. Herman Balsters, Director of the Faculty of Industrial Engineering, University of Groningen, The Netherlands Presents the most in-depth coverage of object-role modeling, including a thorough update of the book for the latest versions of ORM, ER, UML, OWL, and BPMN modeling. 
Includes clear coverage of relational database concepts as well as the latest developments in SQL, XML, information modeling, data exchange, and schema transformation. Case studies and a large number of class-tested exercises are provided for many topics. Includes all-new chapters on data file formats and NoSQL databases.

IBM FlashCore Module (FCM) Product Guide: Features the newly available FCM4 with AI-powered ransomware detection

2024-07-03 O'Reilly Amazon

book

Vasfi Gucer , Jon Herd , Hartmut Lonzer

data data-engineering IBM AI/ML

This IBM® Redpaper® Product Guide describes the history of the IBM FlashCore Module (FCM), gives a general overview, and then takes a deeper dive into the way IBM leads the field in the adoption of high-speed, low-latency storage. The IBM FlashCore Module is used in the latest IBM FlashSystem® solutions, which are next-generation IBM FlashSystem control enclosures. The IBM FlashCore Module combines the performance of flash and a Non-Volatile Memory Express (NVMe) optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Storage Virtualize software.

SAP HANA on IBM Power Systems Architectural Summary

2024-07-03 O'Reilly Amazon

book

John F. Aristizabal , Neelabha Banerjee , Henry Vo , Corie Neri , Youssef Largou , Tim Simon

data data-engineering relational-databases sap-hana IBM Linux

This IBM Redpaper publication delivers SAP HANA architectural concepts for successful implementation on IBM Power Systems servers. This update introduces the Power10 product line and explains how it enhances support for SAP HANA. It also discusses the addition of Red Hat Enterprise Linux as a supported operating system for SAP workloads. This publication addresses topics for sellers, IT architects, IT specialists, and anyone who wants to understand how to take advantage of running SAP HANA workloads on Power Systems servers. Moreover, this document provides information to transfer how-to skills to the technical teams, and it provides solution guidance to the sales team. This publication complements documentation that is available at IBM Knowledge Center, and it aligns with educational materials that are provided by IBM Systems.

Forms and Functions of Meta-Discourse

2024-07-01 O'Reilly Amazon

book

Maria Cristina Lo Baido

data data-engineering metadata

This book constitutes the first systematic analysis of meta-discourse in the spoken domain, addressing the question of how, why, and when speakers switch from discourse to meta-discourse by means of comment clauses (e.g., ‘I think’). The case of Present-day Italian is considered, exploring the internal properties of comment clauses (e.g., morphosyntax and semantics of the verb), their relations with the surrounding discourse (e.g., position of comment clause), and their prosodic profiles. This study shows that speakers resort to meta-discourse to convey a non-random set of functions, having mainly to do with the online process of reference construction (e.g., approximation and reformulation) and with the degree of speaker’s commitment (e.g., epistemicity and emphasis). Comment clauses are also used as attention-getting or topic-resuming devices, though less frequently. One of the most interesting results of this study is the identification of a close relation between meta-discourse and stance-taking in the spoken domain, with speakers resorting to comment clauses to convey their attitude. Finally, meta-discourse turns out to be highly influenced, if not constrained, by universal properties of the spoken domain (i.e., non-linearity).

Data Migration Management for SAP S/4HANA: A Practical Guide

2024-06-30 O'Reilly Amazon

book

Aleksei Arziaev

data data-engineering SAP

Enhance your data transfer and storage skills with this comprehensive step-by-step guide to managing data migration for new on-premises SAP S/4HANA implementations. This book is tailored towards small to large projects, with a focus on the managerial aspects of the data migration process rather than the technical details. You’ll follow a project-led approach, enriched with a practical case study, and a comprehensive methodology for data migration planning and documentation. Then work through a detailed plan for managing and documenting data migration throughout the project lifecycle. This book utilizes the general SAP Activate methodology for on-premises solutions as its foundational framework, enhancing it with specific strategies for data migration. Structured in alignment with the project phases of the SAP Activate methodology, Data Migration Management for SAP S/4HANA methodically covers planning, organizing, and controlling the data migration process. It serves as an essential guide for professionals tasked with implementing SAP S/4HANA in their business, ensuring a thorough understanding of each data migration phase on the project. What You'll Learn Significantly decrease the time needed for both the preparation and execution of data migration activities. Foster clear transparency in data migration processes for all stakeholders, including the customer and the project team. Facilitate a seamless and timely data migration process. Establish a benchmark for data migration management in future projects. Address and remedy any deficiencies in the SAP Activate methodology pertaining to data migration. Who This Book Is For SAP project and data migration workstream leads, already well-versed in SAP Activate methodology and possessing moderate experience in project and workstream management, who are seeking to enhance their skills in professionally managing data migration in implementation projects.

Elastic Stack 8.x Cookbook

2024-06-28 O'Reilly Amazon

book

Yazid Akadiri , Huage Chen

data data-engineering search elasticsearch elastic-stack-elk-stack elastic stack (elk stack)

Unlock the potential of the Elastic Stack with the "Elastic Stack 8.x Cookbook." This book provides over 80 hands-on recipes, guiding you through ingesting, processing, and visualizing data using Elasticsearch, Logstash, Kibana, and more. You'll also explore advanced features like machine learning and observability to create data-driven applications with ease. What this Book will help me do Implement a robust workflow for ingesting, transforming, and visualizing diverse datasets. Utilize Kibana to create insightful dashboards and visual analytics. Leverage Elastic Stack's AI capabilities, such as natural language processing and machine learning. Develop search solutions and integrate advanced features like vector search. Monitor and optimize your Elastic Stack deployments for performance and security. Author(s) Huage Chen and Yazid Akadiri are experienced professionals in the field of Elastic Stack. They bring years of practical experience in data engineering, observability, and software development. Huage and Yazid aim to provide a clear, practical pathway for both beginners and experienced users to get the most out of the Elastic Stack's capabilities. Who is it for? This book is perfect for developers, data engineers, and observability practitioners looking to harness the power of Elastic Stack. It caters to both beginners and experts, providing clear instructions to help readers understand and implement powerful data solutions. If you're working with search applications, data analysis, or system observability, this book is an ideal resource.

Hands-On MySQL Administration

2024-06-28 O'Reilly Amazon

book

Arunjith Aravindan , Jeyaram Ayyalusamy

data data-engineering relational-databases MySQL Aurora Amazon RDS

Geared to intermediate- to advanced-level DBAs and IT professionals looking to enhance their MySQL skills, this guide provides a comprehensive overview of how to manage and optimize MySQL databases. You'll learn how to create databases and implement backup and recovery, security configurations, high availability, scaling techniques, and performance tuning. Using practical techniques, tips, and real-world examples, authors Arunjith Aravindan and Jeyaram Ayyalusamy show you how to deploy and manage MySQL, Amazon RDS, Amazon Aurora, and Azure MySQL. By the end of the book, you'll have the knowledge and skills necessary to administer, manage, and optimize MySQL databases effectively. Design and implement a scalable and reliable database infrastructure using MySQL 8 on premises and in the cloud. Install and configure software, manage user accounts, and optimize database performance. Use backup and recovery strategies, security measures, and high availability solutions. Apply best practices for database schema design, indexing strategies, and replication techniques. Implement advanced database features and techniques such as replication, clustering, load balancing, and high availability. Troubleshoot common issues and errors, using diagnostic tools and techniques to identify and resolve problems quickly and efficiently. Facilitate major MySQL upgrades, including MySQL 5.7 to MySQL 8.

High Performance PostgreSQL for Rails

2024-06-17 O'Reilly Amazon

book

Andrew Atkinson

data data-engineering relational-databases postgresql Docker Linux

Build faster, more reliable Rails apps by taking the best advanced PostgreSQL and Active Record capabilities, and using them to solve your application scale and growth challenges. Gain the skills needed to comfortably work with multi-terabyte databases, and with complex Active Record, SQL, and specialized Indexes. Develop your skills with PostgreSQL on your laptop, then take them into production, while keeping everything in sync. Make slow queries fast, perform any schema or data migration without errors, use scaling techniques like read/write splitting, partitioning, and sharding, to meet demanding workload requirements from Internet scale consumer apps to enterprise SaaS. Deepen your firsthand knowledge of high-scale PostgreSQL databases and Ruby on Rails applications with dozens of practical and hands-on exercises. Unlock the mysteries surrounding complex Active Record. Make any schema or data migration change confidently, without downtime. Grow your experience with modern and exclusive PostgreSQL features like SQL Merge, Returning, and Exclusion constraints. Put advanced capabilities like Full Text Search and Publish Subscribe mechanisms built into PostgreSQL to work in your Rails apps. Improve the quality of the data in your database, using the advanced and extensible system of types and constraints to reduce and eliminate application bugs. Tackle complex topics like how to improve query performance using specialized indexes. Discover how to effectively use built-in database functions and write your own, administer replication, and make the most of partitioning and foreign data wrappers. Use more than 40 well-supported open source tools to extend and enhance PostgreSQL and Ruby on Rails. Gain invaluable insights into database administration by conducting advanced optimizations - including high-impact database maintenance - all while solving real-world operational challenges. 
Take your new skills into production today and then take your PostgreSQL and Rails applications to a whole new level of reliability and performance. What You Need: a computer running macOS, Linux, or Windows and WSL2; PostgreSQL version 16, installed by package manager, compiled, or running with Docker; and an Internet connection.

Databricks Certified Associate Developer for Apache Spark Using Python

2024-06-14 O'Reilly Amazon

book

Saba Shah

data data-engineering apache-spark Analytics API Big Data

This book serves as the ultimate preparation for aspiring Databricks Certified Associate Developers specializing in Apache Spark. Deep dive into Spark's components, its applications, and exam techniques to achieve certification and expand your practical skills in big data processing and real-time analytics using Python. What this Book will help me do Deeply understand Apache Spark's core architecture for building big data applications. Write optimized SQL queries and leverage Spark DataFrame API for efficient data manipulation. Apply advanced Spark functions, including UDFs, to solve complex data engineering tasks. Use Spark Streaming capabilities to implement real-time and near-real-time processing solutions. Get hands-on preparation for the certification exam with mock tests and practice questions. Author(s) Saba Shah is a seasoned data engineer with extensive experience working at Databricks and leading data science teams. With her in-depth knowledge of big data applications and Spark, she delivers clear, actionable insights in this book. Her approach emphasizes practical learning and real-world applications. Who is it for? This book is ideal for data professionals such as engineers and analysts aiming to achieve Databricks certification. It is particularly helpful for individuals with moderate Python proficiency who are keen to understand Spark from scratch. If you're transitioning into big data roles, this guide prepares you comprehensively.

Data Engineering with Databricks Cookbook

2024-05-31 O'Reilly Amazon

book

Pulkit Chadha

data data-engineering Big Data Cloud Computing Data Engineering Data Governance

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows. What this Book will help me do Master Apache Spark for data ingestion, transformation, and analysis. Learn to optimize data processing and improve query performance with Delta Lake. Manage streaming data processing with Spark Structured Streaming capabilities. Implement DataOps and DevOps workflows tailored for Databricks. Enforce data governance policies using Unity Catalog for scalable solutions. Author(s) Pulkit Chadha, the author of this book, is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge. Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge in managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well-suited for professionals working with Databricks or similar cloud-based data platforms.

The Ultimate Guide to Snowpark

2024-05-30 O'Reilly Amazon

book

Vivekanandan SS , Shankar Narayanan SGS

data data-engineering Snowflake AI/ML Cloud Computing Data Engineering

The Ultimate Guide to Snowpark serves as a comprehensive resource to help you master the Snowflake Snowpark framework using Python. You'll learn how to manage data engineering, data science, and data applications in Snowpark, coupled with practical implementations and examples. By following this guide, you'll gain the skills needed to efficiently process and analyze data in the Snowflake Data Cloud. What this Book will help me do Master Snowpark with Python for data engineering, data science, and data application workloads. Develop and deploy robust data pipelines using Snowpark in Python. Design, implement, and produce machine learning models using Snowpark. Learn to monetize and operationalize Snowflake-native applications. Effectively adopt Snowpark in production for scalable, efficient data solutions. Author(s) Shankar Narayanan SGS and Vivekanandan SS are experienced professionals in data engineering and Snowflake technologies. Shankar has extensive experience in utilizing Snowflake Snowpark to manage and enhance data solutions. Vivekanandan brings expertise in the intersection of Python programming and cloud-based data processing. Together, their combined knowledge and approachable writing style make this book an invaluable resource to readers. Who is it for? This book is designed for data engineers, data scientists, developers, and seasoned data practitioners. Ideal candidates are those looking to expand their skills in implementing Snowpark solutions using Python. A prior understanding of SQL, Python programming, and familiarity with Snowflake is beneficial for readers to fully leverage the techniques presented.

Tuning the Snowflake Data Cloud: Optimizing Your Data Platform to Minimize Cost and Maximize Performance

2024-05-28 O'Reilly Amazon

book

Andrew Carruthers

data data-engineering Snowflake Cloud Computing

This project-oriented book presents a hands-on approach to identifying migration and performance issues with experience drawn from real-world examples. As you work through the book, you will develop skills, knowledge, and deep understanding of Snowflake tuning options and capabilities while preparing for later incorporation of additional Snowflake features as they become available. Your Snowflake platform will cost less to run and will improve your customer experience. Written by a seasoned Snowflake practitioner, this book is full of practical, hands-on guidance and advice specifically designed to further accelerate your Snowflake journey. Tuning the Snowflake Data Cloud provides you a pathway to success by equipping you with the skills, knowledge, and expertise needed to elevate your Snowflake experience. The book shows you how to leverage what you already know, adds what you don’t, and helps you apply it toward delivering for your Snowflake accounts. Read this book to embark on a voyage of advancement and equip your organization to deliver consistent Snowflake performance. What You Will Learn: recognize and understand the root cause of performance bottlenecks; know how to resolve performance issues; develop a deep understanding of Snowflake performance tuning options; reduce expensive mistakes and remediate poorly performing code; manage Snowflake costs.

Kafka Streams in Action, Second Edition

2024-05-24 O'Reilly Amazon

book

Bill Bejeck

data data-engineering streaming-messaging Kafka API Java

Everything you need to implement stream processing on Apache Kafka® using Kafka Streams and the ksqlDB event streaming database. Kafka Streams in Action, Second Edition guides you through setting up and maintaining your streaming processing with Kafka. Inside, you’ll find comprehensive coverage of not only Kafka Streams, but the entire toolbox you’ll need for effective streaming, from the components of the Kafka ecosystem, to Producer and Consumer clients, Connect, and Schema Registry. In Kafka Streams in Action, Second Edition you’ll learn how to: design streaming applications in Kafka Streams with the KStream and the Processor API; integrate external systems with Kafka Connect; enforce data compatibility with Schema Registry; build applications that respond immediately to events in either Kafka Streams or ksqlDB; and craft materialized views over streams with ksqlDB. This totally revised new edition of Kafka Streams in Action has been expanded to cover more of the Kafka platform used for building event-based applications. You’ll also find full coverage of ksqlDB, an event streaming database that makes it a snap to create applications that respond immediately to events, such as real-time push and pull updates. About the Technology Enterprise applications need to handle thousands, even millions, of data events every day. With an intuitive API and flawless reliability, the lightweight Kafka Streams library has earned a spot at the center of these systems. Kafka Streams provides exactly the power and simplicity you need to manage real-time event processing or microservices messaging. About the Book Kafka Streams in Action, Second Edition teaches you how to create event streaming applications on the amazing Apache Kafka platform. This thoroughly revised new edition now covers a wider range of streaming architectures and includes data integration with Kafka Connect. 
As you go, you’ll explore real-world examples that introduce components and brokers, schema management, and the other essentials. Along the way, you’ll pick up practical techniques for blending Kafka with Spring, low-level control of processors and state stores, storing event data with ksqlDB, and testing streaming applications. What's Inside Design efficient streaming applications. Integrate external systems with Kafka Connect. Enforce data compatibility with Schema Registry. About the Reader For Java developers. No knowledge of Kafka or streaming applications required. About the Author Bill Bejeck is a Confluent engineer and a Kafka Streams contributor with over 15 years of software development experience. Bill is also a committer on the Apache Kafka® project. Quotes Comprehensive streaming data applications are only a few years away from becoming the reality, and this book is the guide the industry has been waiting for to move beyond the hype. - Adi Polak, Director, Developer Experience Engineering, Confluent Covers all the key aspects of building applications with Kafka Streams. Whether you are getting started with stream processing or have already built Kafka Streams applications, it is an essential resource. - Mickael Maison, Principal Software Engineer, Red Hat Serves as both a learning and a resource guide, offering a perfect blend of ‘how-to’ and ‘why-to.’ Even if you have been using Kafka Streams for many years, I highly recommend this book. - Neil Buesing, CTO & Co-founder, Kinetic Edge

Azure Data Engineer Associate Certification Guide - Second Edition

2024-05-23 O'Reilly Amazon

book

Newton Alex , Surendra Mettapalli , Giacinto Palmieri

it-operations cloud-computing cloud-platforms microsoft-azure microsoft-azure-certifications az-303-microsoft-azure-architect-technologies

This book is your gateway to mastering the skills required for achieving the Azure Data Engineer Associate certification (DP-203). Whether you're new to the field or a seasoned professional, it comprehensively prepares you for the challenges of the exam. Learn to design and implement advanced data solutions, secure sensitive information, and optimize data processes effectively. What this Book will help me do Understand and utilize Azure's data services such as Azure Synapse and Azure Databricks for data processing. Master advanced data storage and management solutions, including designing partitions and lake architectures. Learn to secure data with state-of-the-art tools like RBAC, encryption, and Azure Purview. Develop and manage data pipelines and workflows using tools like Azure Data Factory (ADF) and Spark. Prepare for and confidently pass the DP-203 certification exam with the included practical resources and guidance. Author(s) The authors, Giacinto Palmieri, Surendra Mettapalli, and Newton Alex, bring a wealth of expertise in cloud and data engineering. With extensive industry experience, they've designed this guide to be both educational and practical, enabling learners to not only understand but also apply concepts in real-world scenarios. Their goal is to make complex topics approachable, supporting your journey to certification success. Who is it for? This guide is perfect for aspiring and current data engineers aiming to achieve the Azure Data Engineer Associate certification (DP-203). It's particularly useful for professionals familiar with cloud services and basic data engineering concepts who want to delve deeper into Azure's offerings. Additionally, managers and learners preparing for roles involving Azure cloud data solutions will find the content invaluable for career advancement.

Concepts of Database Management System by Pearson

2024-05-21 O'Reilly Amazon

book

Shefali Naik

data data-engineering relational-databases Computer Science Oracle SQL

Concepts of Database Management System is designed to meet the syllabus requirements of undergraduate students of computer applications and computer science. It describes the concepts in easy-to-understand language with a sufficient number of examples. An overview of emerging trends in databases is thoroughly explained. A brief introduction to PL/SQL, MS-Access, and Oracle is included to help students get a flavor of different types of database management systems.

IBM z14 (3906) Technical Guide

2024-05-21 O'Reilly Amazon

book

Octavian Lascu

data data-engineering IBM Analytics Cloud Computing Cyber Security

This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™. IBM z14 is the trusted enterprise platform for pervasive encryption, integrating data, transactions, and insights into the data. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 servers to deliver a record level of capacity over the prior IBM Z platforms. In its maximum configuration, z14 is powered by up to 170 client characterizable microprocessors (cores) running at 5.2 GHz. This configuration can run more than 146,000 million instructions per second (MIPS) and supports up to 32 TB of client memory. The IBM z14 Model M05 is estimated to provide up to 35% more total system capacity than the IBM z13® Model NE1. This Redbooks publication provides information about IBM z14 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers functions and plan for their usage. It is not intended as an introduction to mainframes. Readers are expected to be generally familiar with existing IBM Z technology and terminology.

IBM z14 ZR1 Technical Guide

2024-05-21 O'Reilly Amazon

book

Frank Packheiser , John Troy , Bill White , Octavian Lascu , Hervey Kamga , Martijn Raave

data data-engineering IBM Analytics Cloud Computing Cyber Security

This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity in an industry-standard footprint. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, z14 ZR1 is powered by up to 30 client characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second (MIPS) and supports up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20. This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z server functions and plan for their usage. It is not intended as an introduction to mainframes. Readers are expected to be generally familiar with IBM Z technology and terminology.

IBM z15 (8561) Technical Guide

2024-05-21 O'Reilly Amazon

book

Frank Packheiser , Jannie Houlbjerg , Kazuhiro Nakajima , John Troy , Paul Schouten , Octavian Lascu , Anna Shugol , Hervey Kamga , Bo XU

data data-engineering IBM Agile/Scrum Analytics Cloud Computing

This IBM® Redbooks® publication describes the features and functions of the latest member of the IBM Z® platform, the IBM z15™ (machine type 8561). It includes information about the IBM z15 processor design, I/O innovations, security features, and supported operating systems. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity in an industry-standard footprint. This system excels at the following tasks: making use of multicloud integration services; securing data with pervasive encryption; accelerating digital transformation with agile service delivery; transforming a transactional platform into a data powerhouse; getting more out of the platform with IT Operational Analytics; revolutionizing business processes; and blending open source and Z technologies. This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM z15 (8562) Technical Guide

2024-05-20 O'Reilly Amazon

book

Octavian Lascu

data data-engineering IBM Agile/Scrum Analytics Cloud Computing

This IBM® Redbooks® publication describes the features and functions of the latest member of the IBM Z® platform, the IBM z15™ Model T02 (machine type 8562). It includes information about the IBM z15 processor design, I/O innovations, security features, and supported operating systems. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity in an industry-standard footprint. This system excels at the following tasks: making use of multicloud integration services; securing data with pervasive encryption; accelerating digital transformation with agile service delivery; transforming a transactional platform into a data powerhouse; getting more out of the platform with IT Operational Analytics; revolutionizing business processes; and blending open source and Z technologies. This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

Databricks ML in Action

2024-05-17 O'Reilly Amazon

book

Amanda Baker , Stephanie Rivera , Hayley Horn , Anastasia Prokaieva

data data-engineering Databricks AI/ML Big Data Data Lakehouse

Dive into the Databricks Data Intelligence Platform and learn how to harness its full potential for creating, deploying, and maintaining machine learning solutions. This book covers everything from setting up your workspace to integrating state-of-the-art tools such as AutoML and VectorSearch, imparting practical skills through detailed examples and code.

What this Book will help me do

• Set up and manage a Databricks workspace tailored for effective data science workflows.
• Implement monitoring to ensure data quality and detect drift efficiently.
• Build, fine-tune, and deploy machine learning models seamlessly using Databricks tools.
• Operationalize AI projects including feature engineering, data pipelines, and workflows on the Databricks Lakehouse architecture.
• Leverage integrations with popular tools like OpenAI's ChatGPT to expand your AI project capabilities.

Author(s)

This book is authored by Stephanie Rivera, Anastasia Prokaieva, Amanda Baker, and Hayley Horn, seasoned experts in data science and machine learning from Databricks. Their collective years of expertise in big data and AI technologies ensure a rich and insightful perspective. Through their work, they strive to make complex concepts accessible and actionable.

Who is it for?

This book serves as an ideal guide for machine learning engineers, data scientists, and technically inclined managers. It's well-suited for those transitioning to the Databricks environment or seeking to deepen their Databricks-based machine learning implementation skills. Whether you're an ambitious beginner or an experienced professional, this book provides clear pathways to success.

Database Management Systems by Pearson

2024-05-16 O'Reilly Amazon

book

Rohit Khurana

data data-engineering relational-databases DWH Cyber Security SQL

Express Learning is a series of books designed as quick reference guides to important undergraduate computer courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Features –

• Designed as a student-friendly self-learning guide. The book is written in a clear, concise, and lucid manner.
• Easy-to-understand question-and-answer format.
• Includes previously asked as well as new questions organized in chapters.
• All types of questions including MCQs, short and long questions are covered.
• Solutions to numerical questions asked at examinations are provided.
• All ideas and concepts are presented with clear examples.
• Text is well structured and well supported with suitable diagrams.
• Inter-chapter dependencies are kept to a minimum.

Book Contents –

1: Database System
2: Conceptual Modelling
3: Relational Model
4: Relational Algebra and Calculus
5: Structured Query Language
6: Relational Database Design
7: Data Storage and Indexing
8: Query Processing and Optimization
9: Introduction to Transaction Processing
10: Concurrency Control Techniques
11: Database Recovery System
12: Database Security
13: Database System Architecture
14: Data Warehousing, OLAP, and Data Mining
15: Information Retrieval
16: Miscellaneous Questions

Express Learning - Data Warehousing and Data Mining, 1st Edition by Pearson

2024-05-16 O'Reilly Amazon

book

ITL Education

data data-engineering storage-repositories data-warehouse DWH

Express Learning is a series of books designed as quick reference guides to important undergraduate courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Book Contents –

Chapter 1: Introduction to Data Warehouse
Chapter 2: Building a Data Warehouse
Chapter 3: Data Warehouse: Architecture
Chapter 4: OLAP Technology
Chapter 5: Introduction to Data Mining
Chapter 6: Data Preprocessing
Chapter 7: Mining Association Rules
Chapter 8: Classification and Prediction
Chapter 9: Cluster Analysis
Chapter 10: Advanced Techniques of Data Mining and Its Applications
Index
