O'Reilly Data Engineering Books

IBM SAN Volume Controller Model SV3 Product Guide

2022-06-13 O'Reilly Amazon

book

Carsten Larsten, Shu Mookerjee, Konrad Trojok, Vasfi Gucer, Jon Herd, Hartmut Lonzer, Douwe van Terwisga, Kendall Williams, Corne Lottering

data data-engineering IBM Cloud Computing

This IBM® Redpaper Product Guide describes the IBM SAN Volume Controller model SV3 solution, which is a next-generation IBM SAN Volume Controller. Built with IBM Spectrum® Virtualize software and part of the IBM Spectrum Storage family, IBM SAN Volume Controller is an enterprise-class storage system. It helps organizations achieve better data economics by supporting the large-scale workloads that are critical to success. Data centers often contain a mix of storage systems. This situation can arise as a result of company mergers or as a deliberate acquisition strategy. Regardless of how they arise, mixed configurations add complexity to the data center. Different systems have different data services, which make it difficult to move data from one to another without updating automation. Different user interfaces increase the need for training and can make errors more likely. Different approaches to hybrid cloud complicate modernization strategies. Also, many different systems mean more silos of capacity, which can lead to inefficiency. To simplify the data center and to improve flexibility and efficiency in deploying storage, enterprises of all types and sizes turn to IBM SAN Volume Controller, which is built with IBM Spectrum Virtualize software. This software simplifies infrastructure and eliminates differences in management, function, and even hybrid cloud support. IBM SAN Volume Controller introduces a common approach to storage management, function, replication, and hybrid cloud that is independent of storage type. It is the key to modernizing and revitalizing your storage, yet it remains easy to understand.
IBM SAN Volume Controller provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples: data reduction and deduplication, dynamic tiering, thin provisioning, snapshots, cloning, replication and data copy services, data-at-rest encryption, cyber resilience, Transparent Cloud Tiering, and IBM HyperSwap®, including three-site replication for high availability (HA).

Elasticsearch 8.x Cookbook - Fifth Edition

2022-05-27 O'Reilly Amazon

book

Alberto Paro

data data-engineering search elasticsearch Analytics Big Data

"Elasticsearch 8.x Cookbook" is your go-to resource for harnessing the full potential of Elasticsearch 8. This book provides over 180 hands-on recipes to help you efficiently implement, customize, and scale Elasticsearch solutions in your enterprise. Whether you're handling complex queries, analytics, or cluster management, you'll find practical insights to enhance your capabilities.
What this Book will help me do
Understand the advanced features of Elasticsearch 8.x, including X-Pack, for improving functionality and security. Master advanced indexing and query techniques to perform efficient and scalable data operations. Implement and manage Elasticsearch clusters effectively, including monitoring performance via Kibana. Integrate Elasticsearch seamlessly into Java, Scala, Python, and big data environments. Develop custom plugins and extend Elasticsearch to meet unique project requirements.
Author(s)
Alberto Paro is a seasoned Elasticsearch expert with years of experience in search technologies and enterprise solution development. As a professional developer and consultant, he has worked with numerous organizations to implement Elasticsearch at scale. Alberto brings his deep technical knowledge and hands-on approach to this book, ensuring readers gain practical insights and skills.
Who is it for?
This book is perfect for software engineers, data professionals, and developers working with Elasticsearch in enterprise environments. If you want to advance your Elasticsearch knowledge, enhance your query-writing abilities, or integrate Elasticsearch into big data workflows, this book will be invaluable. Whether you're deploying Elasticsearch for e-commerce, applications, or analytics, you'll find the content purposeful and engaging.

IBM Power Systems High Availability and Disaster Recovery Updates: Planning for a Multicloud Environment

2022-05-27 O'Reilly Amazon

book

Dino Quintero, Prashant Pandey, Diego Riesco, Nilabja Haldar, Edson Gomes Pereira, Vera Cruz, Antony Steel, Douglas Roach, Thomas Baumann, Youssef Largou

data data-engineering IBM ibm-power-systems Cloud Computing Data Management

This IBM® Redpaper publication delivers an updated guide for high availability and disaster recovery (HADR) planning in a multicloud environment for IBM Power. This publication describes ideas from studies that were performed by a virtual collaborative team of IBM Business Partners, technical focal points, and product managers, who drew on hands-on experience to implement case studies that show HADR management aspects. Those studies form the basis of this technical update guide for a hybrid multicloud environment. The goal of this book is to deliver a HADR guide for backup and data management on-premises and in a multicloud environment. This document updates HADR on-premises and in the cloud with IBM PowerHA® SystemMirror®, IBM VM Recovery Manager (VMRM), and other solutions that are available on IBM Power for IBM AIX®, IBM i, and Linux. This publication highlights the available offerings at the time of writing for each operating system (OS) that is supported in IBM Power, including best practices. This book addresses topics for IT architects, IT specialists, sellers, and anyone looking to implement and manage HADR on-premises and in the cloud. Moreover, this publication provides documentation to transfer how-to skills to the technical teams and solution guidance to the sales team. This book complements the documentation that is available at IBM Documentation and aligns with the educational materials that are provided by IBM Systems Technical Training.

Essential Math for Data Science

2022-05-26 O'Reilly Amazon

book

Thomas Nield

data data-science AI/ML Data Science NumPy Python

Master the math needed to excel in data science, machine learning, and statistics. In this book author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics and how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way you'll also gain practical insights into the state of data science and how to use those insights to maximize your career.
Learn how to:
Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
Manipulate vectors and matrices and perform matrix decomposition
Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
Navigate practically through a data science career and avoid common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market

SAP Intelligent RPA for Developers

2022-05-20 O'Reilly Amazon

book

Vishwas Madhuvarshi, Vijaya Kumar Ganugula

data data-engineering SAP ERP JavaScript

SAP Intelligent RPA for Developers dives deep into the realm of robotic process automation using SAP Intelligent RPA. It provides a comprehensive guide to leveraging RPA for automating repetitive business processes, ensuring a seamless integrated environment for SAP and non-SAP systems. By the end, you'll be equipped to craft, manage, and optimize automated workflows.
What this Book will help me do
Master the fundamentals of SAP Intelligent RPA and its architecture. Develop and deploy automation bots to streamline business processes. Utilize low-code and pro-code methodologies effectively in project designs. Debug and troubleshoot RPA solutions to ensure operational efficiency. Understand and plan the migration from SAP Intelligent RPA to SAP Process Automation.
Author(s)
Vishwas Madhuvarshi and Vijaya Kumar Ganugula are experts in SAP Intelligent RPA with years of experience in ERP systems integration and process automation. Together, they offer a practical and comprehensive approach to mastering and implementing SAP RPA solutions effectively.
Who is it for?
This book is perfect for developers and business analysts eager to explore SAP Intelligent RPA. It caters to those with a basic knowledge of JavaScript who aspire to leverage RPA for automating monotonous workflows. If you're looking to dive into SAP's automation framework and understand its practical applications, this book is a great fit for you.

IBM Z Functional Matrix

2022-05-19 O'Reilly Amazon

book

Ewerson Palacio, Bill White, Octavian Lascu

data data-engineering IBM Marketing

This IBM® Redpaper™ publication provides a list of features and functions that are supported on IBM Z, including: IBM z16™ - Machine type 3931; IBM z15™ - Machine types 8561 and 8562; IBM z14™ - Machine types 3906 and 3907. On 30 June 2021, the IBM z14 (M/T 3906) was withdrawn from marketing (WDMF). Field-installed features and all associated conversions that are delivered solely through a modification to the machine's Licensed Internal Code (LIC) are still possible until 29 June 2022. This IBM Redpaper publication can help you quickly understand the features, functions, and connectivity alternatives that are available when planning and designing IBM Z infrastructures.

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

2022-05-19 O'Reilly Amazon

book

Dhiraj Kumar, Jessica Tischbierek, Johannes Rank, Elena Wolz, André Bögelsack, Utpal Chakraborty

data data-engineering SAP AWS Azure Cloud Computing

This book helps SAP architects and SAP Basis administrators deploy and operate SAP S/4HANA systems on the most common public cloud platforms. Market-leading cloud offerings are covered, including Amazon Web Services, Microsoft Azure, and Google Cloud. You will gain an end-to-end understanding of the initial implementation of SAP S/4HANA systems on those platforms. You will learn how to move away from the big monolithic SAP ERP systems and arrive at an environment with a central SAP S/4HANA system as the digital core surrounded by cloud-native services. The book begins by introducing the core concepts of Hyperscaler cloud platforms that are relevant to SAP. You will learn about the architecture of SAP S/4HANA systems on public cloud platforms, with specific content provided for each of the major platforms. The book simplifies the deployment of SAP S/4HANA systems in public clouds by providing step-by-step instructions and helping you deal with the complexity of such a deployment. Content in the book is based on best practices, industry lessons learned, and architectural blueprints, helping you develop deep insights into the operations of SAP S/4HANA systems on public cloud platforms. Reading this book enables you to build and operate your own SAP S/4HANA system in the public cloud with a minimum of effort.
What You Will Learn
Choose the right Hyperscaler platform for your future SAP S/4HANA workloads
Start deploying your first SAP S/4HANA system in the public cloud
Avoid typical pitfalls during your implementation
Apply and leverage cloud-native services for your SAP S/4HANA system
Save costs by choosing the right architecture and build a robust architecture for your most critical SAP systems
Meet your business’s criteria for availability and performance by having the right sizing in place
Identify further use cases when operating SAP S/4HANA in the public cloud
Who This Book Is For
SAP architects looking for an answer on how to move SAP S/4HANA systems from on-premises into the cloud; those planning to deploy to one of the three major platforms from Amazon Web Services, Microsoft Azure, and Google Cloud Platform; and SAP Basis administrators seeking a detailed and realistic description of how to get started on a migration to the cloud and how to drive that cloud implementation to completion.

Designing Machine Learning Systems

2022-05-17 O'Reilly Amazon

book

Chip Huyen

data ai-ml machine-learning AI/ML

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements. Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.
This book will help you tackle scenarios such as:
Engineering data and choosing the right metrics to solve a business problem
Automating the process for continually developing, evaluating, deploying, and updating models
Developing a monitoring system to quickly detect and address issues your models might encounter in production
Architecting an ML platform that serves across use cases
Developing responsible ML systems

Advanced SQL with SAS

2022-05-01 O'Reilly Amazon

book

Christian FG Schendera

data data-engineering SQL Cloud Computing Data Quality SAS

This book introduces advanced techniques for using PROC SQL in SAS. If you are a SAS programmer, analyst, or student who has mastered the basics of working with SQL, Advanced SQL with SAS® will help take your skills to the next level. Filled with practical examples with detailed explanations, this book demonstrates how to improve performance and speed for large data sets. Although the book addresses advanced topics, it is designed to progress from the simple and manageable to the complex and sophisticated. In addition to numerous tuning techniques, this book also touches on implicit and explicit pass-throughs, presents alternative SAS grid- and cloud-based processing environments, and compares SAS programming languages and approaches including FedSQL, CAS, DS2, and hash programming. Other topics include: missing values and data quality with audit trails; “blind spots” like how missing values can affect even the simplest calculations and table joins; SAS macro language and SAS macro programs; SAS functions; integrity constraints; SAS Dictionaries; and SAS Compute Server.

Python for ArcGIS Pro

2022-04-29 O'Reilly Amazon

book

William Parker, Silas Toms

data data-engineering location-data geographic-information-system-gis arcgis API

Python for ArcGIS Pro is your guide to automating geospatial tasks and maximizing your productivity using Python. Inside, you'll learn how to integrate Python scripting into ArcGIS workflows to streamline map production, data analysis, and data management.
What this Book will help me do
Automate map production and streamline repetitive cartography tasks. Conduct geospatial data analysis using Python libraries like pandas and NumPy. Integrate ArcPy and ArcGIS API for Python to manage geospatial data more effectively. Create script tools to improve repeatability and manage datasets. Publish and manage geospatial data to ArcGIS Online seamlessly.
Author(s)
Silas Toms and William Parker are both experienced GIS professionals and Python developers. With years of hands-on experience using Esri technology in real-world scenarios, they bring practical insights into the application's nuances. Their collaborative approach allows them to demystify technical concepts, making their teachings accessible to audiences of all skill levels.
Who is it for?
This book is for ArcGIS users looking to integrate Python into workflows, whether you're a GIS specialist, technician, or analyst. It's also suitable for those transitioning to roles requiring programming skills. A basic understanding of ArcGIS helps, but the book starts from the fundamentals.

The MySQL Workshop

2022-04-29 O'Reilly Amazon

book

Thomas Pettit, Scott Cosentino, Dr. Vlad Sebastian Ionescu

data data-engineering relational-databases MySQL Data Management RDBMS

The MySQL Workshop is your comprehensive, hands-on guide to learning and mastering MySQL database management. This book covers everything from setting up a database to working with SQL queries, managing data, and securing your databases. With practical exercises and real-world scenarios, you'll quickly gain the confidence and skills to handle MySQL databases effectively.
What this Book will help me do
Understand and implement the core concepts of relational databases. Write, execute, and optimize SQL queries for data management. Connect MySQL databases to applications like MS Access and Excel. Secure databases by managing user roles and permissions effectively. Perform database backups and restores to maintain data integrity.
Author(s)
Thomas Pettit and Scott Cosentino are experienced professionals in database management and MySQL technologies. With years of industry experience, they bring a wealth of knowledge to their writing. They focus on breaking down complex topics into digestible lessons, ensuring practical learning outcomes.
Who is it for?
This book is ideal for tech professionals and students looking to learn MySQL. Beginners will find a gentle introduction, while those with some SQL background will deepen their understanding and cover gaps in knowledge. It suits professionals dealing with data who want actionable MySQL skills for work and projects.

IBM z16 Technical Introduction

2022-04-28 O'Reilly Amazon

book

Gerard Laumay, Roman Vogt, Ewerson Palacio, Jannie Houlbjerg, Kazuhiro Nakajima, John Troy, Bill White, Paul Schouten, Octavian Lascu, Anna Shugol, Hervey Kamga, Martijn Raave, Andre Spahni, Bo XU, Makus Ertl, Slav Martinksi

data data-engineering IBM Analytics Cloud Computing Cyber Security

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform that is built with the IBM Telum processor: the IBM z16 server. The IBM Z platform is recognized for its security, resiliency, performance, and scale. It is relied on for mission-critical workloads and as an essential element of hybrid cloud infrastructures. The IBM z16 server adds capabilities and value with innovative technologies that are needed to accelerate the digital transformation journey. This book explains how the IBM z16 server uses innovations and traditional IBM Z strengths to satisfy the growing demand for cloud, analytics, and a more flexible infrastructure. With the IBM z16 servers as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

SAP S/4HANA Conversion: A Guide to Executing and Simplifying Your Conversion

2022-04-27 O'Reilly Amazon

book

Ravi Surya Subrahmanyam

data data-engineering SAP ERP

Succeed in your conversion to SAP S/4HANA. This book will help you understand the core aspects and implement a conversion project. You will start with an overview of the SAP S/4HANA conversion tools: Readiness Check, Simplification Item Check report, Maintenance Planner, Custom Code Analysis, SUM (Software Update Manager), and more. You will understand the preparation activities for SAP FI (Finance), SAP CO (Controlling), SAP AA (Asset Accounting), Material Ledger, and COPA (Controlling–Profitability Analysis). You will also find the SAP CVI (Customer/Vendor Integration) steps that can help consultants understand the mandatory activities to be completed as a part of preparation on the SAP ECC (ERP Central Component) system. You will learn the preparation activities for conversion of accounting to SAP S/4HANA, and migration activities: customizing, asset accounting, controlling, and house bank accounts. You will gain knowledge on data migration activities such as the migration of cost elements, technical check of transactional data, material ledger migration, enrichment of data, migration of line items, balances, and general ledger allocations to journal entry tables. After reading this book, you will know how to use the Migration Cockpit for data migration and post-conversion activities to successfully execute and implement an SAP S/4HANA conversion.
What You Will Learn
Choose an ideal path and planning tools for SAP S/4HANA
Start with the preparation step: General Ledger Accounting, Asset Accounting, Controlling, Material Ledger, and so on
Use Migration Cockpit for conversion preparation, migration, and post-migration activities
Who This Book Is For
SAP application consultants, finance consultants, and CVI consultants who need help with SAP S/4HANA conversion.

Early Threat Detection and Safeguarding Data with IBM QRadar and IBM Copy Services Manager on IBM DS8000

2022-04-21 O'Reilly Amazon

book

IBM

data data-engineering IBM

The focus of this blueprint is to highlight early threat detection by IBM® QRadar® and to proactively start a cyber resilience workflow in response to a cyberattack or malicious user actions. The workflow uses IBM Copy Services Manager (CSM) as orchestration software to start IBM DS8000® Safeguarded Copy functions. The Safeguarded Copy creates an immutable copy of the data in an air-gapped form on the same DS8000 system for isolation and eventual quick recovery. This document also explains the steps that are involved to enable and forward IBM DS8000 audit logs to IBM QRadar. It also discusses how to create various rules to determine a threat, and how to configure and start a suitable response to the detected threat in IBM QRadar. Finally, this document explains how to register a storage system and create a Scheduled Task by using CSM.

IBM Power Systems S922, S914, and S924 Technical Overview and Introduction Featuring PCIe Gen 4 Technology

2022-04-18 O'Reilly Amazon

book

Scott Vetter, Mauro Minomizaki, Bartlomiej Grabowski, Armin Röll

data data-engineering IBM Linux Marketing

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power System S914 (9009-41G), IBM Power System S922 (9009-22G), and IBM Power System S924 (9009-42G) servers that use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems (OSs). The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in these systems, such as:
The latest IBM POWER9 processor, which is available in various configurations for the number of cores per socket
More performance by using industry-leading Peripheral Component Interconnect Express (PCIe) Gen 4 slots
Enhanced internal disk scalability and performance with up to 11 NVMe adapters
Introduction of a competitive Power S922 server with a 1-socket configuration that is targeted at IBM i customers
This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles: clients, sales and marketing professionals, technical support professionals, IBM Business Partners, and independent software vendors (ISVs).
This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power S914, Power S922, and Power S924 systems. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM GDPS: An Introduction to Concepts and Capabilities

2022-04-13 O'Reilly Amazon

book

Lydia Parziale

data data-engineering IBM

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues that are related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

CockroachDB: The Definitive Guide

2022-04-11 O'Reilly Amazon

book

Jesse Seldess, Ben Darnell, Guy Harrison

data data-engineering relational-databases cockroachdb Cloud Computing Data Modelling

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud native data architecture. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra low latencies to globally distributed end users.
You'll learn how to:
Design and build applications for distributed infrastructure, including data modeling and schema design
Migrate data into CockroachDB
Read and write data and run ACID transactions across distributed infrastructure
Plan a CockroachDB deployment for resiliency across single region and multi-region clusters
Secure, monitor, and optimize your CockroachDB deployment

Data Algorithms with Spark

2022-04-11 O'Reilly Amazon

book

Mahmoud Parsian

data data-engineering apache-spark AI/ML Analytics API

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script.
With this book, you will:
Learn how to select Spark transformations for optimized solutions
Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
Understand data partitioning for optimized queries
Build and apply a model using PySpark design patterns
Apply motif-finding algorithms to graph data
Analyze graph data by using the GraphFrames API
Apply PySpark algorithms to clinical and genomics data
Learn how to use and apply feature engineering in ML algorithms
Understand and use practical and pragmatic data design patterns

Logging in Action

2022-04-10 O'Reilly Amazon

book

Phil Wilkins

data data-engineering search elasticsearch elastic-stack-elk-stack elastic stack (elk stack)

Make log processing a real asset to your organization with powerful and free open source tools.
In Logging in Action you will learn how to:
Deploy Fluentd and Fluent Bit into traditional on-premises, IoT, hybrid, cloud, and multi-cloud environments, both small and hyperscaled
Configure Fluentd and Fluent Bit to solve common log management problems
Use Fluentd within Kubernetes and Docker services
Connect a custom log source or destination with Fluentd’s extensible plugin framework
Apply logging best practices and avoid common pitfalls
Logging in Action is a guide to optimizing and organizing logging using the CNCF Fluentd and Fluent Bit projects. You’ll use the powerful log management tool Fluentd to solve common log management problems, and learn how proper log management can improve performance and make management of software and infrastructure solutions easier. Through useful examples like sending log-driven events to Slack, you’ll get hands-on experience applying structure to your unstructured data.
About the Technology
Don’t fly blind! An effective logging system can help you see and correct problems before they cripple your software. With the Fluentd log management tool, it’s a snap to monitor the behavior and health of your software and infrastructure in real time. Designed to collect and process log data from multiple sources using the industry-standard JSON format, Fluentd delivers a truly unified logging layer across all your systems.
About the Book
Logging in Action teaches you to record and analyze application and infrastructure data using Fluentd. Using clear, relevant examples, it shows you exactly how to transform raw system data into a unified stream of actionable information. You’ll discover how logging configuration impacts the way your system functions and set up Fluentd to handle data from legacy IT environments, local data centers, and massive Kubernetes-driven distributed systems. You’ll even learn how to implement complex log parsing with RegEx and output events to MongoDB and Slack.
What's Inside
Capture log events from a wide range of systems and software, including Kubernetes and Docker
Connect to custom log sources and destinations
Employ Fluentd’s extensible plugin framework
Create a custom plugin for niche problems
About the Reader
For developers, architects, and operations professionals familiar with the basics of monitoring and logging.
About the Author
Phil Wilkins has spent over 30 years in the software industry. He has worked for companies ranging from small startups to international brands.
Quotes
I highly recommend using Logging in Action as a getting-started guide, a refresher, or as a way to optimize your logging journey. - From the Foreword by Anurag Gupta, Fluent maintainer and Cofounder, Calyptia
Covers everything you need if you want to implement a logging system using open source technology such as Fluentd and Kubernetes. - Alex Saez, Naranja X
A great exploration of the features and capabilities of Fluentd, along with very useful hands-on exercises. - George Thomas, Manhattan Associates
A practical holistic guide to integrating logging into your enterprise architecture. - Satej Sahu, Honeywell

PostgreSQL 14 Administration Cookbook

2022-03-31 O'Reilly Amazon

book

Simon Riggs , Gianni Ciolli

data data-engineering relational-databases postgresql Cloud Computing Cyber Security

PostgreSQL 14 Administration Cookbook provides a hands-on guide to mastering the administration of PostgreSQL 14. With over 175 recipes, this book equips you with practical techniques to manage, secure, and optimize your PostgreSQL databases, ensuring they are robust and high-performing.

What this Book will help me do: Master managing PostgreSQL databases both on-premises and in the cloud efficiently. Implement effective backup and recovery strategies to secure your data. Leverage the latest features of PostgreSQL 14 to enhance your database workflows. Understand and apply best practices for maintaining high availability and performance. Troubleshoot real-world challenges with guided solutions and expert insights.

Author(s): Simon Riggs and Gianni Ciolli are seasoned database experts with years of experience working with PostgreSQL. Simon is a PostgreSQL core team member, contributing his technical knowledge towards building robust database solutions, while Gianni brings a wealth of expertise in database administration and support. Together, they share a passion for making complex database concepts accessible and actionable.

Who is it for? This book is for database administrators, data architects, and developers who manage PostgreSQL databases and are looking to deepen their knowledge. It is suitable for professionals with some experience in PostgreSQL who aim to maximize their database's performance and security, as well as for those new to the system seeking a comprehensive start. Readers with an interest in practical, problem-solving approaches to database management will greatly benefit from this cookbook.

IBM Power Systems Virtual Server Guide for IBM i

2022-03-30 O'Reilly Amazon

book

Dino Quintero , Sanjeev Chhabra , Sergio Leyva , Ahmad Y Hussein , Marcelos Avalos , Luis Eduardo Silva Viera , Gabriel Padilla Jimenez , Diego Kesselman , Bogdan Savu , Adriano Almeida , Luis Ferreira , Travis Siegfried , Michael Easlon , Jose Martin Abeleira , Deepak C Shetty

data data-engineering IBM ibm-power-systems Data Management

This IBM® Redbooks® publication takes a how-to approach to describing deployment, networking, and data management tasks on the IBM Power Systems Virtual Server by using sample scenarios. During the content development, the team used available documentation, the IBM Power Systems Virtual Server environment, and other software and hardware resources to document the following information: IBM Power Systems Virtual Server networking and data management deployment scenarios; migration use case scenarios; backup case scenarios; and disaster recovery case scenarios. This book addresses topics for IT architects, IT specialists, developers, sellers, and anyone who wants to implement and manage workloads in the IBM Power Systems Virtual Server. This publication also aims to transfer how-to skills to technical teams and to provide solution guidance to sales teams. This book complements the documentation that is available at the IBM Documentation web page and aligns with the educational materials that are provided by IBM Garage for Systems Technical Education.

Grokking Streaming Systems

2022-03-27 O'Reilly Amazon

book

Ning Wang , Josh Fischer

data data-engineering streaming-messaging streaming-architecture IoT Java

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work, and how to build your own!

In Grokking Streaming Systems you will learn how to: implement and troubleshoot streaming systems; design streaming systems for complex functionalities; assess parallelization requirements; spot networking bottlenecks and resolve backpressure; group data for high-performance systems; and handle delayed events in real-time systems.

Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that’s a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities!

About the Technology: Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand.

About the Book: Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you’ll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services.

What's Inside: implement and troubleshoot streaming systems; design streaming systems for complex functionalities; spot networking bottlenecks and resolve backpressure; group data for high-performance systems.

About the Reader: No prior experience with streaming systems is assumed. Examples in Java.

About the Authors: Josh Fischer and Ning Wang are Apache Committers, and part of the committee for the Apache Heron distributed stream processing engine.

Quotes: "Very well-written and enjoyable. I recommend this book to all software engineers working on data processing." - Apoorv Gupta, Facebook. "Finally, a much-needed introduction to streaming systems, a must-read for anyone interested in this technology." - Anupam Sengupta, Red Hat. "Tackles complex topics in a very approachable manner." - Marc Roulleau, GIRO. "A superb resource for helping you grasp the fundamentals of open-source streaming systems." - Simon Verhoeven, Cronos. "Explains all the main streaming concepts in a friendly way. Start with this one!" - Cicero Zandona, Calypso Technologies.

IBM FlashSystem Safeguarded Copy Implementation Guide

2022-03-25 O'Reilly Amazon

book

Vasfi Gucer , Hemanand Gadgil , Andrew Greenfield , Jackson Shea

data data-engineering IBM

The Safeguarded Copy function that is available with IBM® Spectrum Virtualize Version 8.4.2 supports the ability to create cyber-resilient point-in-time copies of volumes that cannot be changed or deleted through user errors, malicious actions, or ransomware attacks. The system integrates with IBM Copy Services Manager to provide automated backup copies and data recovery. This IBM Redpaper® publication introduces the features and functions of the Safeguarded Copy function by using several examples. This document is aimed at pre-sales and post-sales technical support specialists and storage administrators.

Simplify Big Data Analytics with Amazon EMR

2022-03-25 O'Reilly Amazon

book

Sakti Mishra

data data-engineering apache-spark Analytics AWS Amazon EMR

Simplify Big Data Analytics with Amazon EMR is a thorough guide to harnessing Amazon's EMR service for big data processing and analytics. From distributed computation pipelines to real-time streaming analytics, this book provides hands-on knowledge and actionable steps for implementing data solutions efficiently.

What this Book will help me do: Understand the architecture and key components of Amazon EMR and how to deploy it effectively. Learn to configure and manage distributed data processing pipelines using Amazon EMR. Implement security and data governance best practices within the Amazon EMR ecosystem. Master batch ETL and real-time analytics techniques using technologies like Apache Spark. Apply optimization and cost-saving strategies to scalable data solutions.

Author(s): Sakti Mishra is a seasoned data professional with extensive expertise in deploying scalable analytics solutions on cloud platforms like AWS. With a background in big data technologies and a passion for teaching, Sakti ensures practical insights accompany every concept. Readers will find his approach thorough, hands-on, and highly informative.

Who is it for? This book is perfect for data engineers, data scientists, and other professionals looking to leverage Amazon EMR for scalable analytics. If you are familiar with Python, Scala, or Java and have some exposure to Hadoop or AWS ecosystems, this book will empower you to design and implement robust data pipelines efficiently.

Data Analytics, Computational Statistics, and Operations Research for Engineers

2022-03-24 O'Reilly Amazon

book

Mohammad Hammoudeh , Naveen Chilamkurti , Debabrata Samanta , SK Hafizul Islam

data data-science data-science-tasks statistics AI/ML Analytics

This book investigates the role of data mining in computational statistics for machine learning. It offers applications that can be used in various domains and examines the role of transformation functions in optimizing problem statements.

