talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3377

Collection of O'Reilly books on Data Engineering.

Filtering by: data-engineering

Sessions & talks

Showing 326–350 of 3377 · Newest first

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities using Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform. 
What You Will Learn
Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform
Benefit from the new Delta Lake open-source storage layer for data lakehouses
Take advantage of schema evolution, change feeds, live tables, and more
Write functional PySpark code for data lakehouse ELT jobs
Optimize Apache Spark performance through partitioning, indexing, and other tuning options
Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake
Who This Book Is For
Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.

Tidy Modeling with R

Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you're just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work. RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You'll understand why the tidymodels framework has been built to be used by a broad range of people. With this book, you will:
Learn the steps necessary to build a model from beginning to end
Understand how to use different modeling and feature engineering approaches fluently
Examine the options for avoiding common pitfalls of modeling, such as overfitting
Learn practical methods to prepare your data for modeling
Tune models for optimal performance
Use good statistical practices to compare, evaluate, and choose among models

IBM TS7700 Release 5.2.2 Guide

This IBM® Redbooks® publication covers IBM TS7700 R5.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on 25 years of experience, the R5.2 release includes many features that enable improved performance, usability, and security. Highlights include IBM TS7700 Advanced Object Store, an all-flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud® Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.2. The R5.2 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.2 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. Note: Release 5.2 was split into two phases: R5.2 Phase 1 and R5.2 Phase 2. TS7700 provides tape virtualization for the IBM Z environment. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. Tape virtualization can help satisfy the following requirements in a data processing environment.
New and existing capabilities of the TS7700 5.2.2 release include the following highlights:
Eight-way Grid Cloud, which consists of up to three generations of TS7700
Synchronous and asynchronous replication of virtual tape and TCT objects
Grid access to all logical volume and object data that is independent of where it exists
An all-flash TS7770 option for improved performance
Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier
Full AES256 encryption for data that is in-flight and at-rest
Tight integration with IBM Z® and DFSMS policy management
DS8000 Object Store AES256 in-flight encryption and compression
Regulatory compliance through Logical WORM and LWORM Retention support
Cloud Storage Tier support for archive, logical volume version, and disaster recovery
Optional integration with physical tape
16 Gb IBM FICON® throughput that exceeds 5 GBps per TS7700 cluster
Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR) support
IBM Z hosts view up to 3,968 common devices per TS7700 grid
TS7770 Cache On-demand feature that is based on capacity licensing
TS7770 support of SSD within the VED server
The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM DS8900F and IBM Z Synergy: DS8900F Release 9.3 and z/OS 2.5

IBM Z® has a close and unique relationship to its storage. Over the years, improvements to the IBM zSystems® processors and storage software, the disk storage systems, and their communication architecture consistently reinforced this synergy. This IBM® Redpaper™ publication summarizes and highlights the various aspects, advanced functions, and technologies that are often pioneered by IBM and make IBM Z and IBM DS8000® products an ideal combination. This paper is intended for users who have some familiarity with IBM Z and the IBM DS8000 series and want a condensed but comprehensive overview of the synergy items up to the IBM z16 server with IBM z/OS® V2.5 and the IBM DS8900 Release 9.3 firmware.

Data Engineering with Alteryx

Dive into 'Data Engineering with Alteryx' to master the principles of DataOps while learning to build robust data pipelines using Alteryx. This book guides you through key practices to enhance data pipeline reliability, efficiency, and accessibility, making it an essential resource for modern data professionals.
What this Book will help me do
Understand and implement DataOps practices within Alteryx workflows.
Design and develop data pipelines with Alteryx Designer for efficient data processing.
Learn to manage and publish pipelines using Alteryx Server and Alteryx Connect.
Gain advanced skills in Alteryx for handling spatial analytics and machine learning.
Master techniques to monitor, secure, and optimize data workflows and access.
Author(s)
Paul Houghton is an experienced data engineer and author specializing in data engineering and DataOps. With extensive experience using Alteryx tools and workflows, Paul has a passion for teaching and sharing his knowledge through clear and practical guidance. His hands-on approach ensures readers successfully navigate and apply technical concepts to real-world projects.
Who is it for?
This book is ideal for data engineers, data scientists, and data analysts aiming to build reliable data pipelines with Alteryx. You do not need prior experience with Alteryx, but familiarity with data workflows will enhance your learning experience. If you're focused on aligning with DataOps methodologies, this book is tailored for you.

Ten Things to Know About ModelOps

The past few years have seen significant developments in data science, AI, machine learning, and advanced analytics. But the wider adoption of these technologies has also brought greater cost, risk, regulation, and demands on organizational processes, tasks, and teams. This report explains how ModelOps can provide both technical and operational solutions to these problems. Thomas Hill, Mark Palmer, and Larry Derany summarize important considerations, caveats, choices, and best practices to help you be successful with operationalizing AI/ML and analytics in general. Whether your organization is already working with teams on AI and ML, or just getting started, this report presents ten important dimensions of analytic practice and ModelOps that are not widely discussed, or perhaps even known. In part, this report examines:
Why ModelOps is the enterprise "operating system" for AI/ML algorithms
How to build your organization's IP secret sauce through repeatable processing steps
How to anticipate risks rather than react to damage done
How ModelOps can help you deliver the many algorithms and model formats available
How to plan for success and monitor for value, not just accuracy
Why AI will soon be regulated and how ModelOps helps ensure compliance

In-Memory Analytics with Apache Arrow

Discover the power of in-memory data analytics with "In-Memory Analytics with Apache Arrow." This book delves into Apache Arrow's unique capabilities, enabling you to handle vast amounts of data efficiently and effectively. Learn how Arrow improves performance, offers seamless integration, and simplifies data analysis in diverse computing environments.
What this Book will help me do
Gain proficiency with the datastore facilities and data types defined by Apache Arrow.
Master the Arrow Flight APIs to efficiently transfer data between systems.
Learn to leverage in-memory processing advantages offered by Arrow for state-of-the-art analytics.
Understand how Arrow interoperates with popular tools like Pandas, Parquet, and Spark.
Develop and deploy high-performance data analysis pipelines with Apache Arrow.
Author(s)
Matthew Topol, the author of the book, is an experienced practitioner in data analytics and Apache Arrow technology. Having contributed to the development and implementation of Arrow-powered systems, he brings a wealth of knowledge to readers. His ability to delve deep into technical concepts while keeping explanations practical makes this book an excellent guide for learners of the subject.
Who is it for?
This book is ideal for professionals in the data domain, including developers, data analysts, and data scientists aiming to enhance their data manipulation capabilities. Beginners with some familiarity with data analysis concepts will find it beneficial, as will engineers designing analytics utilities. Programming examples accommodate users of C++, Go, and Python, making it broadly accessible.

Fundamentals of Data Engineering

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you:
Get a concise overview of the entire data engineering landscape
Assess data engineering problems using an end-to-end framework of best practices
Cut through marketing hype when choosing data technologies, architecture, and processes
Use the data engineering lifecycle to design and build a robust architecture
Incorporate data governance and security across the data engineering lifecycle
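The lifecycle stages the authors describe (generation, ingestion, transformation, storage) can be sketched as a chain of small functions. Everything below (names, fields, data) is invented for illustration; the book itself is technology-agnostic rather than tied to any one implementation.

```python
# A minimal, illustrative sketch of the data engineering lifecycle:
# generation -> ingestion -> transformation -> storage.
# All names and data here are hypothetical.

def ingest(records):
    """Ingestion: pull raw events from a source system, dropping malformed ones."""
    return [r for r in records if r is not None]

def transform(records):
    """Transformation: normalize fields for downstream consumers."""
    return [{"user": r["user"].lower(), "amount": r["amount"]} for r in records]

def store(records, warehouse):
    """Storage: append to a (here, in-memory) warehouse table."""
    warehouse.extend(records)
    return warehouse

# Generation: raw events as they might arrive from an application.
raw = [{"user": "Alice", "amount": 10.0}, None, {"user": "BOB", "amount": 7.5}]

warehouse = []
store(transform(ingest(raw)), warehouse)
print(warehouse)
# -> [{'user': 'alice', 'amount': 10.0}, {'user': 'bob', 'amount': 7.5}]
```

In a real deployment each stage would be a separate system (a queue, a Spark job, a warehouse table) coordinated by an orchestrator, but the data flow is the same.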

Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.
Familiarize yourself with Spark's programming model and ecosystem
Learn general approaches in data science
Examine complete implementations that analyze large public datasets
Discover which machine learning tools make sense for particular problems
Explore code that can be adapted to many uses
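As a library-free illustration of one technique the authors apply at scale, anomaly detection, here is a simple z-score outlier rule in plain Python. The book's own examples use PySpark; the data and threshold below are invented for the sketch.

```python
# Flag values more than `threshold` standard deviations from the mean,
# a basic z-score anomaly rule. Data and threshold are invented.
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0]  # one obvious outlier
print(zscore_anomalies(readings))  # -> [55.0]
```

In Spark the same idea scales out: compute the mean and standard deviation as aggregates over a distributed DataFrame, then filter rows whose z-score exceeds the threshold.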

IBM SAN Volume Controller Model SV3 Product Guide

This IBM® Redpaper Product Guide describes the IBM SAN Volume Controller model SV3 solution, which is a next-generation IBM SAN Volume Controller. Built with IBM Spectrum® Virtualize software and part of the IBM Spectrum Storage family, IBM SAN Volume Controller is an enterprise-class storage system. It helps organizations achieve better data economics by supporting the large-scale workloads that are critical to success. Data centers often contain a mix of storage systems. This situation can arise as a result of company mergers or as a deliberate acquisition strategy. Regardless of how they arise, mixed configurations add complexity to the data center. Different systems have different data services, which make it difficult to move data from one to another without updating automation. Different user interfaces increase the need for training and can make errors more likely. Different approaches to hybrid cloud complicate modernization strategies. Also, many different systems mean more silos of capacity, which can lead to inefficiency. To simplify the data center and to improve flexibility and efficiency in deploying storage, enterprises of all types and sizes turn to IBM SAN Volume Controller, which is built with IBM Spectrum Virtualize software. This software simplifies infrastructure and eliminates differences in management, function, and even hybrid cloud support. IBM SAN Volume Controller introduces a common approach to storage management, function, replication, and hybrid cloud that is independent of storage type. It is the key to modernizing and revitalizing your storage, and it is easy to understand.
IBM SAN Volume Controller provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples:
Data reduction and deduplication
Dynamic tiering
Thin-provisioning
Snapshots
Cloning
Replication and data copy services
Data-at-rest encryption
Cyber resilience
Transparent Cloud Tiering
IBM HyperSwap® including three-site replication for high availability (HA)

Elasticsearch 8.x Cookbook - Fifth Edition

"Elasticsearch 8.x Cookbook" is your go-to resource for harnessing the full potential of Elasticsearch 8. This book provides over 180 hands-on recipes to help you efficiently implement, customize, and scale Elasticsearch solutions in your enterprise. Whether you're handling complex queries, analytics, or cluster management, you'll find practical insights to enhance your capabilities. What this Book will help me do Understand the advanced features of Elasticsearch 8.x, including X-Pack, for improving functionality and security. Master advanced indexing and query techniques to perform efficient and scalable data operations. Implement and manage Elasticsearch clusters effectively including monitoring performance via Kibana. Integrate Elasticsearch seamlessly into Java, Scala, Python, and big data environments. Develop custom plugins and extend Elasticsearch to meet unique project requirements. Author(s) Alberto Paro is a seasoned Elasticsearch expert with years of experience in search technologies and enterprise solution development. As a professional developer and consultant, he has worked with numerous organizations to implement Elasticsearch at scale. Alberto brings his deep technical knowledge and hands-on approach to this book, ensuring readers gain practical insights and skills. Who is it for? This book is perfect for software engineers, data professionals, and developers working with Elasticsearch in enterprise environments. If you're seeking to advance your Elasticsearch knowledge, enhance your query-writing abilities, or seek to integrate it into big data workflows, this book will be invaluable. Regardless of whether you're deploying Elasticsearch in e-commerce, applications, or for analytics, you'll find the content purposeful and engaging.

IBM Power Systems High Availability and Disaster Recovery Updates: Planning for a Multicloud Environment

This IBM® Redpaper publication delivers an updated guide for high availability and disaster recovery (HADR) planning in a multicloud environment for IBM Power. It describes ideas from studies performed by a virtual collaborative team of IBM Business Partners, technical focal points, and product managers, who used hands-on experience to implement case studies that illustrate HADR management aspects and to develop this technical update guide for a hybrid multicloud environment. The goal of this book is to deliver a HADR guide for backup and data management on-premises and in a multicloud environment. This document updates HADR on-premises and in the cloud with IBM PowerHA® SystemMirror®, IBM VM Recovery Manager (VMRM), and other solutions that are available on IBM Power for IBM AIX®, IBM i, and Linux. This publication highlights the offerings that were available at the time of writing for each operating system (OS) that is supported on IBM Power, including best practices. This book addresses topics for IT architects, IT specialists, sellers, and anyone looking to implement and manage HADR on-premises and in the cloud. Moreover, this publication provides documentation to transfer how-to skills to technical teams and solution guidance to sales teams. This book complements the documentation that is available at IBM Documentation and aligns with the educational materials that are provided by IBM Systems Technical Training.

SAP Intelligent RPA for Developers

SAP Intelligent RPA for Developers dives deep into the realm of robotic process automation using SAP Intelligent RPA. It provides a comprehensive guide to leveraging RPA for automating repetitive business processes, ensuring a seamlessly integrated environment for SAP and non-SAP systems. By the end, you'll be equipped to craft, manage, and optimize automated workflows.
What this Book will help me do
Master the fundamentals of SAP Intelligent RPA and its architecture.
Develop and deploy automation bots to streamline business processes.
Utilize low-code and pro-code methodologies effectively in project designs.
Debug and troubleshoot RPA solutions to ensure operational efficiency.
Understand and plan the migration from SAP Intelligent RPA to SAP Process Automation.
Author(s)
Madhuvarshi and Ganugula are experts in SAP Intelligent RPA with years of experience in ERP systems integration and process automation. Together, they offer a practical and comprehensive approach to mastering and implementing SAP RPA solutions effectively.
Who is it for?
This book is perfect for developers and business analysts eager to explore SAP Intelligent RPA. It caters to those with a basic knowledge of JavaScript who aspire to leverage RPA for automating monotonous workflows. If you're looking to dive into SAP's automation framework and understand its practical applications, this book is a great fit for you.

IBM Z Functional Matrix

This IBM® Redpaper™ publication provides a list of features and functions that are supported on IBM Z, including: IBM z16™ - Machine type 3931; IBM z15™ - Machine types 8561 and 8562; IBM z14™ - Machine types 3906 and 3907. On 30 June 2021, the IBM z14 (M/T 3906) was withdrawn from marketing (WDMF). Field-installed features and all associated conversions that are delivered solely through a modification to the machine's Licensed Internal Code (LIC) are still possible until 29 June 2022. This IBM Redpaper publication can help you quickly understand the features, functions, and connectivity alternatives that are available when planning and designing IBM Z infrastructures.

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

This book helps SAP architects and SAP Basis administrators deploy and operate SAP S/4HANA systems on the most common public cloud platforms. Market-leading cloud offerings are covered, including Amazon Web Services, Microsoft Azure, and Google Cloud. You will gain an end-to-end understanding of the initial implementation of SAP S/4HANA systems on those platforms. You will learn how to move away from the big monolithic SAP ERP systems and arrive at an environment with a central SAP S/4HANA system as the digital core surrounded by cloud-native services. The book begins by introducing the core concepts of Hyperscaler cloud platforms that are relevant to SAP. You will learn about the architecture of SAP S/4HANA systems on public cloud platforms, with specific content provided for each of the major platforms. The book simplifies the deployment of SAP S/4HANA systems in public clouds by providing step-by-step instructions and helping you deal with the complexity of such a deployment. Content in the book is based on best practices, industry lessons learned, and architectural blueprints, helping you develop deep insights into the operations of SAP S/4HANA systems on public cloud platforms. Reading this book enables you to build and operate your own SAP S/4HANA system in the public cloud with a minimum of effort.
What You Will Learn
Choose the right Hyperscaler platform for your future SAP S/4HANA workloads
Start deploying your first SAP S/4HANA system in the public cloud
Avoid typical pitfalls during your implementation
Apply and leverage cloud-native services for your SAP S/4HANA system
Save costs by choosing the right architecture and build a robust architecture for your most critical SAP systems
Meet your business’ criteria for availability and performance by having the right sizing in place
Identify further use cases when operating SAP S/4HANA in the public cloud
Who This Book Is For
SAP architects looking for an answer on how to move SAP S/4HANA systems from on-premises into the cloud; those planning to deploy to one of the three major platforms from Amazon Web Services, Microsoft Azure, and Google Cloud Platform; and SAP Basis administrators seeking a detailed and realistic description of how to get started on a migration to the cloud and how to drive that cloud implementation to completion

Advanced SQL with SAS

This book introduces advanced techniques for using PROC SQL in SAS. If you are a SAS programmer, analyst, or student who has mastered the basics of working with SQL, Advanced SQL with SAS® will help take your skills to the next level. Filled with practical examples with detailed explanations, this book demonstrates how to improve performance and speed for large data sets. Although the book addresses advanced topics, it is designed to progress from the simple and manageable to the complex and sophisticated. In addition to numerous tuning techniques, this book also touches on implicit and explicit pass-throughs, presents alternative SAS grid- and cloud-based processing environments, and compares SAS programming languages and approaches including FedSQL, CAS, DS2, and hash programming. Other topics include:
Missing values and data quality with audit trails
“Blind spots” like how missing values can affect even the simplest calculations and table joins
SAS macro language and SAS macro programs
SAS functions
Integrity constraints
SAS Dictionaries
SAS Compute Server

Python for ArcGIS Pro

Python for ArcGIS Pro is your guide to automating geospatial tasks and maximizing your productivity using Python. Inside, you'll learn how to integrate Python scripting into ArcGIS workflows to streamline map production, data analysis, and data management.
What this Book will help me do
Automate map production and streamline repetitive cartography tasks.
Conduct geospatial data analysis using Python libraries like pandas and NumPy.
Integrate ArcPy and the ArcGIS API for Python to manage geospatial data more effectively.
Create script tools to improve repeatability and manage datasets.
Publish and manage geospatial data to ArcGIS Online seamlessly.
Author(s)
Toms and Parker are both experienced GIS professionals and Python developers. With years of hands-on experience using Esri technology in real-world scenarios, they bring practical insights into the application's nuances. Their collaborative approach allows them to demystify technical concepts, making their teachings accessible to audiences of all skill levels.
Who is it for?
This book is for ArcGIS users looking to integrate Python into their workflows, whether you're a GIS specialist, technician, or analyst. It's also suitable for those transitioning to roles requiring programming skills. A basic understanding of ArcGIS helps, but the book starts from the fundamentals.
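As a taste of the kind of repetitive task such scripting automates, here is a standard-library sketch that computes a bounding box for a set of point features. The book's actual tooling is ArcPy and the ArcGIS API for Python, which require an ArcGIS install, so nothing below uses Esri APIs and all names and coordinates are hypothetical.

```python
# Compute the extent (bounding box) of point features, a task ArcPy users
# would typically script with cursors over a feature class. Pure stdlib here;
# the point data is invented for illustration.

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

wells = [(-120.5, 46.6), (-120.7, 46.4), (-120.3, 46.9)]  # lon, lat pairs
print(bounding_box(wells))  # -> (-120.7, 46.4, -120.3, 46.9)
```

With ArcPy the same loop would read geometries from a feature class instead of a Python list, but the automation pattern, iterate, compute, write a result, is identical.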

The MySQL Workshop

The MySQL Workshop is your comprehensive, hands-on guide to learning and mastering MySQL database management. This book covers everything from setting up a database to working with SQL queries, managing data, and securing your databases. With practical exercises and real-world scenarios, you'll quickly gain the confidence and skills to handle MySQL databases effectively.
What this Book will help me do
Understand and implement the core concepts of relational databases.
Write, execute, and optimize SQL queries for data management.
Connect MySQL databases to applications like MS Access and Excel.
Secure databases by managing user roles and permissions effectively.
Perform database backups and restores to maintain data integrity.
Author(s)
Thomas Pettit and Scott Cosentino are experienced professionals in database management and MySQL technologies. With years of industry experience, they bring a wealth of knowledge to their writing. They focus on breaking down complex topics into digestible lessons, ensuring practical learning outcomes.
Who is it for?
This book is ideal for tech professionals and students looking to learn MySQL. Beginners will find a gentle introduction, while those with some SQL background will deepen their understanding and cover gaps in knowledge. It suits professionals dealing with data who want actionable MySQL skills for work and projects.
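The workshop's core loop, create a table, insert rows, query them, can be sketched with standard SQL. The book targets MySQL, but the statements below are portable enough to run through Python's built-in sqlite3 module so the example stays self-contained; the schema and data are invented.

```python
# Standard SQL basics of the kind the workshop teaches, run via sqlite3 so
# no MySQL server is needed. Schema and rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table and load some rows.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
cur.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [("alice", 40.0), ("bob", 15.5), ("alice", 9.5)])

# Aggregate query: total spend per customer.
cur.execute("SELECT customer, SUM(total) FROM orders "
            "GROUP BY customer ORDER BY customer")
rows = cur.fetchall()
print(rows)  # -> [('alice', 49.5), ('bob', 15.5)]
conn.close()
```

Against a real MySQL server the same SQL would run through a driver such as mysql-connector-python, with a host, user, and password in place of the in-memory connection string.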

IBM z16 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform that is built with the IBM Telum processor: the IBM z16 server. The IBM Z platform is recognized for its security, resiliency, performance, and scale. It is relied on for mission-critical workloads and as an essential element of hybrid cloud infrastructures. The IBM z16 server adds capabilities and value with innovative technologies that are needed to accelerate the digital transformation journey. This book explains how the IBM z16 server uses innovations and traditional IBM Z strengths to satisfy the growing demand for cloud, analytics, and a more flexible infrastructure. With the IBM z16 servers as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

SAP S/4HANA Conversion: A Guide to Executing and Simplifying Your Conversion

Succeed in your conversion to SAP S/4HANA. This book will help you understand the core aspects and implement a conversion project. You will start with an overview of the SAP S/4HANA conversion tools: Readiness Check, Simplification Item Check report, Maintenance Planner, Custom Code Analysis, SUM (Software Update Manager), and more. You will understand the preparation activities for SAP FI (Finance), SAP CO (Controlling), SAP AA (Asset Accounting), Material Ledger, and COPA (Controlling–Profitability Analysis). And you will find the SAP CVI (Customer/Vendor Integration) steps that can help consultants understand the mandatory activities to be completed as a part of preparation on the SAP ECC (ERP Central Component) system. You will learn the preparation activities for conversion of accounting to SAP S/4HANA, and migration activities: customizing, asset accounting, controlling, and house bank accounts. You will gain knowledge on data migration activities such as the migration of cost elements, technical checks of transactional data, material ledger migration, enrichment of data, and migration of line items, balances, and general ledger allocations to journal entry tables. After reading this book, you will know how to use the Migration Cockpit for data migration and post-conversion activities to successfully execute and implement an SAP S/4HANA conversion.
What You Will Learn
Choose an ideal path and planning tools for SAP S/4HANA
Start with the preparation step: General Ledger Accounting, Asset Accounting, Controlling, Material Ledger, and so on
Use Migration Cockpit for conversion preparation, migration, and post-migration activities
Who This Book Is For
SAP application consultants, finance consultants, and CVI consultants who need help with SAP S/4HANA conversion

Early Threat Detection and Safeguarding Data with IBM QRadar and IBM Copy Services Manager on IBM DS8000

The focus of this blueprint is to highlight early threat detection by IBM® QRadar® and to proactively start a cyber resilience workflow in response to a cyberattack or malicious user actions. The workflow uses IBM Copy Services Manager (CSM) as orchestration software to start IBM DS8000® Safeguarded Copy functions. The Safeguarded Copy creates an immutable copy of the data in an air-gapped form on the same DS8000 system for isolation and eventual quick recovery. This document also explains the steps that are involved to enable and forward IBM DS8000 audit logs to IBM QRadar. It also discusses how to create various rules to determine a threat, and how to configure and start a suitable response to the detected threat in IBM QRadar. Finally, this document explains how to register a storage system and create a Scheduled Task by using CSM.

IBM Power Systems S922, S914, and S924 Technical Overview and Introduction Featuring PCIe Gen 4 Technology

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power System S914 (9009-41G), IBM Power System S922 (9009-22G), and IBM Power System S924 (9009-42G) servers that use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems (OSs). The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in these systems, such as: The latest IBM POWER9 processor, which is available in various configurations for the number of cores per socket More performance by using industry-leading Peripheral Component Interconnect Express (PCIe) Gen 4 slots Enhanced internal disk scalability and performance with up to 11 NVMe adapters Introduction of a competitive Power S922 server with a 1-socket configuration that is targeted at IBM i customers This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles: Clients Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power S914, Power S922, and Power S924 systems. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM GDPS: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues that are related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

CockroachDB: The Definitive Guide

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud native data architecture. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra low latencies to globally distributed end users. You'll learn how to: Design and build applications for distributed infrastructure, including data modeling and schema design Migrate data into CockroachDB Read and write data and run ACID transactions across distributed infrastructure Plan a CockroachDB deployment for resiliency across single region and multi-region clusters Secure, monitor, and optimize your CockroachDB deployment
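The ACID transactions mentioned above run under CockroachDB's SERIALIZABLE isolation, so under contention a transaction can be aborted with SQLSTATE 40001 and the client is expected to retry it. A minimal Python sketch of that client-side retry loop follows; `RetryableError` is a hypothetical stand-in for the serialization-failure exception a real Postgres-compatible driver would raise:

```python
# Client-side transaction retry loop, as CockroachDB's serializable
# transactions require. `RetryableError` is a HYPOTHETICAL stand-in for a
# driver's SQLSTATE-40001 serialization failure; with a real driver you
# would catch its error class instead.
class RetryableError(Exception):
    """Stands in for a serialization-failure (SQLSTATE 40001) error."""

def run_transaction(op, max_retries=3):
    """Run `op` (a callable performing one transaction) and retry it on
    serialization failures, up to `max_retries` attempts."""
    last_error = None
    for _ in range(max_retries):
        try:
            return op()  # commit succeeded; return the result
        except RetryableError as exc:
            last_error = exc  # transaction aborted; safe to retry
    raise RuntimeError("transaction did not commit") from last_error
```

In real code, `op` would open and commit a transaction through your SQL driver; the loop structure is what matters, since any statement in the transaction can trigger the retryable abort.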

Data Algorithms with Spark

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script. With this book, you will: Learn how to select Spark transformations for optimized solutions Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions() Understand data partitioning for optimized queries Build and apply a model using PySpark design patterns Apply motif-finding algorithms to graph data Analyze graph data by using the GraphFrames API Apply PySpark algorithms to clinical and genomics data Learn how to use and apply feature engineering in ML algorithms Understand and use practical and pragmatic data design patterns
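The reduceByKey() transformation named above merges all values that share a key using an associative function. A plain-Python model of its semantics (an illustration of the pattern, not Spark itself) can be sketched as:

```python
# Plain-Python model of Spark's reduceByKey() semantics: merge all values
# sharing a key with an associative function, as Spark does first within
# each partition and then across partitions. Illustration only, not Spark.
def reduce_by_key(pairs, fn):
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return sorted(merged.items())

# Classic word-count aggregation over (word, 1) pairs.
word_pairs = [("spark", 1), ("hadoop", 1), ("spark", 1)]
counts = reduce_by_key(word_pairs, lambda a, b: a + b)
print(counts)  # [('hadoop', 1), ('spark', 2)]
```

Because the merge function is associative, the same result is obtained whether values are combined within one partition or across many, which is what lets Spark parallelize the reduction.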