data-engineering

IBM Z Functional Matrix

2022-05-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ewerson Palacio , Bill White , Octavian Lascu

IBM Marketing data

This IBM® Redpaper™ publication provides a list of features and functions that are supported on IBM Z, including: IBM z16™ - Machine type 3931; IBM z15™ - Machine types 8561 and 8562; IBM z14™ - Machine types 3906 and 3907. On 30 June 2021, the IBM z14 (M/T 3906) was withdrawn from marketing (WDFM). Field-installed features and all associated conversions that are delivered solely through a modification to the machine's Licensed Internal Code (LIC) are still possible until 29 June 2022. This IBM Redpaper publication can help you quickly understand the features, functions, and connectivity alternatives that are available when planning and designing IBM Z infrastructures.

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

2022-05-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dhiraj Kumar , Jessica Tischbierek , Johannes Rank , Elena Wolz , André Bögelsack , Utpal Chakraborty

AWS Azure Cloud Computing ERP GCP Microsoft SAP data

This book helps SAP architects and SAP Basis administrators deploy and operate SAP S/4HANA systems on the most common public cloud platforms. Market-leading cloud offerings are covered, including Amazon Web Services, Microsoft Azure, and Google Cloud. You will gain an end-to-end understanding of the initial implementation of SAP S/4HANA systems on those platforms. You will learn how to move away from the big monolithic SAP ERP systems and arrive at an environment with a central SAP S/4HANA system as the digital core surrounded by cloud-native services. The book begins by introducing the core concepts of Hyperscaler cloud platforms that are relevant to SAP. You will learn about the architecture of SAP S/4HANA systems on public cloud platforms, with specific content provided for each of the major platforms. The book simplifies the deployment of SAP S/4HANA systems in public clouds by providing step-by-step instructions and helping you deal with the complexity of such a deployment. Content in the book is based on best practices, industry lessons learned, and architectural blueprints, helping you develop deep insights into the operations of SAP S/4HANA systems on public cloud platforms. Reading this book enables you to build and operate your own SAP S/4HANA system in the public cloud with a minimum of effort.
What You Will Learn
Choose the right Hyperscaler platform for your future SAP S/4HANA workloads
Start deploying your first SAP S/4HANA system in the public cloud
Avoid typical pitfalls during your implementation
Apply and leverage cloud-native services for your SAP S/4HANA system
Save costs by choosing the right architecture and build a robust architecture for your most critical SAP systems
Meet your business’s criteria for availability and performance by having the right sizing in place
Identify further use cases when operating SAP S/4HANA in the public cloud
Who This Book Is For
SAP architects looking for guidance on how to move SAP S/4HANA systems from on-premises into the cloud; those planning to deploy to one of the three major platforms from Amazon Web Services, Microsoft Azure, and Google Cloud Platform; and SAP Basis administrators seeking a detailed and realistic description of how to get started on a migration to the cloud and how to drive that cloud implementation to completion.

Advanced SQL with SAS

2022-05-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian FG Schendera

Cloud Computing Data Quality SAS SQL data

This book introduces advanced techniques for using PROC SQL in SAS. If you are a SAS programmer, analyst, or student who has mastered the basics of working with SQL, Advanced SQL with SAS® will help take your skills to the next level. Filled with practical examples and detailed explanations, this book demonstrates how to improve performance and speed for large data sets. Although the book addresses advanced topics, it is designed to progress from the simple and manageable to the complex and sophisticated. In addition to numerous tuning techniques, this book also touches on implicit and explicit pass-throughs, presents alternative SAS grid- and cloud-based processing environments, and compares SAS programming languages and approaches including FedSQL, CAS, DS2, and hash programming. Other topics include:
Missing values and data quality with audit trails
“Blind spots,” such as how missing values can affect even the simplest calculations and table joins
SAS macro language and SAS macro programs
SAS functions
Integrity constraints
SAS Dictionaries
SAS Compute Server
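The “blind spots” around missing values are easiest to see in a toy query. The sketch below uses Python's built-in sqlite3 module with ANSI-style NULLs as a stand-in; it is not SAS, and SAS's own missing-value semantics differ from SQL NULLs in places (for example, in how missing values compare in joins), so treat it as an analogy rather than a description of PROC SQL. The function, table, and column names are hypothetical.

```python
import sqlite3


def null_demo() -> dict:
    """Show how NULLs change aggregates, arithmetic, and equi-joins (SQLite semantics)."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    cur.execute("CREATE TABLE scores (id INTEGER, score REAL)")
    cur.executemany("INSERT INTO scores VALUES (?, ?)",
                    [(1, 10.0), (2, None), (3, 20.0)])

    # AVG skips NULLs: (10 + 20) / 2, not (10 + 0 + 20) / 3
    avg_score = cur.execute("SELECT AVG(score) FROM scores").fetchone()[0]
    # COUNT(*) counts rows; COUNT(score) counts only non-NULL values
    n_rows, n_scores = cur.execute(
        "SELECT COUNT(*), COUNT(score) FROM scores").fetchone()
    # Any arithmetic involving NULL yields NULL (None in Python)
    sum_with_null = cur.execute("SELECT 10 + NULL").fetchone()[0]

    # In an equi-join, NULL = NULL is not true, so NULL keys never match
    cur.execute("CREATE TABLE a (k INTEGER)")
    cur.execute("CREATE TABLE b (k INTEGER)")
    cur.executemany("INSERT INTO a VALUES (?)", [(1,), (None,)])
    cur.executemany("INSERT INTO b VALUES (?)", [(1,), (None,)])
    join_matches = cur.execute(
        "SELECT COUNT(*) FROM a JOIN b ON a.k = b.k").fetchone()[0]

    con.close()
    return {
        "avg": avg_score,
        "rows": n_rows,
        "non_null": n_scores,
        "sum_with_null": sum_with_null,
        "join_matches": join_matches,
    }
```

Calling null_demo() returns an average of 15.0 rather than 10.0, three rows but only two non-NULL scores, None for 10 + NULL, and a single join match, because the two NULL keys do not pair up.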

Python for ArcGIS Pro

2022-04-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by William Parker , Silas Toms

API Data Management GIS NumPy Pandas Python arcgis data geographic-information-system-gis location-data

Python for ArcGIS Pro is your guide to automating geospatial tasks and maximizing your productivity using Python. Inside, you'll learn how to integrate Python scripting into ArcGIS workflows to streamline map production, data analysis, and data management.
What this Book will help me do
Automate map production and streamline repetitive cartography tasks.
Conduct geospatial data analysis using Python libraries like pandas and NumPy.
Integrate ArcPy and the ArcGIS API for Python to manage geospatial data more effectively.
Create script tools to improve repeatability and manage datasets.
Publish and manage geospatial data on ArcGIS Online.
Author(s)
Silas Toms and William Parker are both experienced GIS professionals and Python developers. With years of hands-on experience using Esri technology in real-world scenarios, they bring practical insights into the nuances of ArcGIS. Their collaborative approach allows them to demystify technical concepts, making their teachings accessible to audiences of all skill levels.
Who is it for?
This book is for ArcGIS users looking to integrate Python into workflows, whether you're a GIS specialist, technician, or analyst. It's also suitable for those transitioning to roles requiring programming skills. A basic understanding of ArcGIS helps, but the book starts from the fundamentals.

The MySQL Workshop

2022-04-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Thomas Pettit , Scott Cosentino , Dr. Vlad Sebastian Ionescu

Data Management MySQL RDBMS SQL data relational-databases

The MySQL Workshop is your comprehensive, hands-on guide to learning and mastering MySQL database management. This book covers everything from setting up a database to working with SQL queries, managing data, and securing your databases. With practical exercises and real-world scenarios, you'll quickly gain the confidence and skills to handle MySQL databases effectively.
What this Book will help me do
Understand and implement the core concepts of relational databases.
Write, execute, and optimize SQL queries for data management.
Connect MySQL databases to applications like MS Access and Excel.
Secure databases by managing user roles and permissions effectively.
Perform database backups and restores to maintain data integrity.
Author(s)
Thomas Pettit and Scott Cosentino are experienced professionals in database management and MySQL technologies. With years of industry experience, they bring a wealth of knowledge to their writing. They focus on breaking down complex topics into digestible lessons, ensuring practical learning outcomes.
Who is it for?
This book is ideal for tech professionals and students looking to learn MySQL. Beginners will find a gentle introduction, while those with some SQL background will deepen their understanding and fill gaps in their knowledge. It suits professionals dealing with data who want actionable MySQL skills for work and projects.

IBM z16 Technical Introduction

2022-04-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Gerard Laumay , Roman Vogt , Ewerson Palacio , Jannie Houlbjerg , Kazuhiro Nakajima , John Troy , Bill White , Paul Schouten , Octavian Lascu , Anna Shugol , Hervey Kamga , Martijn Raave , Andre Spahni , Bo XU , Markus Ertl , Slav Martinski

Analytics Cloud Computing IBM Cyber Security data

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform that is built with the IBM Telum processor: the IBM z16 server. The IBM Z platform is recognized for its security, resiliency, performance, and scale. It is relied on for mission-critical workloads and as an essential element of hybrid cloud infrastructures. The IBM z16 server adds capabilities and value with innovative technologies that are needed to accelerate the digital transformation journey. This book explains how the IBM z16 server uses innovations and traditional IBM Z strengths to satisfy the growing demand for cloud, analytics, and a more flexible infrastructure. With the IBM z16 servers as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

SAP S/4HANA Conversion: A Guide to Executing and Simplifying Your Conversion

2022-04-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Surya Subrahmanyam

ERP SAP data

Succeed in your conversion to SAP S/4HANA. This book will help you understand the core aspects of a conversion project and implement one. You will start with an overview of the SAP S/4HANA conversion tools: Readiness Check, Simplification Item Check report, Maintenance Planner, Custom Code Analysis, SUM (Software Update Manager), and more. You will understand the preparation activities for SAP FI (Finance), SAP CO (Controlling), SAP AA (Asset Accounting), Material Ledger, and COPA (Controlling–Profitability Analysis). You will also find the SAP CVI (Customer/Vendor Integration) steps that can help consultants understand the mandatory activities to be completed as part of the preparation on the SAP ECC (ERP Central Component) system. You will learn the preparation activities for the conversion of accounting to SAP S/4HANA, and the migration activities: customizing, asset accounting, controlling, and house bank accounts. You will gain knowledge of data migration activities such as the migration of cost elements, the technical check of transactional data, material ledger migration, enrichment of data, and the migration of line items, balances, and general ledger allocations to journal entry tables. After reading this book, you will know how to use the Migration Cockpit for data migration and post-conversion activities to successfully execute and implement an SAP S/4HANA conversion.
What You Will Learn
Choose an ideal path and planning tools for SAP S/4HANA
Start with the preparation step: General Ledger Accounting, Asset Accounting, Controlling, Material Ledger, and so on
Use Migration Cockpit for conversion preparation, migration, and post-migration activities
Who This Book Is For
SAP application consultants, finance consultants, and CVI consultants who need help with SAP S/4HANA conversion

Early Threat Detection and Safeguarding Data with IBM QRadar and IBM Copy Services Manager on IBM DS8000

2022-04-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

IBM data

The focus of this blueprint is to highlight early threat detection by IBM® QRadar® and to proactively start a cyber resilience workflow in response to a cyberattack or malicious user actions. The workflow uses IBM Copy Services Manager (CSM) as orchestration software to start IBM DS8000® Safeguarded Copy functions. Safeguarded Copy creates an immutable copy of the data in an air-gapped form on the same DS8000 system for isolation and eventual quick recovery. This document also explains the steps that are involved in enabling IBM DS8000 audit logs and forwarding them to IBM QRadar. It also discusses how to create various rules to determine a threat, and how to configure and start a suitable response to the detected threat in IBM QRadar. Finally, this document explains how to register a storage system and create a Scheduled Task by using CSM.

IBM Power Systems S922, S914, and S924 Technical Overview and Introduction Featuring PCIe Gen 4 Technology

2022-04-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Vetter , Mauro Minomizaki , Bartlomiej Grabowski , Armin Röll

IBM Linux Marketing data

This IBM® Redpaper publication is a comprehensive guide that covers the IBM Power System S914 (9009-41G), IBM Power System S922 (9009-22G), and IBM Power System S924 (9009-42G) servers that use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems (OSs). The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in these systems, such as:
The latest IBM POWER9 processor, which is available in various configurations for the number of cores per socket
More performance by using industry-leading Peripheral Component Interconnect Express (PCIe) Gen 4 slots
Enhanced internal disk scalability and performance with up to 11 NVMe adapters
Introduction of a competitive Power S922 server with a 1-socket configuration that is targeted at IBM i customers
This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles:
Clients
Sales and marketing professionals
Technical support professionals
IBM Business Partners
Independent software vendors (ISVs)
This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power S914, Power S922, and Power S924 systems. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM GDPS: An Introduction to Concepts and Capabilities

2022-04-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lydia Parziale

IBM data

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues that are related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

CockroachDB: The Definitive Guide

2022-04-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jesse Seldess , Ben Darnell , Guy Harrison

Cloud Computing Data Modelling DevOps SQL cockroachdb data relational-databases

Get the lowdown on CockroachDB, the distributed SQL database built to handle the demands of today's data-driven cloud applications. In this hands-on guide, software developers, architects, and DevOps/SRE teams will learn how to use CockroachDB to create applications that scale elastically and provide seamless delivery for end users while remaining indestructible. Teams will also learn how to migrate existing applications to CockroachDB's performant, cloud-native data architecture. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultra-low latencies to globally distributed end users. You'll learn how to:
Design and build applications for distributed infrastructure, including data modeling and schema design
Migrate data into CockroachDB
Read and write data and run ACID transactions across distributed infrastructure
Plan a CockroachDB deployment for resiliency across single-region and multi-region clusters
Secure, monitor, and optimize your CockroachDB deployment
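Running ACID transactions on a distributed SQL database usually means the client must be ready to re-run a transaction that fails with a serialization error (SQLSTATE 40001), which CockroachDB can return under contention. The following is a minimal, driver-agnostic sketch of such a retry loop with exponential backoff and jitter. RetryableError and run_transaction are hypothetical names standing in for whatever exception your database driver raises and whatever retry helper the book or driver actually provides.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a driver error whose SQLSTATE is 40001."""
    sqlstate = "40001"


def run_transaction(
    txn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run txn(), re-running it from the start when it raises RetryableError."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return txn()
        except RetryableError:
            if attempt >= max_retries:
                raise  # give up and surface the last retryable error
            # Exponential backoff with jitter reduces repeated collisions
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
            attempt += 1
```

In real code, txn would issue its statements and COMMIT on a connection that is rolled back before each retry; the injectable sleep parameter is there only to make the helper easy to test without real delays.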

Data Algorithms with Spark

2022-04-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mahmoud Parsian

AI/ML Analytics API ETL/ELT PySpark Spark apache-spark data

Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You'll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script. With this book, you will:
Learn how to select Spark transformations for optimized solutions
Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
Understand data partitioning for optimized queries
Build and apply a model using PySpark design patterns
Apply motif-finding algorithms to graph data
Analyze graph data by using the GraphFrames API
Apply PySpark algorithms to clinical and genomics data
Learn how to use and apply feature engineering in ML algorithms
Understand and use practical and pragmatic data design patterns
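To make the semantics of reduceByKey() and combineByKey() concrete without a cluster, here is a plain-Python emulation (not PySpark) over a list of in-memory "partitions". It mirrors the three functions that combineByKey() takes in PySpark (createCombiner, mergeValue, mergeCombiners): values are first combined within each partition, and the per-partition results are then merged across partitions. The function names combine_by_key and reduce_by_key are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
C = TypeVar("C")


def combine_by_key(
    partitions: Iterable[Iterable[Tuple[K, V]]],
    create_combiner: Callable[[V], C],
    merge_value: Callable[[C, V], C],
    merge_combiners: Callable[[C, C], C],
) -> Dict[K, C]:
    # Phase 1: combine values locally inside each partition (map-side combine)
    partials: List[Dict[K, C]] = []
    for part in partitions:
        local: Dict[K, C] = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        partials.append(local)

    # Phase 2: merge the per-partition combiners for each key
    # (in Spark this happens after the shuffle)
    result: Dict[K, C] = {}
    for local in partials:
        for key, combiner in local.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result


def reduce_by_key(
    partitions: Iterable[Iterable[Tuple[K, V]]],
    func: Callable[[V, V], V],
) -> Dict[K, V]:
    # reduceByKey is the special case where the initial combiner is the value itself
    return combine_by_key(partitions, lambda v: v, func, func)
```

For example, with partitions [[("a", 1), ("b", 4)], [("a", 3), ("b", 6), ("a", 5)]], summing with reduce_by_key gives {"a": 9, "b": 10}. A per-key average uses combine_by_key with (sum, count) combiners, yielding {"a": (9, 3), "b": (10, 2)}; building (sum, count) pairs instead of averaging directly keeps the merge step correct, because averaging partial averages would weight partitions unevenly.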

Logging in Action

2022-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Phil Wilkins

Cloud Computing Docker IoT JSON Kubernetes MongoDB data elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Make log processing a real asset to your organization with powerful and free open source tools. In Logging in Action you will learn how to:
Deploy Fluentd and Fluent Bit into traditional on-premises, IoT, hybrid, cloud, and multi-cloud environments, both small and hyperscaled
Configure Fluentd and Fluent Bit to solve common log management problems
Use Fluentd within Kubernetes and Docker services
Connect a custom log source or destination with Fluentd’s extensible plugin framework
Apply logging best practices and avoid common pitfalls
Logging in Action is a guide to optimizing and organizing logging using the CNCF Fluentd and Fluent Bit projects. You’ll use the powerful log management tool Fluentd to solve common log management problems, and learn how proper log management can improve performance and make software and infrastructure solutions easier to manage. Through useful examples like sending log-driven events to Slack, you’ll get hands-on experience applying structure to your unstructured data.
About the Technology
Don’t fly blind! An effective logging system can help you see and correct problems before they cripple your software. With the Fluentd log management tool, it’s a snap to monitor the behavior and health of your software and infrastructure in real time. Designed to collect and process log data from multiple sources using the industry-standard JSON format, Fluentd delivers a truly unified logging layer across all your systems.
About the Book
Logging in Action teaches you to record and analyze application and infrastructure data using Fluentd. Using clear, relevant examples, it shows you exactly how to transform raw system data into a unified stream of actionable information. You’ll discover how logging configuration impacts the way your system functions and set up Fluentd to handle data from legacy IT environments, local data centers, and massive Kubernetes-driven distributed systems. You’ll even learn how to implement complex log parsing with RegEx and output events to MongoDB and Slack.
What's Inside
Capture log events from a wide range of systems and software, including Kubernetes and Docker
Connect to custom log sources and destinations
Employ Fluentd’s extensible plugin framework
Create a custom plugin for niche problems
About the Reader
For developers, architects, and operations professionals familiar with the basics of monitoring and logging.
About the Author
Phil Wilkins has spent over 30 years in the software industry and has worked for everything from small startups to international brands.
Quotes
I highly recommend using Logging in Action as a getting-started guide, a refresher, or as a way to optimize your logging journey. - From the Foreword by Anurag Gupta, Fluent maintainer and Cofounder, Calyptia
Covers everything you need if you want to implement a logging system using open source technology such as Fluentd and Kubernetes. - Alex Saez, Naranja X
A great exploration of the features and capabilities of Fluentd, along with very useful hands-on exercises. - George Thomas, Manhattan Associates
A practical holistic guide to integrating logging into your enterprise architecture. - Satej Sahu, Honeywell
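Fluentd works best when applications emit structured records rather than free text. Below is a minimal sketch, using only Python's standard logging module, of an application-side formatter that writes one JSON object per line, which a collector such as Fluentd or Fluent Bit could then tail and parse as JSON. The logger name, the field names, and the extra={"fields": ...} convention are assumptions for illustration, and no Fluentd configuration is shown.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} become top-level keys
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (e.g. datetimes) from raising
        return json.dumps(payload, default=str)


def make_logger(stream, name: str = "orders") -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    log.info("order placed", extra={"fields": {"order_id": 42, "amount": 9.99}})
    print(buf.getvalue(), end="")
```

Writing to a StreamHandler on stdout (or a file) and letting the collector do the shipping keeps the application unaware of where logs end up, which is the unified-logging-layer idea the blurb describes.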

PostgreSQL 14 Administration Cookbook

2022-03-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Simon Riggs , Gianni Ciolli

Cloud Computing Cyber Security data postgresql relational-databases

PostgreSQL 14 Administration Cookbook provides a hands-on guide to mastering the administration of PostgreSQL 14. With over 175 recipes, this book equips you with practical techniques to manage, secure, and optimize your PostgreSQL databases, ensuring they are robust and high-performing.
What this Book will help me do
Efficiently manage PostgreSQL databases both on-premises and in the cloud.
Implement effective backup and recovery strategies to secure your data.
Leverage the latest features of PostgreSQL 14 to enhance your database workflows.
Understand and apply best practices for maintaining high availability and performance.
Troubleshoot real-world challenges with guided solutions and expert insights.
Author(s)
Simon Riggs and Gianni Ciolli are seasoned database experts with years of experience working with PostgreSQL. Simon is a major PostgreSQL contributor and committer, contributing his technical knowledge towards building robust database solutions, while Gianni brings a wealth of expertise in database administration and support. Together, they share a passion for making complex database concepts accessible and actionable.
Who is it for?
This book is for database administrators, data architects, and developers who manage PostgreSQL databases and are looking to deepen their knowledge. It is suitable for professionals with some experience in PostgreSQL who aim to maximize their database's performance and security, as well as for those new to the system seeking a comprehensive start. Readers with an interest in practical, problem-solving approaches to database management will greatly benefit from this cookbook.

IBM Power Systems Virtual Server Guide for IBM i

2022-03-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dino Quintero , Sanjeev Chhabra , Sergio Leyva , Ahmad Y Hussein , Marcelos Avalos , Luis Eduardo Silva Viera , Gabriel Padilla Jimenez , Diego Kesselman , Bogdan Savu , Adriano Almeida , Luis Ferreira , Travis Siegfried , Michael Easlon , Jose Martin Abeleira , Deepak C Shetty

Data Management IBM data ibm-power-systems

This IBM® Redbooks® publication delivers how-to content that describes deployment, networking, and data management tasks on the IBM Power Systems Virtual Server by using sample scenarios. During the content development, the team used available documentation, the IBM Power Systems Virtual Server environment, and other software and hardware resources to document the following information:
IBM Power Systems Virtual Server networking and data management deployment scenarios
Migration use case scenarios
Backup use case scenarios
Disaster recovery use case scenarios
This book addresses topics for IT architects, IT specialists, developers, sellers, and anyone who wants to implement and manage workloads in the IBM Power Systems Virtual Server. This publication also aims to transfer how-to skills to the technical teams and to provide solution guidance to the sales team. This book complements the documentation that is available at the IBM Documentation web page and aligns with the educational materials that are provided by IBM Garage for Systems Technical Education.

Grokking Streaming Systems

2022-03-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ning Wang , Josh Fischer

IoT Java Kafka Cyber Security Spark Data Streaming data streaming-architecture streaming-messaging

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work, and how to build your own! In Grokking Streaming Systems you will learn how to:
Implement and troubleshoot streaming systems
Design streaming systems for complex functionalities
Assess parallelization requirements
Spot networking bottlenecks and resolve backpressure
Group data for high-performance systems
Handle delayed events in real-time systems
Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that’s a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities!
About the Technology
Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand.
About the Book
Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you’ll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services.
What's Inside
Implement and troubleshoot streaming systems
Design streaming systems for complex functionalities
Spot networking bottlenecks and resolve backpressure
Group data for high-performance systems
About the Reader
No prior experience with streaming systems is assumed. Examples in Java.
About the Authors
Josh Fischer and Ning Wang are Apache Committers and part of the committee for the Apache Heron distributed stream processing engine.
Quotes
Very well-written and enjoyable. I recommend this book to all software engineers working on data processing. - Apoorv Gupta, Facebook
Finally, a much-needed introduction to streaming systems: a must-read for anyone interested in this technology. - Anupam Sengupta, Red Hat
Tackles complex topics in a very approachable manner. - Marc Roulleau, GIRO
A superb resource for helping you grasp the fundamentals of open-source streaming systems. - Simon Verhoeven, Cronos
Explains all the main streaming concepts in a friendly way. Start with this one! - Cicero Zandona, Calypso Technologies
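Event windows and delayed events are two of the core ideas the blurb mentions. The sketch below, written in Python rather than the book's Java and not taken from the book, counts events per key in tumbling (fixed, non-overlapping) event-time windows. It uses a simple watermark, the maximum event time seen so far minus an allowed lateness, to decide when a window is complete; an event whose window has already ended at or before the watermark is set aside as late. The class name and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class TumblingWindowCounter:
    """Count events per key in fixed, non-overlapping event-time windows."""

    def __init__(self, size: int, allowed_lateness: int = 0) -> None:
        self.size = size
        self.allowed_lateness = allowed_lateness
        # window start -> key -> count, for windows not yet emitted
        self.open: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.max_event_time: Optional[int] = None
        self.late: List[Tuple[int, str]] = []

    def watermark(self) -> Optional[int]:
        # Event time up to which we assume all data has arrived
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, ts: int, key: str) -> List[Tuple[int, Dict[str, int]]]:
        """Add one event; return any windows that the watermark has now closed."""
        start = ts - ts % self.size
        wm = self.watermark()
        if wm is not None and start + self.size <= wm:
            # This event's window was already closed and emitted: record it as late
            self.late.append((ts, key))
            return []
        self.open[start][key] += 1
        if self.max_event_time is None or ts > self.max_event_time:
            self.max_event_time = ts
        return self._emit_ready()

    def _emit_ready(self) -> List[Tuple[int, Dict[str, int]]]:
        wm = self.watermark()
        emitted = []
        for start in sorted(self.open):
            if start + self.size > wm:
                break  # this window and all later ones are still open
            emitted.append((start, dict(self.open.pop(start))))
        return emitted
```

With size=10 and allowed_lateness=5, feeding the events (1, "a"), (4, "b"), (12, "a"), (7, "a"), (16, "b"), (3, "a"), (27, "a") in that order emits window 0 as {"a": 2, "b": 1} when the event at time 16 arrives, emits window 10 as {"a": 1, "b": 1} at time 27, and records (3, "a") as late. A larger allowed lateness accepts more out-of-order events at the cost of holding windows open longer, which is the latency-versus-completeness trade-off streaming engines expose.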

IBM FlashSystem Safeguarded Copy Implementation Guide

2022-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hemanand Gadgil , Vasfi Gucer , Andrew Greenfield , Jackson Shea

IBM data

The Safeguarded Copy function that is available with IBM® Spectrum Virtualize Version 8.4.2 supports the ability to create cyber-resilient point-in-time copies of volumes that cannot be changed or deleted through user errors, malicious actions, or ransomware attacks. The system integrates with IBM Copy Services Manager to provide automated backup copies and data recovery. This IBM Redpaper® publication introduces the features and functions of the Safeguarded Copy function by using several examples. This document is aimed at pre-sales and post-sales technical support specialists and storage administrators.

Simplify Big Data Analytics with Amazon EMR

2022-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sakti Mishra (AWS)

Analytics AWS Amazon EMR Big Data Cloud Computing Data Analytics Data Governance ETL/ELT Hadoop Java Python Scala +5 more

Simplify Big Data Analytics with Amazon EMR is a thorough guide to harnessing Amazon's EMR service for big data processing and analytics. From distributed computation pipelines to real-time streaming analytics, this book provides hands-on knowledge and actionable steps for implementing data solutions efficiently.
What this Book will help me do
Understand the architecture and key components of Amazon EMR and how to deploy it effectively.
Learn to configure and manage distributed data processing pipelines using Amazon EMR.
Implement security and data governance best practices within the Amazon EMR ecosystem.
Master batch ETL and real-time analytics techniques using technologies like Apache Spark.
Apply optimization and cost-saving strategies to scalable data solutions.
Author(s)
Sakti Mishra is a seasoned data professional with extensive expertise in deploying scalable analytics solutions on cloud platforms like AWS. With a background in big data technologies and a passion for teaching, Sakti ensures practical insights accompany every concept. Readers will find his approach thorough, hands-on, and highly informative.
Who is it for?
This book is perfect for data engineers, data scientists, and other professionals looking to leverage Amazon EMR for scalable analytics. If you are familiar with Python, Scala, or Java and have some exposure to Hadoop or AWS ecosystems, this book will empower you to design and implement robust data pipelines efficiently.

Getting Started with Elastic Stack 8.0

2022-03-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Asjad Athick

BI ELK Kibana Logstash Cyber Security data elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Discover how to harness the power of the Elastic Stack 8.0 to manage, analyze, and secure complex data environments. You will learn to combine components such as Elasticsearch, Kibana, Logstash, and more to build scalable and effective solutions for your organization. By focusing on hands-on implementations, this book ensures you can apply your knowledge to real-world use cases.
What this Book will help me do
Set up and manage Elasticsearch clusters tailored to various architecture scenarios.
Utilize Logstash and Elastic Agent to ingest and process diverse data sources efficiently.
Create interactive dashboards and data models in Kibana, enabling business intelligence insights.
Implement secure and effective search infrastructures for enterprise applications.
Deploy Elastic SIEM to fortify your organization's security against modern cybersecurity threats.
Author(s)
Asjad Athick is a seasoned technologist and author with expertise in developing scalable data solutions. With years of experience working with the Elastic Stack, Asjad brings a pragmatic approach to teaching complex architectures. His dedication to explaining technical concepts in an accessible manner makes this book a valuable resource for learners.
Who is it for?
This book is ideal for developers seeking practical knowledge in search, observability, and security solutions using Elastic Stack. Solutions architects who aim to design scalable data platforms will also benefit greatly. Even tech leads or managers keen to understand the Elastic Stack's impact on their operations will find the insights valuable. No prior experience with Elastic Stack is needed.

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications

2022-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Haines (Databricks)

AI/ML Airflow Data Contracts Data Engineering Docker Kafka Kubernetes MySQL Redis S3 Spark SQL

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, MinIO (S3), and Apache Airflow. Apache Spark applications solve a wide range of data problems, from traditional data loading and processing to rich SQL-based analysis, complex machine learning workloads, and even near-real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully test both batch and streaming applications. You will also learn to containerize your applications using Docker and run and deploy your Spark applications using a variety of tools such as Apache Airflow, Docker, and Kubernetes. Reading this book will empower you to take advantage of Apache Spark to optimize your data pipelines and teach you to craft modular and testable Spark applications. You will create and deploy mission-critical streaming Spark applications in a low-stress environment that paves the way for your own path to production.
What You Will Learn
- Simplify data transformation with Spark Pipelines and Spark SQL
- Bridge data engineering with machine learning
- Architect modular data pipeline applications
- Build reusable application components and libraries
- Containerize your Spark applications for consistency and reliability
- Use Docker and Kubernetes to deploy your Spark applications
- Speed up application experimentation using Apache Zeppelin and Docker
- Understand serializable structured data and data contracts
- Harness effective strategies for optimizing data in your data lakes
- Build end-to-end Spark structured streaming applications using Redis and Apache Kafka
- Embrace testing for your batch and streaming applications
- Deploy and monitor your Spark applications

Who This Book Is For
Professional software engineers who want to take their current skills and apply them to new and exciting opportunities within the data ecosystem; practicing data engineers who are looking for a guiding light while traversing the many challenges of moving from batch to streaming modes; data architects who wish to provide clear and concise direction for how best to harness and use Apache Spark within their organization; and those interested in the ins and outs of becoming a modern data engineer in today's fast-paced and data-hungry world.
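
One topic listed above is serializable structured data and data contracts. As a minimal stdlib sketch, not the book's method, a contract can be expressed as a typed record that validates itself on construction and refuses payloads whose fields drift from the agreed schema. In Spark the same role is usually played by an explicit `StructType` schema or a format such as Avro or Protobuf; the `OrderEvent` type, its fields, and its allowed values here are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical record type acting as the contract between producer and consumer."""

    order_id: str
    event_type: str
    amount_cents: int

    def __post_init__(self) -> None:
        # Validate on construction so malformed records fail fast.
        if not self.order_id:
            raise ValueError("order_id must be non-empty")
        if self.event_type not in ("created", "refunded"):
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if type(self.amount_cents) is not int or self.amount_cents < 0:
            raise ValueError("amount_cents must be a non-negative int")


def serialize(event: OrderEvent) -> str:
    # sort_keys yields a stable JSON encoding for the same record.
    return json.dumps(asdict(event), sort_keys=True)


def deserialize(payload: str) -> OrderEvent:
    data = json.loads(payload)
    expected = {f.name for f in fields(OrderEvent)}
    # Reject missing or unexpected fields instead of silently dropping them.
    if set(data) != expected:
        raise ValueError(f"fields {sorted(data)} do not match contract {sorted(expected)}")
    return OrderEvent(**data)


if __name__ == "__main__":
    wire = serialize(OrderEvent("o-1", "created", 1250))
    print(wire)
    print(deserialize(wire))
```

Rejecting unexpected fields is a deliberately strict choice; a real contract might instead allow additive fields and version the schema, depending on how producers and consumers evolve.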

talk-data.com

IBM Z Functional Matrix

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

Advanced SQL with SAS

Python for ArcGIS Pro

The MySQL Workshop

IBM z16 Technical Introduction

SAP S/4HANA Conversion: A Guide to Executing and Simplifying Your Conversion

Early Threat Detection and Safeguarding Data with IBM QRadar and IBM Copy Services Manager on IBM DS8000

IBM Power Systems S922, S914, and S924 Technical Overview and Introduction Featuring PCIe Gen 4 Technology

IBM GDPS: An Introduction to Concepts and Capabilities

CockroachDB: The Definitive Guide

Data Algorithms with Spark

Logging in Action

PostgreSQL 14 Administration Cookbook

IBM Power Systems Virtual Server Guide for IBM i

Grokking Streaming Systems

IBM FlashSystem Safeguarded Copy Implementation Guide

Simplify Big Data Analytics with Amazon EMR

Getting Started with Elastic Stack 8.0

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications