O'Reilly Data Engineering Books

Getting Started with IBM Hyper Protect Data Controller

2022-01-04 O'Reilly Amazon

book

Guillaume Hoareau , Bill White , Eva Yan , Philippe Richard , Jason Katonica , Roy Panting , Maxwell Weiss , Andy Coulson

data data-engineering IBM Cyber Security

IBM® Hyper Protect Data Controller is designed to provide privacy protection of your sensitive data and give ease of control and auditability. It can manage how data is shared securely through a central control. Hyper Protect Data Controller can protect data wherever it goes—security policies are kept and honored whenever the data is accessed and future data access can be revoked even after data leaves the system of record. This IBM Redbooks® publication can assist you with determining how to get started with IBM Hyper Protect Data Controller through a use case approach. It will help you plan for, install, tailor and configure the Hyper Protect Data Controller. It includes information about the following topics: Concepts and reference architecture Common use cases with implementation guidance and advice Implementation and policy examples Typical operational tasks for creating policies and preparing for audits Monitoring user activity and events This IBM Redbooks publication is written for IT Managers, IT Architects, Security Administrators, data owners, and data consumers.

Installing and Configuring IBM Db2 AI for IBM z/OS v1.4.0

2022-01-04 O'Reilly Amazon

book

Tim Hogan , Janet Figone , Lydia Parziale , Guanjun Cai

data data-engineering relational-databases ibm-db2 AI/ML Cloud Computing

Artificial intelligence (AI) enables computers and machines to mimic the perception, learning, problem-solving, and decision-making capabilities of the human mind. AI development is made possible by the availability of large amounts of data and the corresponding development and wide availability of computer systems that can process all that data faster and more accurately than humans can. What happens if you infuse AI with a world-class database management system, such as IBM Db2®? IBM® has done just that with Db2 AI for z/OS (Db2ZAI). Db2ZAI is built to infuse AI and data science to assist businesses in the use of AI to develop applications more easily. With Db2ZAI, the following benefits are realized: Data science functionality Better built applications Improved database performance (and DBA's time and efforts are saved) through simplification and automation of error reporting and routine tasks Machine learning (ML) optimizer to improve query access paths and reduce the need for manual tuning and query optimization Integrated data access that makes data available from various vendors including private cloud providers. This IBM Redpaper® publication helps to simplify your installation by tailoring and configuration of Db2 AI for z/OS®. It was written for system programmers, system administrators, and database administrators.

SAP Enterprise Portfolio and Project Management: A Guide to Implement, Integrate, and Deploy EPPM Solutions

2022-01-03 O'Reilly Amazon

book

Joseph Alexander Soosaimuthu

data data-engineering SAP BI

Learn the fundamentals of SAP Enterprise Project and Portfolio management Project Systems (PS), Portfolio and Project Management (PPM) and Commercial Project Management (CPM) and their integration with other SAP modules. This book covers various business scenarios from different industries including the public sector, engineering and construction, professional services, telecom, mining, chemical, and pharmaceutical. Author Joseph Alexander Soosaimuthu will help you understand common business challenges and pain areas faced in portfolio, program and project management, and will provide suitable recommendations to overcome these challenges. This book not only suggests solutions within SAP, but also provides workarounds or integrations with third-party tools based on various Industry-specific business requirements. SAP Portfolio and Project Management addresses commonly asked questions regarding SAP EPPM implementation and deployment, and conveys a framework to facilitate engagement and discussion with key stakeholders. This provides coverage of SAP on-premise solutions with ECC 6.08 and SAP PPM 6.1 deployed on the same client, as well as S/4 HANA On-Premise 2020 with integration to BPC and BI/W systems. Interface with other third-party schedule management, estimation, costing and forecasting applications are also covered in this book. After completing SAP Portfolio and Project Management, you will be able to implement SAP Enterprise Portfolio and Project Management based on industry best practices. For your reference, you’ll also gain a list of development objects and a functionality list by Industry, and a Fiori apps list for Enterprise Portfolio and Project Management (EPPM). What You Will Learn Understand the fundamentals of project, program and portfolio management within SAP EPPM Master the art of project forecasting and scheduling integrations with other SAP modules Gainknowledge of the different interface options for scheduling, estimation, costing and forecasting third party applications Learn EPPM industry best practices, and how to address industry-specific business challenges Leverage operational and strategic reporting within EPPM Who This Book For Functional consultants and business analysts who are involved in SAP EPPM (PS, PPM and CPM) deployment and clients who are interested and are in the process of having SAP EPPM deployed for their Enterprise.

Numerical Methods Using Java: For Data Science, Analysis, and Engineering

2022-01-01 O'Reilly Amazon

book

Haksun Li, PhD

software-development programming-languages jvm-languages Java Data Science

Implement numerical algorithms in Java using NM Dev, an object-oriented and high-performance programming library for mathematics.You’ll see how it can help you easily create a solution for your complex engineering problem by quickly putting together classes. Numerical Methods Using Java covers a wide range of topics, including chapters on linear algebra, root finding, curve fitting, differentiation and integration, solving differential equations, random numbers and simulation, a whole suite of unconstrained and constrained optimization algorithms, statistics, regression and time series analysis. The mathematical concepts behind the algorithms are clearly explained, with plenty of code examples and illustrations to help even beginners get started. What You Will Learn Program in Java using a high-performance numerical library Learn the mathematics for a wide range of numerical computing algorithms Convert ideas and equations into code Put together algorithms and classes to build your own engineering solution Build solvers for industrial optimization problems Do data analysis using basic and advanced statistics Who This Book Is For Programmers, data scientists, and analysts with prior experience with programming in any language, especially Java.

Data Science in Engineering and Management

2021-12-30 O'Reilly Amazon

book

Sambit Kumar Mishra , Zdzislaw Polkowski , Julian Vasilev

data data-science Data Science

This book brings insight into Data Science and offers applications and implementation strategies. It includes recent developments and future trends and covers the concept of Data Science along with its origin. It focuses on the mechanisms of extracting data along with classifications, architectural concepts, and predictive analysis.

Data Engineering with AWS

2021-12-29 O'Reilly Amazon

book

Gareth Eagar

data data-engineering AI/ML Athena AWS Big Data

Discover how to effectively build and manage data engineering pipelines using AWS with "Data Engineering with AWS". In this hands-on book, you'll explore the foundational principles of data engineering, learn to architect data pipelines, and work with essential AWS services to process, transform, and analyze data. What this Book will help me do Understand and implement modern data engineering pipelines with AWS services. Gain proficiency in automating data ingestion and transformation using Amazon tools. Perform efficient data queries and analysis leveraging Amazon Athena and Redshift. Create insightful data visualizations using Amazon QuickSight. Apply machine learning techniques to enhance data engineering processes. Author(s) None Eagar, a Senior Data Architect with over twenty-five years of experience, specializes in modern data architectures and cloud solutions. With a rich background in applying data engineering to real-world problems, None Eagar shares expertise in a clear and approachable way for readers. Who is it for? This book is perfect for data engineers and data architects aiming to grow their expertise in AWS-based solutions. It's also geared towards beginners in data engineering wanting to adopt the best practices. Those with a basic understanding of big data and cloud platforms will find it particularly valuable, but prior AWS experience is not required.

Data Mesh in Practice

2021-12-25 O'Reilly Amazon

book

Arif Wider , Max Schultze

data data-engineering database-architecture data-mesh Analytics Data Quality

The data mesh is poised to replace data lakes and data warehouses as the dominant architectural pattern in data and analytics. By promoting the concept of domain-focused data products that go beyond file sharing, data mesh helps you deal with data quality at scale by establishing true data ownership. This approach is so new, however, that many misconceptions and a general lack of practical experience for implementing data mesh are widespread. With this report, you'll learn how to successfully overcome challenges in the adoption process. By drawing on their experience building large-scale data infrastructure, designing data architectures, and contributing to data strategies of large and successful corporations, authors Max Schultze and Arif Wider have identified the most common pain points along the data mesh journey. You'll examine the foundations of the data mesh paradigm and gain both technical and organizational insights. This report is ideal for companies just starting to work with data, for organizations already in the process of transforming their data infrastructure landscape, as well as for advanced companies working on federated governance setups for a sustainable data-driven future. This report covers: Data mesh principles and practical examples for getting started Typical challenges and solutions you'll encounter when implementing a data mesh Data mesh pillars including domain ownership, data as a product, and infrastructure as a platform How to move toward a decentralized data product and build a data infrastructure platform

Optimizing Databricks Workloads

2021-12-24 O'Reilly Amazon

book

Sarthak Sarbahi , Anirudh Kala , Anshul Bhatnagar

data data-engineering apache-spark Analytics Big Data Cloud Computing

Unlock the full potential of Apache Spark on the Databricks platform with "Optimizing Databricks Workloads". This book equips you with must-know techniques to effectively configure, manage, and optimize big data processing pipelines. Dive into real-world scenarios and learn practical approaches to reduce costs and improve performance in your data engineering processes. What this Book will help me do Understand and apply optimization techniques for Databricks workloads. Choose the right cluster configurations to maximize efficiency and minimize costs. Leverage Delta Lake for performance-boosted data processing and optimization. Develop skills for managing Spark DataFrames and core functionalities in Databricks. Gain insights into real-world scenarios to effectively improve workload performance. Author(s) Anirudh Kala and the co-authors are experienced practitioners in the fields of data engineering and analytics. With years of professional expertise in leveraging Apache Spark and Databricks, they bring real-world insight into performance optimization. Their approach blends practical instruction with actionable strategies, making this book an essential guide for data engineers aiming to excel in this domain. Who is it for? This book is tailored for data engineers, data scientists, and cloud architects looking to elevate their skills in managing Databricks workloads. Ideal for readers with basic knowledge of Spark and Databricks, it helps them get hands-on with optimization techniques. If you are aiming to enhance your Spark-based data processing systems, this book offers the guidance you need.

Securing IBM Spectrum Scale with QRadar and IBM Cloud Pak for Security

2021-12-20 O'Reilly Amazon

book

IBM

data data-engineering IBM Cloud Computing Cyber Security

Cyberattacks are likely to remain a significant risk for the foreseeable future. Attacks on organizations can be external and internal. Investing in technology and processes to prevent these cyberattacks is the highest priority for these organizations. Organizations need well-designed procedures and processes to recover from attacks. The focus of this document is to demonstrate how the IBM® Unified Data Foundation (UDF) infrastructure plays an important role in delivering the persistence storage (PV) to containerized applications, such as IBM Cloud® Pak for Security (CP4S), with IBM Spectrum® Scale Container Native Storage Access (CNSA) that is deployed with IBM Spectrum scale CSI driver and IBM FlashSystem® storage with IBM Block storage driver with CSI driver. Also demonstrated is how this UDF infrastructure can be used as a preferred storage class to create back-end persistent storage for CP4S deployments. We also highlight how the file I/O events are captured in IBM QRadar® and offenses are generated based on predefined rules. After the offenses are generated, we show how the cases are automatically generated in IBM Cloud Pak® for Security by using the IBM QRadar SOAR Plugin, with a manually automated method to log a case in IBM Cloud Pak for Security. This document also describes the processes that are required for the configuration and integration of the components in this solution, such as: Integration of IBM Spectrum Scale with QRadar QRadar integration with IBM Cloud Pak for Security Integration of the IBM QRadar SOAR Plugin to generate automated cases in CP4S. Finally, this document shows the use of IBM Spectrum Scale CNSA and IBM FlashSystem storage that uses IBM block CSI driver to provision persistent volumes for CP4S deployment. All models of IBM FlashSystem family are supported by this document, including: FlashSystem 9100 and 9200 FlashSystem 7200 and FlashSystem 5000 models FlashSystem 5200 IBM SAN Volume Controller All storage that is running IBM Spectrum Virtualize software

IBM DS8900F Performance Best Practices and Monitoring

2021-12-17 O'Reilly Amazon

book

Lisa Martinez , Peter Kimmel , Rick Pekosh , Ali Rizvi , Ewerson Palacio , Luiz Fernando Moreira , Sherri Brunson , Paul Smith

data data-engineering IBM

This IBM® Redbooks® publication is intended for individuals who want to maximize the performance of their DS8900 storage systems and investigate the planning and monitoring tools that are available.

Access For Dummies

2021-12-14 O'Reilly Amazon

book

Laurie A. Ulrich , Ken Cook

data data-engineering database-management-tools microsoft-access Data Management Data Science

Become a database boss —and have fun doing it—with this accessible and easy-to-follow guide to Microsoft Access Databases hold the key to organizing and accessing all your data in one convenient place. And you don’t have to be a data science wizard to build, populate, and organize your own. With Microsoft Access For Dummies, you’ll learn to use the latest version of Microsoft’s Access software to power your database needs. Need to understand the essentials before diving in? Check out our Basic Training in Part 1 where we teach you how to navigate the Access workspace and explore the foundations of databases. Ready for more advanced tutorials? Skip right to the sections on Data Management, Queries, or Reporting where we walk you through Access’s more sophisticated capabilities. Not sure if you have Access via Office 2021 or Office 365? No worries – this book covers Access now matter how you access it. The book also shows you how to: Handle the most common problems that Access users encounter Import, export, and automatically edit data to populate your next database Write powerful and accurate queries to find exactly what you’re looking for, exactly when you need it Microsoft Access For Dummies is the perfect resource for anyone expected to understand, use, or administer Access databases at the workplace, classroom, or any other data-driven destination.

Snowflake Essentials: Getting Started with Big Data in the Cloud

2021-12-14 O'Reilly Amazon

book

Bjorn Lindstrom , Frank Bell , Ruchi Soni , Sameer Videkar , Bhaskar B. Joshi , Raj Chirumamilla

data data-engineering Snowflake Analytics Big Data Cloud Computing

Understand the essentials of the Snowflake Database and the overall Snowflake Data Cloud. This book covers how Snowflake’s architecture is different from prior on-premises and cloud databases. The authors also discuss, from an insider perspective, how Snowflake grew so fast to become the largest software IPO of all time. Snowflake was the first database made specifically to be optimized with a cloud architecture. This book helps you get started using Snowflake by first understanding its architecture and what separates it from other database platforms you may have used. You will learn about setting up users and accounts, and then creating database objects. You will know how to load data into Snowflake and query and analyze that data, including unstructured data such as data in XML and JSON formats. You will also learn about Snowflake’s compute platform and the different data sharing options that are available. What YouWill Learn Run analytics in the Snowflake Data Cloud Create users and roles in Snowflake Set up security in Snowflake Set up resource monitors in Snowflake Set up and optimize Snowflake Compute Load, unload, and query structured and unstructured data (JSON, XML) within Snowflake Use Snowflake Data Sharing to share data Set up a Snowflake Data Exchange Use the Snowflake Data Marketplace Who This Book Is For Database professionals or information technology professionals who want to move beyond traditional database technologies by learning Snowflake, a new and massively scalable cloud-based database solution

Apache Pulsar in Action

2021-12-13 O'Reilly Amazon

book

David Kjerrumgaard

data data-engineering apache-pulsar Analytics Cloud Computing IoT

Deliver lightning fast and reliable messaging for your distributed applications with the flexible and resilient Apache Pulsar platform. In Apache Pulsar in Action you will learn how to: Publish from Apache Pulsar into third-party data repositories and platforms Design and develop Apache Pulsar functions Perform interactive SQL queries against data stored in Apache Pulsar Apache Pulsar in Action is a comprehensive and practical guide to building high-traffic applications with Pulsar. You’ll learn to use this mature and battle-tested platform to deliver extreme levels of speed and durability to your messaging. Apache Pulsar committer David Kjerrumgaard teaches you to apply Pulsar’s seamless scalability through hands-on case studies, including IOT analytics applications and a microservices app based on Pulsar functions. About the Technology Reliable server-to-server messaging is the heart of a distributed application. Apache Pulsar is a flexible real-time messaging platform built to run on Kubernetes and deliver the scalability and resilience required for cloud-based systems. Pulsar supports both streaming and message queuing, and unlike other solutions, it can communicate over multiple protocols including MQTT, AMQP, and Kafka’s binary protocol. About the Book Apache Pulsar in Action teaches you to build scalable streaming messaging systems using Pulsar. You’ll start with a rapid introduction to enterprise messaging and discover the unique benefits of Pulsar. Following crystal-clear explanations and engaging examples, you’ll use the Pulsar Functions framework to develop a microservices-based application. Real-world case studies illustrate how to implement the most important messaging design patterns. What's Inside Publish from Pulsar into third-party data repositories and platforms Design and develop Apache Pulsar functions Create an event-driven food delivery application About the Reader Written for experienced Java developers. No prior knowledge of Pulsar required. About the Author David Kjerrumgaard is a committer on the Apache Pulsar project. He currently serves as a Developer Advocate for StreamNative, where he develops Pulsar best practices and solutions. Quotes Apache Pulsar in Action is able to seamlessly mix the theory and abstract concepts with the clarity of practical step-by-step examples. I’d recommend to anyone! - Matteo Merli, co-creator of Apache Pulsar Gives readers insights into how the ‘magic’ works… Definitely recommended. - Henry Saputra, Splunk A complete, practical, fun-filled book. - Satej Kumar Sahu, Honeywell A definitive guide that will help you scale your applications. - Alessandro Campeis, Vimar The best book to start working with Pulsar. - Emanuele Piccinelli, Empirix

Machine Learning with PySpark: With Natural Language Processing and Recommender Systems

2021-12-08 O'Reilly Amazon

book

Pramod Singh

data data-engineering apache-spark PySpark AI/ML Airflow

Master the new features in PySpark 3.1 to develop data-driven, intelligent applications. This updated edition covers topics ranging from building scalable machine learning models, to natural language processing, to recommender systems. Machine Learning with PySpark, Second Edition begins with the fundamentals of Apache Spark, including the latest updates to the framework. Next, you will learn the full spectrum of traditional machine learning algorithm implementations, along with natural language processing and recommender systems. You’ll gain familiarity with the critical process of selecting machine learning algorithms, data ingestion, and data processing to solve business problems. You’ll see a demonstration of how to build supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You’ll also learn how to automate the steps using Spark pipelines, followed by unsupervised models such as K-means and hierarchical clustering. A section on Natural Language Processing (NLP) covers text processing, text mining, and embeddings for classification. This new edition also introduces Koalas in Spark and how to automate data workflow using Airflow and PySpark’s latest ML library. After completing this book, you will understand how to use PySpark’s machine learning library to build and train various machine learning models, along with related components such as data ingestion, processing and visualization to develop data-driven intelligent applications What you will learn: Build a spectrum of supervised and unsupervised machine learning algorithms Use PySpark's machine learning library to implement machine learning and recommender systems Leverage the new features in PySpark’s machine learning library Understand data processing using Koalas in Spark Handle issues around feature engineering, class balance, bias andvariance, and cross validation to build optimally fit models Who This Book Is For Data science and machine learning professionals.

Mastering Apache Pulsar

2021-12-06 O'Reilly Amazon

book

Jowanza Joseph

data data-engineering apache-pulsar Flink API Big Data

Every enterprise application creates data, including log messages, metrics, user activity, and outgoing messages. Learning how to move these items is almost as important as the data itself. If you're an application architect, developer, or production engineer new to Apache Pulsar, this practical guide shows you how to use this open source event streaming platform to handle real-time data feeds. Jowanza Joseph, staff software engineer at Finicity, explains how to deploy production Pulsar clusters, write reliable event streaming applications, and build scalable real-time data pipelines with this platform. Through detailed examples, you'll learn Pulsar's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the load manager, and the storage layer. This book helps you: Understand how event streaming fits in the big data ecosystem Explore Pulsar producers, consumers, and readers for writing and reading events Build scalable data pipelines by connecting Pulsar with external systems Simplify event-streaming application building with Pulsar Functions Manage Pulsar to perform monitoring, tuning, and maintenance tasks Use Pulsar's operational measurements to secure a production cluster Process event streams using Flink and query event streams using Presto

Cloud-Native Microservices with Apache Pulsar: Build Distributed Messaging Microservices

2021-12-04 O'Reilly Amazon

book

Rahul Sharma , Mohammad Atyab

data data-engineering apache-pulsar AWS AWS Lambda Cloud Computing

Apply different enterprise integration and processing strategies available with Pulsar, Apache's multi-tenant, high-performance, cloud-native messaging and streaming platform. This book is a comprehensive guide that examines using Pulsar Java libraries to build distributed applications with message-driven architecture. You'll begin with an introduction to Apache Pulsar architecture. The first few chapters build a foundation of message-driven architecture. Next, you'll perform a setup of all the required Pulsar components. The book also covers work with Apache Pulsar client library to build producers and consumers for the discussed patterns. You'll then explore the transformation, filter, resiliency, and tracing capabilities available with Pulsar. Moving forward, the book will discuss best practices when building message schemas and demonstrate integration patterns using microservices. Security is an important aspect of any application;the book will cover authentication and authorization in Apache Pulsar such as Transport Layer Security (TLS), OAuth 2.0, and JSON Web Token (JWT). The final chapters will cover Apache Pulsar deployment in Kubernetes. You'll build microservices and serverless components such as AWS Lambda integrated with Apache Pulsar on Kubernetes. After completing the book, you'll be able to comfortably work with the large set of out-of-the-box integration options offered by Apache Pulsar. What You'll Learn Examine the important Apache Pulsar components Build applications using Apache Pulsar client libraries Use Apache Pulsar effectively with microservices Deploy Apache Pulsar to the cloud Who This Book Is For Cloud architects and software developers who build systems in the cloud-native technologies.

Efficient MySQL Performance

2021-11-30 O'Reilly Amazon

book

Daniel Nichter

data data-engineering relational-databases MySQL Cloud Computing SQL

You'll find several books on basic or advanced MySQL performance, but nothing in between. That's because explaining MySQL performance without addressing its complexity is difficult. This practical book bridges the gap by teaching software engineers mid-level MySQL knowledge beyond the fundamentals, but well shy of deep-level internals required by database administrators (DBAs). Daniel Nichter shows you how to apply the best practices and techniques that directly affect MySQL performance. You'll learn how to improve performance by analyzing query execution, indexing for common SQL clauses and table joins, optimizing data access, and understanding the most important MySQL metrics. You'll also discover how replication, transactions, row locking, and the cloud influenceMySQL performance. Understand why query response time is the North Star of MySQL performance Learn query metrics in detail, including aggregation, reporting, and analysis See how to index effectively for common SQL clauses and table joins Explore the most important server metrics and what they reveal about performance Dive into transactions and row locking to gain deep, actionable insight Achieve remarkable MySQL performance at any scale

IBM Supply Chain Transformation

2021-11-30 O'Reilly Amazon

book

Galen Smith

data data-engineering IBM

In the midst of global disruptions, every element of IBM® Supply Chain has been affected. the IBM cognitive supply chain is positioned to win the future by using the exponential technologies that are inherent to our supply chains, and with flexibility, resiliency, and end-to-end visibility. The constant commitment of IBM to building smarter supply chains over the past decade has primed IBM to quickly and effectively navigate these disruptions and course-correct by using cognitive innovation. As a result, IBM Supply Chain teams were able to deliver exceptional outcomes without client disruption. In addition, this widespread impact inspired numerous new solutions that include exponential technologies that better prepare IBM for future disruptions in constantly changing markets.

Innovative SAP SuccessFactors Recruiting: A Guide to Creating Custom Integration and Automation

2021-11-30 O'Reilly Amazon

book

Anand ‘Andy’ Athanur , Mark Ingram , Michael A. Wellens

data data-engineering SAP API

Get creative and optimize your SAP SuccessFactors Recruiting implementation with this guide, which examines a variety of integration and automation opportunities throughout the recruiting process outside of the standard integrations. Innovative SAP SuccessFactors Recruiting walks you through the end-to-end recruiting process and highlights opportunities to create interfaces and automation at each stage using a variety of methods and tools. After a brief overview of the market demands driving growth in this area and an introduction to OData, Anand Athanur, Mark Ingram and Michael A. Wellens detail each step in the recruiting process, starting with automating and integrating requisition creation using APIs and middleware. They then explore ways of enhancing candidate attraction and experience for the initial application process. After that, they jump into automation for overall candidate selection and processing, including automation using Robotic Process Automation, Integration center, the assessment integration framework, custom OData integrations, the background check integration framework, and Business Rules. Additionally, you’ll be shown onboarding optimization techniques using Intelligent Services, as well as hiring into third-party HRIS systems. After finishing this book, you will have a thorough understanding of how to utilize SAP SuccessFactors to recruit the right candidates for every position. What You Will Learn Integrate and automate the requisition creation process in innovative ways outside of SAP documentation Enhance candidate attraction and experience Leverage integration and automation opportunities within the application processing stage Automate hiring into third-party HRIS systems Who this Book For Customers, Consultants, and 3rd Party Vendors wishing to connect their solutions to SAP SuccessFactors Recruiting.

High Performance MySQL, 4th Edition

2021-11-18 O'Reilly Amazon

book

Jeremy Tinley , Silvia Botros

data data-engineering relational-databases MySQL Cloud Computing Cyber Security

How can you realize MySQL's full power? With High Performance MySQL, you'll learn advanced techniques for everything from setting service-level objectives to designing schemas, indexes, and queries to tuning your server, operating system, and hardware to achieve your platform's full potential. This guide also teaches database administrators safe and practical ways to scale applications through replication, load balancing, high availability, and failover. Updated to reflect recent advances in cloud- and self-hosted MySQL, InnoDB performance, and new features and tools, this revised edition helps you design a relational data platform that will scale with your business. You'll learn best practices for database security along with hard-earned lessons in both performance and database stability. Dive into MySQL's architecture, including key facts about its storage engines Learn how server configuration works with your hardware and deployment choices Make query performance part of your software delivery process Examine enhancements to MySQL's replication and high availability Compare different MySQL offerings in managed cloud environments Explore MySQL's full stack optimization from application-side configuration to server tuning Turn traditional database management tasks into automated processes

Storage as a Service Offering Guide

2021-11-16 O'Reilly Amazon

book

Vasfi Gucer , Hartmut Lonzer , Markus Standau

data data-engineering storage-repositories cloud-storage Cloud Computing Cloud Storage

IBM® Storage as a Service (STaaS) extends your hybrid cloud experience with a new flexible consumption model enabled for both your on-premises and hybrid cloud infrastructure needs, giving you the agility, cash flow efficiency, and services of cloud storage with the flexibility to dynamically scale up or down and only pay for what you use beyond the minimal capacity. This IBM Redpaper provides a detailed introduction to the IBM STaaS service. The paper is targeted for data center managers and storage administrators.

IBM HyperSwap and Multi-site HA/DR for IBM FlashSystem A9000 and A9000R

2021-11-15 O'Reilly Amazon

book

Lisa Martinez , Vadim Steckler , Francesco Anderloni , Stephen Solewin , Andrew Greenfield , Bert Dufrasne , Roger Eriksson

data data-engineering IBM

IBM® HyperSwap® is the high availability (HA) solution that provides continuous data availability in case of hardware failure, power failure, connectivity failure, or disasters. The HyperSwap capability is available for IBM FlashSystem® A9000 and IBM FlashSystem A9000R, starting with software version 12.2.1. Version 12.3 introduces a function that combines HyperSwap and Asynchronous replication, which creates a solution that entails HA and Disaster Recovery (DR). One side of the HyperSwap pair has an active async link to the third system, and the other side has a standby link. Known as Multi-site HA/DR, this configuration provides HyperSwap active-active HA while keeping data mirrored to a third copy to ensure two levels of business continuity. This IBM Redpaper™ publication gives a broad understanding of the architecture, design, and implementation of HyperSwap and Multi-site HA/DR solution. It also discusses and illustrates various use cases pertaining to their use and functionality. This paper is intended for those users who want to deploy solutions that take advantage of HyperSwap and Multi-site HA/DR for FlashSystem A9000 and A9000R.

Expert Oracle Database Architecture: Techniques and Solutions for High Performance and Productivity

2021-11-12 O'Reilly Amazon

book

Darl Kuhn , Thomas Kyte

data data-engineering oracle-database-solutions Oracle

Now in its fourth edition and covering Oracle Database 21c, this best-selling book continues to bring you some of the best thinking on how to apply Oracle Database to produce scalable applications that perform well and deliver correct results. Tom Kyte and Darl Kuhn share a simple philosophy: "you can treat Oracle as a black box and just stick data into it, or you can understand how it works and exploit it as a powerful computing environment." If you choose the latter, then you’ll find that there are few information management problems that you cannot solve quickly and elegantly. This fully revised fourth edition covers the developments and new features up to Oracle Database 21c. Up-to-date features are covered for tables, indexes, data types, sequences, partitioning, data loading, temporary tables, and more. All the examples are demonstrated using modern techniques and are executed in container and pluggable databases. The book’s proof-by-example approach encourages you to let evidence be your guide. Try something. See the result. Understand why the result is what it is. Apply your newfound knowledge with confidence. The book covers features by explaining how each one works, how to implement software using it, and the common pitfalls associated with it. Don’t treat Oracle Database as a black box. Get this book. Dive deeply into Oracle Database’s most powerful features that many do not invest the time to learn about. Set yourself apart from your competition and turbo-charge your career. What You Will Learn Identify and effectively resolve application performance issues and bottlenecks Architect systems to leverage the full power and feature set of Oracle’s database engine Configure a database to maximize the use of memory structures and background processes Understand internal locking and latching technology and how it impacts your system Proactively recommend best practices around performance for table and index structures Take advantage of advanced features such as table partitioning and parallel execution Who This Book Is For Oracle developers and Oracle DBAs. If you’re a developer and want a stronger understanding of Oracle features and architecture that will enable your applications to scale regardless of the workload, this book is for you. If you’re a DBA and want to intelligently work with developers to design applications that effectively leverage Oracle technology, then look no further.

Kafka: The Definitive Guide, 2nd Edition

2021-11-08 O'Reilly Amazon

book

Krit Petty , Rajini Sivaram , Gwen Shapira , Todd Palino

data data-engineering streaming-messaging Kafka API Cyber Security

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes. Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. You'll examine: Best practices for deploying and configuring Kafka Kafka producers and consumers for writing and reading messages Patterns and use-case requirements to ensure reliable data delivery Best practices for building data pipelines and applications with Kafka How to perform monitoring, tuning, and maintenance tasks with Kafka in production The most critical metrics among Kafka's operational measurements Kafka's delivery capabilities for stream processing systems

Essential PySpark for Scalable Data Analytics

2021-10-29 O'Reilly Amazon

book

Sreeram Nudurupati

data data-engineering apache-spark PySpark AI/ML Analytics

Dive into the world of scalable data processing with 'Essential PySpark for Scalable Data Analytics'. This book is a comprehensive guide that helps beginners understand and utilize PySpark to process, analyze, and draw insights from large datasets effectively. With hands-on tutorials and clear explanations, you will gain the confidence to tackle big data analytics challenges. What this Book will help me do Understand and apply the distributed computing paradigm for big data. Learn to perform scalable data ingestion, cleansing, and preparation using PySpark. Create and utilize data lakes and the Lakehouse paradigm for efficient data storage and access. Develop and deploy machine learning models with scalability in mind. Master real-time analytics pipelines and create impactful data visualizations. Author(s) None Nudurupati is an experienced data engineer and educator, specializing in distributed systems and big data technologies. With years of practical experience in the field, None brings a clear and approachable teaching style to technical topics. Passionate about empowering readers, the author has designed this book to be both practical and inspirational for aspiring data practitioners. Who is it for? This book is ideal for data professionals including data scientists, engineers, and analysts looking to scale their data analytics processes. It assumes familiarity with basic data science concepts and Python, as well as some experience with SQL-like data analysis. This is particularly suitable for individuals aiming to expand their knowledge in distributed computing and PySpark to handle big data challenges. Achieving scalable and efficient data solutions is at the core of this guide.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

Getting Started with IBM Hyper Protect Data Controller

Installing and Configuring IBM Db2 AI for IBM z/OS v1.4.0

SAP Enterprise Portfolio and Project Management: A Guide to Implement, Integrate, and Deploy EPPM Solutions

Numerical Methods Using Java: For Data Science, Analysis, and Engineering

Data Science in Engineering and Management

Data Engineering with AWS

Data Mesh in Practice

Optimizing Databricks Workloads

Securing IBM Spectrum Scale with QRadar and IBM Cloud Pak for Security

IBM DS8900F Performance Best Practices and Monitoring

Access For Dummies

Snowflake Essentials: Getting Started with Big Data in the Cloud

Apache Pulsar in Action

Machine Learning with PySpark: With Natural Language Processing and Recommender Systems

Mastering Apache Pulsar

Cloud-Native Microservices with Apache Pulsar: Build Distributed Messaging Microservices

Efficient MySQL Performance

IBM Supply Chain Transformation

Innovative SAP SuccessFactors Recruiting: A Guide to Creating Custom Integration and Automation

High Performance MySQL, 4th Edition

Storage as a Service Offering Guide

IBM HyperSwap and Multi-site HA/DR for IBM FlashSystem A9000 and A9000R

Expert Oracle Database Architecture: Techniques and Solutions for High Performance and Productivity

Kafka: The Definitive Guide, 2nd Edition

Essential PySpark for Scalable Data Analytics