IBM Power Systems for SAS Viya 3.5 Deployment Guide

2021-04-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dino Quintero , Sandy Kao , Christopher Chung , Kurt Koehle , Reinaldo Tetsuo Katahira , Abhijit Mane , Adriano Almeida , Travis Siegfried , Taragopal Chattopadhyay , Harry Seifert , Pradyothan Jeedula , Beth L. Hoffman , Antonio Moreira de Oliveira Neto

AI/ML Analytics IBM SAS data

This IBM® Redbooks® publication provides options and best practices for deploying SAS Viya 3.5 on IBM POWER9™ servers. SAS Viya is a complex set of artificial intelligence (AI) and analytics solutions that require a properly planned infrastructure to meet the needs of the data scientists, business analysts, and application developers who use Viya capabilities in their daily work activities. Regardless of the user role, the underlying infrastructure matters to ensure performance expectations and service level agreement (SLA) requirements are met or exceeded. Although the general planning process is similar for deploying SAS Viya on any platform, key IBM POWER9 differentiators must be considered to ensure that an optimized infrastructure deployment is achieved. This guide provides useful information that is needed during the planning, sizing, ordering, installing, configuring, and tuning phases of your SAS Viya deployment on POWER9 processor-based servers. This book addresses topics for IT architects, IT specialists, developers, sellers, and anyone who wants to implement SAS Viya 3.5 on IBM POWER9 servers. Moreover, this publication provides documentation to transfer the how-to skills to the technical teams, and solution guidance to the sales team. This book complements the documentation that is available in IBM Knowledge Center and aligns with the educational materials that are provided by the IBM Systems Software Education (SSE).

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

2021-04-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anna Bailliekova , Henrietta Dombrovskaya , Boris Novikov

SQL data postgresql relational-databases

Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view, and know how to think like the database optimizer. The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers. What You Will Learn Identify optimization goals in OLTP and OLAP systems Read and understand PostgreSQL execution plans Distinguish between short queries and long queries Choose the right optimization technique for each query type Identify indexes that will improve query performance Optimize full table scans Avoid the pitfalls of object-relational mapping systems Optimize the entire application rather than just database queries Who This Book Is For IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words “database developer” or “database administrator” or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database

SAP SuccessFactors Talent: Volume 1: A Complete Guide to Configuration, Administration, and Best Practices: Performance and Goals

2021-04-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Susan Traynor , Venki Krishnamoorthy , Michael A. Wellens

SAP Cyber Security XML data

Take an in-depth look at SAP SuccessFactors talent modules with this complete guide to configuration, administration, and best practices. This two-volume series follows a logical progression of SAP SuccessFactors modules that should be configured to complete a comprehensive talent management solution. The authors walk you through fully functional simple implementations in the primary chapters for each module before diving into advanced topics in subsequent chapters. In volume 1, we start with a brief introduction. The next two chapters jump into the Talent Profile and Job Profile Builder. These chapters lay the structures and data that will be utilized across the remaining chapters which detail each module. The following eight chapters walk you through building, administering, and using a goal plan in the Goal Management module as well as performance forms in the Performance Management module. The book also expands on performance topics with the 360 form and continuous performance management in two additional chapters. We then dive into configuring the calibration tool and how to set up calibration sessions in the next two chapters before providing a brief conclusion. Within each topic, the book touches on the integration points with other modules as well as internationalization. The authors also provide recommendations and insights from real-world experience. Having finished the book, you will have an understanding of what comprises a complete SAP SuccessFactors talent management solution and how to configure, administer, and use each module within it. You will: · Develop custom talent profile portlets · Integrate Job Profile Builder with SAP SuccessFactors talent modules · Set up security, group goals, and team goals in goals management with sample XML · Configure and launch performance forms including rating scales and route maps · Configure and administer the calibration module and its best practices

The California Privacy Rights Act (CPRA) – An implementation and compliance guide

2021-04-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Preston Bukaty

data data-security-privacy data security & privacy

The California Privacy Rights Act (CPRA) – An implementation and compliance guide is essential reading. Not only does it serve as an introduction to the legislation, it also discusses the challenges a business may face when trying to achieve CPRA compliance.

IBM FlashSystem 7200 Product Guide

2021-04-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jon Herd

IBM data

This IBM® Redbooks® Product Guide publication describes the IBM FlashSystem® 7200 solution, which is a comprehensive, all-flash, and NVMe-enabled enterprise storage solution that delivers the full capabilities of IBM FlashCore® technology. In addition, it provides a rich set of software-defined storage (SDS) features, including data reduction and de-duplication, dynamic tiering, thin-provisioning, snapshots, cloning, replication, data copy services, and IBM HyperSwap® for high availability (HA). Scale-out and scale-up configurations further enhance capacity and throughput for better availability.

IBM FlashSystem 9200 Product Guide

2021-04-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jon Herd

IBM data

This IBM® Redbooks® Product Guide publication describes the IBM FlashSystem® 9200 solution, which is a comprehensive, all-flash, and NVMe-enabled enterprise storage solution that delivers the full capabilities of IBM FlashCore® technology. In addition, it provides a rich set of software-defined storage (SDS) features, including data reduction and de-duplication, dynamic tiering, thin-provisioning, snapshots, cloning, replication, data copy services, and IBM HyperSwap® for high availability (HA). Scale-out and scale-up configurations further enhance capacity and throughput for better availability.

Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

2021-04-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ed Elliott

AI/ML API Big Data Hive Linux Microsoft Python Scala Spark SQL Data Streaming apache-spark +1 more

Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributed processing of extremely large datasets across multiple servers. This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data using both batch mode and streaming mode so you can make the right choice depending on whether you are processing an existing dataset or are working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable in bringing the power of Apache Spark to your favorite .NET language. What You Will Learn Install and configure Spark .NET on Windows, Linux, and macOS Write Apache Spark programs in C# and F# using the .NET bindings Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R Encapsulate functionality in user-defined functions Transform and aggregate large datasets Execute SQL queries against files through Apache Hive Distribute processing of large datasets across multiple servers Create your own batch, streaming, and machine learning programs Who This Book Is For .NET developers who want to perform big data processing without having to migrate to Python, Scala, or R; and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems

Azure Data Engineering Cookbook

2021-04-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nagaraj Venkatesan , Ahmad Osama

Analytics Azure ADF Cloud Computing Data Engineering Databricks ETL/ELT Microsoft Synapse data

Dive into the world of data engineering with 'Azure Data Engineering Cookbook' to master building efficient ETL workflows using Microsoft Azure Data services. Whether you're working on batch processing solutions or real-time analytics, this book is your guide to implementing effective, scalable data operations. What this Book will help me do Design and implement efficient ETL pipelines for batch and real-time processing on MS Azure. Understand the use of Azure Blob storage for managing large data sets. Ingest, process, and analyze data using tools like Azure Synapse and Databricks. Develop and secure automation pipelines using Azure Data Factory. Leverage Azure Stream Analytics for real-time data processing workflows. Author(s) Ahmad Osama and Nagaraj Venkatesan bring years of expertise in cloud solutions and data engineering. Renowned for their practical teaching approach, they have helped countless professionals master the intricacies of Azure. Their focus is on equipping readers with actionable skills for real-world data challenges. Who is it for? This book is ideal for data engineers and database professionals aiming to hone their expertise in advanced Azure data engineering tasks. Readers should have a working knowledge of Azure fundamentals and basic data engineering concepts. If you're a technical architect or ETL developer seeking to transition or enhance your skills in Azure's ecosystem, you'll find immense value here.

R2DBC Revealed: Reactive Relational Database Connectivity for Java and JVM Programmers

2021-04-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Robert Hedgpeth

API Java MariaDB Oracle RDBMS SQL data relational-databases

Understand the newest trend in database programming for developers working in Java, Kotlin, Clojure, and other JVM-based languages. This book introduces Reactive Relational Database Connectivity (R2DBC), a modern way of connecting to and querying relational databases from Java and other JVM languages. The book begins by helping you understand not only what reactive programming is, but why it is necessary. Then building on those fundamentals, the book takes you into the world of databases and the newly released Reactive Relational Database Connectivity (R2DBC) specification. Examples in the book are worked using the freely available MariaDB database along with MariaDB’s vendor-implementation of the R2DBC service-provider interface (SPI). Following along with the examples and the provided example code helps prepare you to work with any of the growing number of R2DBC implementations for popular enterprise databases such as Oracle Database and SQL Server. You’ll be well prepared for what is becoming the future of database access from Java and other languages built on the JVM. What You Will Learn Understand why R2DBC was created and how it utilizes the Reactive Streams API Understand the components of the R2DBC service-provider interface Create and manage reactive database connections and connection pools using an R2DBC client Programmatically execute queries on a relational database using an R2DBC client Effectively utilize transactions using an R2DBC client Build relational database-driven applications that are event-driven and non-blocking Who This Book Is For Software developers building solutions using JVM languages and the JVM ecosystem, and developers who need an introduction to the R2DBC specification and reactive programming with relational databases and want to understand what Reactive Relational Database Connectivity is and why it came about. 
This book includes practical examples of using the R2DBC specification with Java and MariaDB that will provide developers with the knowledge they need to create their own solutions.

MongoDB Performance Tuning: Optimizing MongoDB Databases and their Applications

2021-04-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Harrison , Guy Harrison

MongoDB data nosql-databases

Use this fast and complete guide to optimize the performance of MongoDB databases and the applications that depend on them. You will be able to turbo-charge the performance of your MongoDB applications to provide a better experience for your users, reduce your running costs, and avoid application growing pains. MongoDB is the world’s most popular document database and the foundation for thousands of mission-critical applications. This book helps you get the best possible performance from MongoDB. MongoDB Performance Tuning takes a methodical and comprehensive approach to performance tuning that begins with application and schema design and goes on to cover optimization of code at all levels of an application. The book also explains how to configure MongoDB hardware and cluster configuration for optimal performance. The systematic approach in the book helps you treat the true causes of performance issues and get the best return on your tuning investment. Even when you’re under pressure and don’t know where to begin, simply follow the method in this book to set things right and get your MongoDB performance back on track. What You Will Learn Apply a methodical approach to MongoDB performance tuning Understand how to design an efficient MongoDB application Optimize MongoDB document design and indexing strategies Tune MongoDB queries, aggregation pipelines, and transactions Optimize MongoDB server resources: CPU, memory, disk Configure MongoDB Replica sets and Sharded clusters for optimal performance Who This Book Is For Developers and administrators of high-performance MongoDB applications who want to be sure they are getting the best possible performance from their MongoDB system. For developers who wish to create applications that are fast, scalable, and cost-effective. For administrators who want to optimize their MongoDB server and hardware configuration.

High Performant File System Workloads for AI and HPC on AWS using IBM Spectrum Scale

2021-03-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sanjay Sudam

AI/ML AWS Amazon EC2 Cloud Computing ELK IBM Linux data

This IBM® Redpaper® publication is intended to facilitate the deployment and configuration of the IBM Spectrum® Scale based high-performance storage solutions for the scalable data and AI solutions on Amazon Web Services (AWS). Configuration, testing results, and tuning guidelines for running the IBM Spectrum Scale based high-performance storage solutions for the data and AI workloads on AWS are the focus areas of the paper. The LAB Validation was conducted with the Red Hat Linux nodes to IBM Spectrum Scale by using the various Amazon Elastic Compute Cloud (EC2) instances. Simultaneous workloads are simulated across multiple Amazon EC2 nodes running with Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system. Solution architecture, configuration details, and performance tuning demonstrate how to maximize data and AI application performance with IBM Spectrum Scale on AWS.

IBM Spectrum Protect Plus Practical Guidance for Deployment, Configuration, and Usage

2021-03-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Chris Bode , Alberto Delgado Ramos , Daniel Wendler , Peter Minig , Markus Stumpf , Julien Sauvanet , Martin Stuber , Gerd Becker , Jozef Uríča , Axel Westphal , Joerg Walter , Bert Dufrasne , Mikael Lindström , Andre Gaschler

IBM data

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and applications backups in hybrid multicloud environments. IBM Knowledge Center for IBM Spectrum® Protect Plus provides extensive documentation for installation, deployment, and usage. In addition, build and size an IBM Spectrum Protect Plus solution. The goal of this IBM Redpaper® publication is to summarize and complement the available information by providing useful hints and tips that are based on the authors' practical experience in installing and supporting IBM Spectrum Protect Plus in customer environments. Over time, our aim is to compile a set of best practices that cover all aspects of the product, from planning and installation to tuning, maintenance, and troubleshooting.

Effortless App Development with Oracle Visual Builder

2021-03-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ankur Jain (Acxiom)

API Cloud Computing JavaScript Oracle SaaS data oracle-database-solutions

In "Effortless App Development with Oracle Visual Builder," you will explore how to quickly design, develop, and deploy robust web and mobile applications using Oracle Visual Builder's intuitive drag-and-drop features. This book equips you with the know-how to simplify application development tasks, making it perfect for professionals looking to boost productivity. What this Book will help me do Master the core architecture and features of Oracle Visual Builder to develop real-world applications effectively. Learn to create, manage, and leverage business objects and connect to various SaaS APIs within your applications. Build scalable and secure web and mobile applications using practical examples and clear implementation guidelines. Discover best practices for application lifecycle management, debugging, and troubleshooting VB applications. Extend Oracle and non-Oracle SaaS applications through hands-on knowledge tailored to real-world scenarios. Author(s) Ankur Jain is an experienced developer and technical writer specializing in Oracle Visual Builder and cloud-based application development. With years of hands-on experience building and deploying cloud applications, they bring expertise and a practical approach to education. Their engaging writing style focuses on enabling readers to learn and apply new skills confidently. Who is it for? This book is perfectly suited for developers, UI designers, and IT professionals who want to master Oracle Visual Builder for developing web and mobile applications. If you already have experience with technologies like JavaScript, UI frameworks, and REST APIs, and seek to create intuitive applications using a simplified interface, this book is for you. Whether you're in the early stages of learning VB or looking to refine your skills, this book serves as a valuable guide.

Automating the Modern Data Warehouse

2021-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Steve Swoyer

AI/ML Cloud Computing Data Governance Data Management DWH data data-warehouse storage-repositories

The opportunity to modernize and improve the enterprise data warehouse is one of the best reasons for moving your application to the cloud. A data warehouse can access a greater diversity of use cases and practices than is possible in an existing environment. In this report, researcher and analyst Stephen Swoyer offers a comprehensive overview of the benefits and challenges of implementing a cloud-based data warehouse. Senior IT decision makers, chief data officers, and data professionals will learn about the shifts and new trends in the data management landscape. Explore ways to improve data management, build a data warehouse strategy, and learn how to modernize a data warehouse effectively. Understand how AI, machine learning, self-service data integration, and built-in developer-oriented services have transformed the data warehouse role Use data warehouses to work with cloud-based data lakes for end-to-end data management and data governance Explore how data warehouse platforms as a service (PaaS) pave the way to automation Migrate, manage, and secure a data warehouse in a hybrid or multicloud environment

The Rise of the Knowledge Graph

2021-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ben Szekely , Dean Allemang , Sean Martin

AI/ML Fabric data

Businesses manage data to understand the connections between their customers, products or services, features, markets, and anything else that affects the business. With a knowledge graph, you can represent these connections directly to analyze and understand the compound relationships that drive business innovation. This report introduces knowledge graphs and examines their ability to weave business data and business knowledge into an architecture known as a data fabric. Authors Sean Martin, Ben Szekely, and Dean Allemang explain graph data and knowledge representation and demonstrate the value of combining these two things in a knowledge graph. You'll learn how knowledge graphs enable an enterprise-scale data fabric and discover what to expect in the near future as this technology evolves. This report also examines the evolution of databases, data integration, and data analysis to help you understand how the industry reached this point. Learn how graph technology enables you to represent knowledge and link it to data Understand how graph technology emphasizes the connected nature of data Use a data fabric to support other data-intensive tasks, including machine learning and data analysis Examine how a data fabric supports intense data-driven business initiatives more robustly than a simple database or data architecture

IBM TS7700 Series DS8000 Object Store User's Guide Version 2.0

2021-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lourie Goodall , Rin Fujiwara

Cloud Computing IBM data

The IBM® TS7700 features a functional enhancement that allows for the TS7700 to act as an object store for transparent cloud tiering with IBM DS8000® (DS8K), DFSMShsm (HSM), and native DFSMSdss (DSS). This function can be used to move data sets directly from DS8000 to TS7700. This IBM Redpaper publication describes the client value, and how DFSMS, DS8000, and TS7700 are set up to enable and use the function.

CDPSE Certified Data Privacy Solutions Engineer All-in-One Exam Guide

2021-03-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Peter H. Gregory

Cyber Security data data-security-privacy data security & privacy

This study guide offers 100% coverage of every objective for the Certified Data Privacy Solutions Engineer Exam. This resource offers complete, up-to-date coverage of all the material included on the current release of the Certified Data Privacy Solutions Engineer exam. Written by an IT security and privacy expert, CDPSE Certified Data Privacy Solutions Engineer All-in-One Exam Guide covers the exam domains and associated job practices developed by ISACA®. You’ll find learning objectives at the beginning of each chapter, exam tips, practice exam questions, and in-depth explanations. Designed to help you pass the CDPSE exam, this comprehensive guide also serves as an essential on-the-job reference for new and established privacy and security professionals. Covers all exam topics, including: Privacy Governance (Governance, Management, Risk Management); Privacy Architecture (Infrastructure, Applications and Software, Technical Privacy Controls); Data Cycle (Data Purpose, Data Persistence). Online content includes: 300 practice exam questions; test engine that provides full-length practice exams and customizable quizzes by exam topic.

Getting Started: Journey to Modernization with IBM Z

2021-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravinder Akula , Makenzie Manna , Pabitra Mukhopadhyay , Matthew Cousens , Anand Shukla

IBM data

Modernization of enterprise IT applications and infrastructure is key to the survival of organizations. It is no longer a matter of choice. The cost of missing out on business opportunities in an intensely competitive market can be enormous. To aid in their success, organizations are facing increased encouragement to embrace change. They are pushed to think of new and innovative ways to counter, or offer, a response to threats that are posed by competitors who are equally as aggressive in adopting newer methods and technologies. The term modernization often varies in meaning based on perspective. This IBM® Redbooks® publication focuses on the technological advancements that unlock computing environments that are hosted on IBM Z® to enable secure processing at the core of hybrid. This publication is intended for IT executives, IT managers, IT architects, System Programmers, and Application Developer professionals.

The Problems of Viewing Performance

2021-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Y. Bennett

Netezza data relational-databases

The Problems of Viewing Performance challenges long-held assumptions by considering the ways in which knowledge is received by more than a single audience member, and breaks new ground by, counterintuitively, claiming that viewing performance is not a shared experience.

LDAP Authentication for IBM DS8000 Systems: Updated for DS8000 Release 9.1

2021-03-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Claudio Di Celio , Bjoern Wesselbaum , Connie Riggins , Alex Warmuth , Robert Tondini , Bert Dufrasne

IBM data

The IBM® DS8000® series includes the option to replace the locally based user ID and password authentication with a centralized directory-based approach. This IBM Redpaper publication helps DS8000 storage administrators understand the concepts and benefits of a centralized directory. It provides the information that is required for implementing a DS8000 authentication mechanism that is based on the Lightweight Directory Access Protocol (LDAP). Starting with DS8000 Release 9.1 code, a simpler, native LDAP authentication method is supported along with the former implementation that relies on IBM Copy Services Manager (CSM) acting as a proxy between the DS8000 and external LDAP servers. Note that examples and operations shown in this Redpaper refer to the DS8000 R9.1 SP1, code release bundle 89.11.33.0.

talk-data.com
