talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks


Mastering PostgreSQL 12 - Third Edition

Mastering PostgreSQL 12 delves into advanced features of PostgreSQL to help database professionals optimize, secure, and scale their database systems. Through practical examples, this book equips you with the necessary skills to address challenges in modern PostgreSQL environments. What this Book will help me do Gain expertise in PostgreSQL 12's advanced SQL functions and features. Master replication and backup techniques for scalable and fault-tolerant databases. Effectively optimize PostgreSQL queries and index utilization for performance gains. Enhance the security of PostgreSQL servers to ensure data integrity. Acquire hands-on experience in troubleshooting and resolving PostgreSQL-related issues. Author(s) Hans-Jürgen Schönig is a renowned database expert specializing in PostgreSQL. With years of experience in both database administration and development, he brings clarity to complex technical topics. His teaching approach emphasizes practical applications, making PostgreSQL's advanced features accessible for professionals. Who is it for? This book is ideal for PostgreSQL developers, administrators, and database professionals who have foundational knowledge and intend to enhance their expertise. Readers should be familiar with general database concepts and aim to master PostgreSQL's advanced functionalities. Whether you are handling enterprise environments or exploring data topology, this book serves as a vital resource.

Expert Performance Indexing in SQL Server 2019: Toward Faster Results and Lower Maintenance

Take a deep dive into perhaps the single most important facet of good performance: indexes, and how to best use them. Recent updates to SQL Server have made it possible to create indexes in situations that in the past would have prevented their use. Other improvements covered in this book include new dynamic management views, the ability to pause and resume index maintenance, and the ability to more easily recover from failures during index creation and maintenance operations. This new edition also brings new content around the indexing of columnstore and in-memory tables, showing how these new types of tables and the queries that execute against them can also benefit from good indexing practices. The book begins with explanations of the types of indexes and how they are stored in databases. Moving deeper into the topic, and further into the book, you will look at the statistics that are accumulated both by indexes and on indexes. You will better understand what indexes are doing in the database and what can be done to mitigate and improve their effect on performance. You will get a look at the Index Advisor now available in Azure SQL Database, and learn how to review and maintain the health of your indexes. The final chapters present a guided tour through a number of scenarios showing approaches you can take to investigate, mitigate, and improve the performance of your database. 
What You Will Learn
• Properly index row store, columnstore, and in-memory tables
• Review statistics to understand indexing choices made by the optimizer
• Apply indexing strategies such as covering indexes, included columns, and index intersections
• Recognize and remove unnecessary indexes
• Design effective indexes for full-text, spatial, and XML data types
• Manage the big picture: Encompass all indexes in a database, and all database instances on a server
Who This Book Is For
Database administrators and developers who are ready to lift the performance of their database environment by thoughtfully building indexes to speed up queries that matter the most and make a difference to the business

SQL Server Big Data Clusters: Early First Edition Based on Release Candidate 1

Get a head-start on learning one of SQL Server 2019’s latest and most impactful features—Big Data Clusters—that combines large volumes of non-relational data for analysis along with data stored relationally inside a SQL Server database. This book provides a first look at Big Data Clusters based upon SQL Server 2019 Release Candidate 1. Start now and get a jump on your competition in learning this important new feature. Big Data Clusters is a feature set covering data virtualization, distributed computing, and relational databases and provides a complete AI platform across the entire cluster environment. This book shows you how to deploy, manage, and use Big Data Clusters. For example, you will learn how to combine data stored on the HDFS file system together with data stored inside the SQL Server instances that make up the Big Data Cluster. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019 using Release Candidate 1. You will learn about the architectural foundations that are made up from Kubernetes, Spark, HDFS, and SQL Server on Linux. You then are shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL—taking advantage of skills you have honed for years—and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis. 
What You Will Learn
• Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments
• Analyze large volumes of data directly from SQL Server and/or Apache Spark
• Manage data stored in HDFS from SQL Server as if it were relational data
• Implement advanced analytics solutions through machine learning and AI
• Expose different data sources as a single logical source using data virtualization
Who This Book Is For
Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environment

IBM z14 Model ZR1 Configuration Setup

This IBM® Redbooks® publication helps you install, configure, and maintain the IBM z14® Model ZR1 (Machine Type 3907). The z14 ZR1 offers new functions that require a comprehensive understanding of the available configuration options. This book presents configuration setup scenarios and describes implementation examples in detail. This publication is intended for systems engineers, hardware planners, and anyone who needs to understand IBM Z® configuration and implementation. Readers should be generally familiar with current IBM Z technology and terminology. For more information about the functions of the z14 Model ZR1, see IBM z14 Model ZR1 Technical Introduction, SG24-8550, and IBM z14 Model ZR1 Technical Guide, SG24-8651.

Building Big Data Applications

Building Big Data Applications helps data managers and their organizations make the most of unstructured data with an existing data warehouse. It provides readers with what they need to know to make sense of how Big Data fits into the world of Data Warehousing. Readers will learn about infrastructure options and integration and come away with a solid understanding of how to leverage various architectures for integration. The book includes a wide range of use cases that will help data managers visualize reference architectures in the context of specific industries (healthcare, big oil, transportation, software, etc.).
• Explores various ways to leverage Big Data by effectively integrating it into the data warehouse
• Includes real-world case studies which clearly demonstrate Big Data technologies
• Provides insights on how to optimize current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

IBM DS8000 SafeGuarded Copy

This IBM® Redpaper™ publication explains the IBM DS8000 Safeguarded Copy functionality. With Safeguarded Copy, organizations have the ability to improve their cyber resiliency by frequently creating protected point-in-time backups of their critical data, with minimum impact and effective resource utilization. The paper introduces Safeguarded Copy and discusses the need for logical corruption protection (LCP) and information about regulatory requirements. It presents the general concepts of LCP, and then explores various use cases for recovery. The paper is intended for IT security architects, who plan and design an organization's cyber security strategy, as well as the infrastructure technical specialists who implement it.

T-SQL Window Functions: For data analysis and beyond, 2nd Edition

Use window functions to write simpler, better, more efficient T-SQL queries
Most T-SQL developers recognize the value of window functions for data analysis calculations. But they can do far more, and recent optimizations make them even more powerful. In T-SQL Window Functions, renowned T-SQL expert Itzik Ben-Gan introduces breakthrough techniques for using them to handle many common T-SQL querying tasks with unprecedented elegance and power. Using extensive code examples, he guides you through window aggregate, ranking, distribution, offset, and ordered set functions. You'll find a detailed section on optimization, plus an extensive collection of business solutions, including novel techniques available in no other book.
Microsoft MVP Itzik Ben-Gan shows how to:
• Use window functions to improve queries you previously built with predicates
• Master essential SQL windowing concepts, and efficiently design window functions
• Effectively utilize partitioning, ordering, and framing
• Gain practical in-depth insight into window aggregate, ranking, offset, and statistical functions
• Understand how the SQL standard supports ordered set functions, and find working solutions for functions not yet available in the language
• Preview advanced Row Pattern Recognition (RPR) data analysis techniques
• Optimize window functions in SQL Server and Azure SQL Database, making the most of indexing, parallelism, and more
• Discover a full library of window function solutions for common business problems
About This Book
• For developers, DBAs, data analysts, data scientists, BI professionals, and power users familiar with T-SQL queries
• Addresses any edition of the SQL Server 2019 database engine or later, as well as Azure SQL Database
Get all code samples at: MicrosoftPressStore.com/TSQLWindowFunctions/downloads

Monitoring and Managing the IBM Elastic Storage Server Using the GUI

The IBM® Elastic Storage Server GUI provides an easy way to configure and monitor various features that are available with the IBM ESS system. It is a web application that runs on common web browsers, such as Chrome, Firefox, and Edge. The ESS GUI uses JavaScript and Ajax technologies to enable smooth and desktop-like interfacing. This IBM Redpaper publication provides a broad understanding of the architecture and features of the ESS GUI. It includes information about how to install and configure the GUI and in-depth information about the use of the GUI options. The primary audience for this paper includes experienced and new users of the ESS system.

Implementing the IBM Storwize V7000 with IBM Spectrum Virtualize V8.2.1

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces the IBM Storwize® V7000 solution powered by IBM Spectrum™ Virtualize. This innovative storage offering delivers essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a compact, modular design that is offered at a competitive, midrange price. The IBM Storwize V7000 solution incorporates some of the top IBM technologies that are typically found only in enterprise-class storage systems, which raises the standard for storage efficiency in midrange disk systems. This cutting-edge storage system extends the comprehensive storage portfolio from IBM and can help change the way organizations address the ongoing information explosion. This IBM Redbooks® publication introduces the features and functions of the IBM Storwize V7000 and IBM Spectrum Virtualize™ V8.2.1 system through several examples. This book is aimed at pre-sales and post-sales technical support and marketing and storage administrators. It helps you understand the architecture of the Storwize V7000, how to implement it, and how to take advantage of its industry-leading functions and features.

SAP Landscape Management 3.0 and IBM Power Systems Servers

This IBM® Redpaper publication is part of a series of technical documentation to help the enablement of SAP on Linux for IBM Power Systems servers and IBM System Storage™ servers. This book describes how, by using SAP Landscape Management (SAP LaMa) 3.0 software, clients gain full visibility and control over their SAP and non-SAP systems, including the underlying physical, virtual, and cloud infrastructures. With SAP LaMa, you can automate repetitive tasks to manage critical applications across complex, hybrid IT landscapes. This publication helps you to better control IT costs and increase business agility, for example, by freeing staff to focus on more strategic work rather than manual, error-prone tasks. The target audiences of this book are architects, IT specialists, and systems administrators deploying SAP LaMa 3.0 who often spend much time and effort managing and provisioning SAP software systems and landscapes.

A Guide to JES3 to JES2 Migration

This IBM® Redbooks® publication provides information to help clients that have JES3 and want to migrate to JES2. It provides a comprehensive list of the differences between the two job entry subsystems and provides information to help you determine the migration effort and actions. This book considers the features of JES2 as available on releases of IBM z/OS® V2R3 and V2R4. It should be used with JES3 to JES2 Migration Considerations, SG24-8083. This publication is divided into three parts: Part 1, "Planning to migrate from JES3 to JES2", gives you information to make the decision and plan your migration. Part 2, "Use case study", provides a use case study that is based on an actual customer experience in a successful migration. Part 3, "Appendixes", provides an appendix with sample tools that can help the migration process and exploitation of some of the new JES2 functions. This book is aimed at operations personnel, system programmers, and application developers.

Electronic Health Records with Epic and IBM FlashSystem 9100 Blueprint Version 2 Release 2

This information is intended to facilitate the deployment of IBM® FlashSystem for the Epic Corporation electronic health record (EHR) solution by describing the requirements and specifications for configuring IBM FlashSystem® 9100 and its parameters. The document also describes the steps that are required to configure the server that hosts the EHR application. To complete the tasks, you must have a working knowledge of IBM FlashSystem 9100 and Epic applications. The information in this document is distributed on an "as is" basis, without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM FlashSystem storage devices are supported and entitled and where the issues are not specific to a blueprint implementation.

Data Privacy and GDPR Handbook

The definitive guide for ensuring data privacy and GDPR compliance
Privacy regulation is increasingly rigorous around the world and has become a serious concern for senior management of companies regardless of industry, size, scope, and geographic area. The General Data Protection Regulation (GDPR) imposes complex, elaborate, and stringent requirements for any organization or individuals conducting business in the European Union (EU) and the European Economic Area (EEA), while also addressing the export of personal data outside of the EU and EEA. This recently enacted law allows the imposition of fines of up to 4% of global revenue for privacy and data protection violations. Despite the massive potential for steep fines and regulatory penalties, there is a distressing lack of awareness of the GDPR within the business community. A recent survey conducted in the UK suggests that only 40% of firms are even aware of the new law and their responsibilities to maintain compliance. The Data Privacy and GDPR Handbook helps organizations strictly adhere to data privacy laws in the EU, the USA, and governments around the world. This authoritative and comprehensive guide includes the history and foundation of data privacy, the framework for ensuring data privacy across major global jurisdictions, a detailed framework for complying with the GDPR, and perspectives on the future of data collection and privacy practices.
• Comply with the latest data privacy regulations in the EU, EEA, US, and others
• Avoid hefty fines, damage to your reputation, and losing your customers
• Keep pace with the latest privacy policies, guidelines, and legislation
• Understand the framework necessary to ensure data privacy today and gain insights on future privacy practices
The Data Privacy and GDPR Handbook is an indispensable resource for Chief Data Officers, Chief Technology Officers, legal counsel, C-Level Executives, regulators and legislators, data privacy consultants, compliance officers, and audit managers.

EU General Data Protection Regulation (GDPR), third edition - An Implementation and Compliance Guide

EU GDPR – An Implementation and Compliance Guide is a perfect companion for anyone managing a GDPR compliance project. It explains the changes you need to make to your data protection and information security regimes and tells you exactly what you need to do to avoid severe financial penalties.

Oracle Database Application Security: With Oracle Internet Directory, Oracle Access Manager, and Oracle Identity Manager

Focus on the security aspects of designing, building, and maintaining a secure Oracle Database application. Starting with data encryption, you will learn to work with transparent data, back-up, and networks. You will then go through the key principles of audits, where you will get to know more about identity preservation, policies and fine-grained audits. Moving on to virtual private databases, you’ll set up and configure a VPD to work in concert with other security features in Oracle, followed by tips on managing configuration drift, profiles, and default users. Shifting focus to coding, you will take a look at secure coding standards, multi-schema database models, code-based access control, and SQL injection. Finally, you’ll cover single sign-on (SSO), and will be introduced to Oracle Internet Directory (OID), Oracle Access Manager (OAM), and Oracle Identity Manager (OIM) by installing and configuring them to meet your needs. Oracle databases hold the majority of the world’s relational data, and are attractive targets for attackers seeking high-value targets for data theft. Compromise of a single Oracle Database can result in tens of millions of breached records costing millions in breach-mitigation activity. This book gets you ready to avoid that nightmare scenario.
What You Will Learn
• Work with Oracle Internet Directory using the command-line and the console
• Integrate Oracle Access Manager with different applications
• Work with the Oracle Identity Manager console and connectors, while creating your own custom one
• Troubleshoot issues with OID, OAM, and OIM
• Dive deep into file system and network security concepts
Who This Book Is For
Oracle DBAs and developers. Readers will need a basic understanding of Oracle RDBMS and Oracle Application Server to take complete advantage of this book.

Google BigQuery: The Definitive Guide

Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently. Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.

Elasticsearch 7 Quick Start Guide

Elasticsearch 7 Quick Start Guide introduces the core capabilities of Elasticsearch, one of the most powerful distributed search and analytics tools available. Through this concise and practical guide, you will learn how to install, configure, and effectively utilize Elasticsearch while exploring its powerful features, including real-time search and data aggregation. What this Book will help me do Install and configure Elasticsearch to create secure and scalable deployments. Understand and utilize analyzers, filters, and mappings to optimize search results. Perform data aggregations using advanced techniques in metric and bucket operations. Identify and troubleshoot common Elasticsearch performance issues for smooth operation. Leverage best practices to ensure effective deployment in production environments. Author(s) Srivastava and Miller are experienced writers and technologists who bring real-world expertise in search systems and analytics. With practical backgrounds in distributed systems and data management, the authors deliver a straightforward and hands-on approach in their writing. They aim to make Elasticsearch concepts approachable and practical for developers and administrators alike. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who are seeking to implement Elasticsearch within their projects. It is particularly suited for those with basic to intermediate technical experience and a need for robust search and analytics solutions. If you're aiming to learn the fundamentals and acquire practical skills in Elasticsearch 7, this book will serve as an excellent resource for you.

Expert T-SQL Window Functions in SQL Server 2019: The Hidden Secret to Fast Analytic and Reporting Queries

Become an expert who can use window functions to solve T-SQL query problems. Replace slow cursors and self-joins with queries that are easy to write and perform better. This new edition provides expanded examples, including a chapter from the world of sports, and covers the latest performance enhancements through SQL Server 2019. Window functions are useful in analytics and business intelligence reporting. They came into full blossom with SQL Server 2012, yet they are not as well known and used as often as they ought to be. This group of functions is one of the most notable developments in SQL, and this book shows how every developer and DBA can benefit from their expressive power in solving day-to-day business problems. Once you begin using window functions, such as ROW_NUMBER and LAG, you will discover many ways to use them. You will approach SQL Server queries in a different way, thinking about sets of data instead of individual rows. Your queries will run faster, be easier to write, and easier to deconstruct, maintain, and enhance in the future. Just knowing and using these functions is not enough. You also need to understand how to tune the queries. Expert T-SQL Window Functions in SQL Server clearly explains how to get the best performance. The book also covers the rare cases when older techniques are the best bet.
What You Will Learn
• Solve complex query problems without cumbersome self-joins that run slowly and are difficult to read
• Create sliding windows in a result set for computations such as running totals and moving averages
• Return aggregate and detail data simultaneously from the same SELECT statement
• Compute lag and lead and other values that access data from multiple rows in a result set
• Understand the OVER clause syntax and how to control the window
• Avoid framing errors that can lead to unexpected results
Who This Book Is For
Anyone who writes T-SQL queries, including database administrators, developers, business analysts, and data scientists.
Before reading this book, you should understand how to join tables, write WHERE clauses, and build aggregate queries.
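As a rough illustration of the calculations named in the description above (running totals, moving averages, and LAG), the following is a minimal plain-Python sketch rather than T-SQL, and it is not taken from the book. The sample figures and the helper names `running_total`, `moving_average`, and `lag` are hypothetical; the input list is assumed to be already sorted in the window's order.

```python
from itertools import accumulate

# Hypothetical daily sales figures, assumed to be sorted by date already
# (this plays the role of the ORDER BY inside a window definition).
sales = [100, 120, 90, 150, 130]


def running_total(values):
    """Cumulative sum over all rows up to and including the current one."""
    return list(accumulate(values))


def moving_average(values, width):
    """Average over the current row and up to width - 1 preceding rows."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - width + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result


def lag(values, offset=1, default=None):
    """Value from `offset` rows earlier, or `default` when no such row exists."""
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]


if __name__ == "__main__":
    print(running_total(sales))
    print(moving_average(sales, 3))
    print(lag(sales))
```

Each helper returns one output value per input row, which mirrors the way a window function keeps the detail rows instead of collapsing them as GROUP BY would.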

Pro SQL Server 2019 Administration: A Guide for the Modern DBA

Use this comprehensive guide for the SQL Server DBA, covering all that practicing database administrators need to know to get their daily work done. Updated for SQL Server 2019, this edition includes coverage of new features such as Memory-optimized TempDB Metadata, and Always Encrypted with Secure Enclaves. Other new content includes coverage of Query Store, resumable index operations, installation on Linux, and containerized SQL. Pro SQL Server 2019 Administration takes DBAs on a journey that begins with planning their SQL Server deployment and runs through installing and configuring the instance, administering and optimizing database objects, and ensuring that data is secure and highly available. Finally, readers will learn how to perform advanced maintenance and tuning techniques. This book teaches you to make the most of new SQL Server 2019 functionality, including Data Discovery and Classification. The book promotes best-practice installation, shows how to configure for scalability and high workloads, and demonstrates the gamut of database-level maintenance tasks such as index maintenance, database consistency checks, and table optimizations.
What You Will Learn
• Install and configure SQL Server on Windows through the GUI and with PowerShell
• Install and configure SQL Server on Linux and in Containers
• Optimize tables through in-memory OLTP, table partitioning, and the creation of indexes
• Secure and encrypt data to protect against embarrassing data breaches
• Ensure 24x7x365 access through high-availability and disaster recovery features
• Back up your data to ensure against loss, and recover data when needed
• Perform routine maintenance tasks such as database consistency checks
• Troubleshoot and solve performance problems in SQL queries and in the database engine
Who This Book Is For
SQL Server DBAs who manage on-premises installations of SQL Server.
This book is also useful for DBAs who wish to learn advanced features such as Query Store, Extended Events, Distributed Replay, and Policy-Based Management, or those who need to install SQL Server in a variety of environments.

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project team's goal to integrate the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux, along with a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with Polybase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and Polybase to build a comprehensive solution called Big Data Clusters, which is a marquee feature of 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications.
What You Will Learn
• Implement Big Data Clusters with SQL Server, Spark, and HDFS
• Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources
• Combine SQL and Spark to build a machine learning platform for AI applications
• Boost your performance with no application changes using Intelligent Performance
• Increase security of your SQL Server through Secure Enclaves and Data Classification
• Maximize database uptime through online indexing and Accelerated Database Recovery
• Build new modern applications with Graph, ML Services, and T-SQL Extensibility with Java
• Improve your ability to deploy SQL Server on Linux
• Gain in-depth knowledge to run SQL Server with containers and Kubernetes
• Know all the new database engine features for performance, usability, and diagnostics
• Use the latest tools and methods to migrate your database to SQL Server 2019
• Apply your knowledge of SQL Server 2019 to Azure
Who This Book Is For
IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.

Cognitive Computing Featuring the IBM Power System AC922

This IBM® Redpaper publication describes the advantages of using IBM Power System AC922 for cognitive solutions, and how it can enhance clients' businesses. In order to optimize the hardware and software, IBM partners with NVIDIA, Mellanox, H2O.ai, SQream, Kinetica, and other prominent companies to design the Power AC922 server, specifically enhanced for the cognitive era. Most of its outstanding hardware features, such as NVIDIA NVLink 2.0 and PCIe 4.0, are described in this publication to illustrate the advantages that clients can realize in comparison with IBM competitors. We also include a brief description about what cognitive computing is, and how to use IBM Watson® Machine Learning cognitive solutions to bring more value to your business ecosystem. Additionally, we show performance charts that show the advantages of using Power AC922 versus x86 competitors. In the last chapter, we describe the most remarkable use cases in which IBM solves real problems using cognitive solutions. This IBM Redpaper publication is aimed at IT technical audiences, especially decision-making levels that need a full look at the benefits and improvements that an IBM Cognitive Solution can offer. It also provides valuable information to data science professionals, enabling them to plan their modeling needs. Finally, it offers information to the infrastructure support group in charge of maintaining the solution.

IBM Spectrum Scale Erasure Code Edition: Planning and Implementation Guide

This IBM® Redpaper introduces the IBM Spectrum® Scale Erasure Code Edition (ECE) as a scalable, high-performance data and file management solution. ECE is designed to run on any commodity server that meets the ECE minimum hardware requirements. ECE provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale with the added benefit of network-dispersed IBM Spectrum Scale RAID, which provides data protection, storage efficiency, and the ability to manage storage in hyperscale environments that are composed from commodity hardware. In this publication, we explain the benefits of ECE and the use cases where we believe it fits best. We also provide a technical introduction to IBM Spectrum Scale RAID. Next, we explain the key aspects of planning an installation, provide an example of an installation scenario, and describe the key aspects of day-to-day management and a process for problem determination. We conclude with an overview of possible enhancements that are being considered for future versions of IBM Spectrum Scale Erasure Code Edition. Overall knowledge of IBM Spectrum Scale Erasure Code Edition is critical to planning a successful storage system deployment. This paper is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost effective storage solutions. The goal of this paper is to describe the benefits of using IBM Spectrum Scale Erasure Code Edition for the creation of high performing storage systems.

Mastering Spark with R

If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems. Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
• Analyze, explore, transform, and visualize data in Apache Spark with R
• Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
• Perform analysis and modeling across many machines using distributed computing techniques
• Use large-scale data from multiple sources and different formats with ease from within Spark
• Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
• Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions