talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.

Filtering by: data

Sessions & talks

Showing 801–825 of 3406 · Newest first

IBM Spectrum Archive Enterprise Edition V1.2.6 Installation and Configuration Guide

Note: This is a republication of IBM Spectrum Archive Enterprise Edition V1.2.6: Installation and Configuration Guide with the new book number SG24-8445, to keep the content available on the Internet alongside the recent publication IBM Spectrum Archive Enterprise Edition V1.3.0: Installation and Configuration Guide, SG24-8333. This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive V1.2.6 for the IBM TS3310, IBM TS3500, IBM TS4300, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the Linear Tape File System (LTFS) for the policy management of tape as a storage tier in an IBM Spectrum Scale™ based environment. It helps encourage the use of tape as a critical tier in the storage environment. This is the sixth edition of the IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 8, 7, 6, and 5 tape drives in IBM TS3310, TS3500, TS4300, and TS4500 tape libraries. In addition, IBM TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. Using IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation.
This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM TS7700 Release 4.2 Guide

This IBM® Redbooks® publication covers IBM TS7700 R4.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 20 years of virtual tape experience, the TS7760 now supports the ability to store virtual tape volumes in an object store. The TS7700 has supported offloading to physical tape for over two decades, and offloading to physical tape behind a TS7700 is used by hundreds of organizations around the world. Using the same hierarchical storage techniques, the TS7700 can also offload to object storage. Because object storage is cloud-based and accessible from different regions, the TS7760 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of the release of this document, the TS7760C supports offloading to IBM Cloud Object Storage as well as Amazon S3. To learn about the TS7760 cloud storage tier function, planning, implementation, best practices, and support, see the IBM Redpaper IBM TS7760 R4.2 Cloud Storage Tier Guide, redp-5514, at http://www.redbooks.ibm.com/abstracts/redp5514.html. The IBM TS7700 offers a modular, scalable, and high-performance architecture for mainframe tape virtualization for the IBM Z® environment. It is a fully integrated, tiered storage hierarchy of disk and tape. This storage hierarchy is managed by robust storage management microcode with extensive self-management capability.
It includes the following advanced functions: improved reliability and resiliency; reduction in the time that is needed for the backup and restore process; reduction of services downtime that is caused by physical tape drive and library outages; reduction in cost, time, and complexity by moving primary workloads to virtual tape; more efficient procedures for managing daily backup and restore processing; and infrastructure simplification through reduction of the number of physical tape libraries, drives, and media. TS7700 delivers the following new capabilities: TS7760C support for offloading to IBM Cloud Object Storage as well as Amazon S3; an 8-way grid cloud consisting of any generation of TS7700; synchronous and asynchronous replication; tight integration with IBM Z and DFSMS policy management; optional Transparent Cloud Tiering; optional integration with physical tape; cumulative 16 Gb FICON throughput of up to 4.8 GB/s x 8; IBM Z hosts that view up to 496 x 8 equivalent devices; and grid access to all data independent of where it exists. The TS7760T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150 and IBM TS1140 tape drives installed in an IBM TS4500 or TS3500 tape library. The TS7760 models are based on high-performance and redundant IBM POWER8® technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM DS8880 Encryption for data at rest and Transparent Cloud Tiering (DS8000 Release 8.5)

Updated for Release 8.5. IBM experts recognize the need for data protection, both from hardware or software failures and from physical relocation of hardware, theft, and retasking of existing hardware. The IBM DS8880 supports encryption-capable hard disk drives (HDDs) and flash drives. These Full Disk Encryption (FDE) drive sets are used with key management services that are provided by IBM Security Key Lifecycle Manager software or Gemalto SafeNet KeySecure to allow encryption for data at rest on a DS8880. Using encryption technology involves several considerations that you must understand to maintain the security and accessibility of encrypted data. The IBM Security Key Lifecycle Manager software also supports Transparent Cloud Tiering (TCT) data object encryption, which this publication also covers. With TCT encryption, data is encrypted before it is transmitted to the cloud. The data remains encrypted in cloud storage and is decrypted after it is transmitted back to the DS8000®. This IBM Redpaper™ publication contains information that can help storage administrators plan for disk and TCT data object encryption. It also explains how to install and manage the encrypted storage and how to comply with IBM requirements for using the IBM DS8000 encrypted disk storage system. This edition focuses on IBM Security Key Lifecycle Manager Version 3.0, which enables support for the Key Management Interoperability Protocol (KMIP) with DS8000 Release 8.5 code or later, and an updated GUI for encryption functions. The publication also discusses support for data at rest encryption with Gemalto SafeNet KeySecure Version 8.3.2.

IBM Storage Solutions for IBM Cloud Private Blueprint

IBM Storage Solutions for IBM Cloud™ Private delivers a blueprint for multicloud architecture. IBM delivers solutions to help you win. In this blueprint, learn how to: combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today; deliver optimized private cloud services ahead of schedule and under budget with a complete IBM Cloud Private stack; containerize applications and deliver the SLAs that your team needs to thrive and win; and implement IBM Cloud Private to deploy modern applications such as blockchain and AI, or to modernize what you already have. You now have the capabilities. This edition applies to IBM Storage Solutions for IBM Cloud Private Version 1 Release 5.0.

IBM Hyper-Scale Manager for IBM Spectrum Accelerate Family: IBM XIV, IBM FlashSystem A9000 and A9000R, and IBM Spectrum Accelerate

This IBM® Redbooks® publication describes storage management functions and their configuration and use with the IBM Hyper-Scale Manager management graphical user interface (GUI) for IBM XIV® Gen3, IBM FlashSystem® A9000 and A9000R, and IBM Spectrum™ Accelerate software. The web-based GUI provides an object-centered interface design that is aimed at ease of use and enhanced efficiency for storage administrators. The first chapter describes general features of the GUI and the installation of the IBM Hyper-Scale Manager server. Subsequent chapters illustrate some typical GUI actions, among many possible ones, for managing and configuring the storage systems, defining security roles, and setting up multitenancy. For most of the GUI-based actions that are illustrated in this book, the corresponding XIV Storage System command-line interface (XCLI) commands are also shown. This edition applies to IBM Hyper-Scale Manager V5.4. Information about host attachment and replication with the IBM Hyper-Scale Manager GUI is covered in IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Storage System: Host Attachment and Interoperability, SG24-8368, and in IBM FlashSystem A9000 and A9000R Business Continuity Solutions, REDP-5401. See also IBM HyperSwap and Multi-site HA/DR for IBM FlashSystem A9000 and A9000R, REDP-5434.

Implementing IBM FlashSystem 900 Model AE3

Today's global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3 that is powered by IBM FlashCore® technology, they can make faster decisions that are based on real-time insights. They also can unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900 Model AE3. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also presented are use cases that show real-world solutions for tiering, flash-only, and preferred-read. Examples of the benefits that are gained by integrating the FlashSystem storage into business environments also are described. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and anyone who wants to understand how to implement this new and exciting technology.

Stream Processing with Apache Flink

Get started with Apache Flink, the open source framework that powers some of the world’s largest stream processing applications. With this practical book, you’ll explore the fundamental concepts of parallel stream processing and discover how this technology differs from traditional batch data processing. Longtime Apache Flink committers Fabian Hueske and Vasia Kalavri show you how to implement scalable streaming applications with Flink’s DataStream API and continuously run and maintain these applications in operational environments. Stream processing is ideal for many use cases, including low-latency ETL, streaming analytics, and real-time dashboards, as well as fraud detection, anomaly detection, and alerting. You can process continuous data of any kind, including user interactions, financial transactions, and IoT data, as soon as it is generated. In this book, you will: learn the concepts and challenges of distributed stateful stream processing; explore Flink’s system architecture, including its event-time processing mode and fault-tolerance model; understand the fundamentals and building blocks of the DataStream API, including its time-based and stateful operators; read data from and write data to external systems with exactly-once consistency; deploy and configure Flink clusters; and operate continuously running streaming applications.
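The event-time behavior described above can be illustrated without Flink. The following is a minimal plain-Python sketch, not Flink's DataStream API: it assumes events arrive as hypothetical `(event_time_ms, key)` tuples, assigns each event to a tumbling window by its own timestamp, and uses a watermark that trails the largest timestamp seen by a fixed delay (similar in spirit to a bounded-out-of-orderness watermark) to decide when a window is complete and which events are late.

```python
from collections import defaultdict
from typing import Iterable, Optional

Event = tuple[int, str]  # (event_time_ms, key): hypothetical event shape


def event_time_window_counts(
    events: Iterable[Event], window_ms: int, max_delay_ms: int
) -> tuple[list[tuple[int, str, int]], list[Event]]:
    """Count events per key in tumbling event-time windows.

    A watermark trails the largest timestamp seen so far by max_delay_ms.
    A window is emitted once the watermark reaches its end; an event whose
    window was already emitted is returned as late instead of being counted.
    """
    open_windows: dict[tuple[int, str], int] = defaultdict(int)
    emitted: list[tuple[int, str, int]] = []
    late: list[Event] = []
    watermark: Optional[int] = None  # no watermark before the first event

    for event_time, key in events:
        # Window assignment uses the event's own timestamp, not arrival order.
        window_start = event_time - event_time % window_ms
        if watermark is not None and window_start + window_ms <= watermark:
            late.append((event_time, key))
            continue
        open_windows[(window_start, key)] += 1

        # Advance the watermark; it never moves backwards.
        candidate = event_time - max_delay_ms
        watermark = candidate if watermark is None else max(watermark, candidate)

        # Fire every open window whose end the watermark has reached.
        for start, k in sorted(open_windows):
            if start + window_ms <= watermark:
                emitted.append((start, k, open_windows.pop((start, k))))

    # End of input: flush the windows that are still open.
    for start, k in sorted(open_windows):
        emitted.append((start, k, open_windows[(start, k)]))
    return emitted, late


if __name__ == "__main__":
    stream = [(1000, "a"), (1100, "a"), (1500, "b"), (1300, "a"),
              (2500, "a"), (1800, "a"), (4000, "a"), (1200, "b")]
    emitted, late = event_time_window_counts(stream, window_ms=1000, max_delay_ms=500)
    print(emitted)  # [(1000, 'a', 3), (1000, 'b', 1), (2000, 'a', 1), (4000, 'a', 1)]
    print(late)     # [(1800, 'a'), (1200, 'b')]
```

Here `(1300, "a")` arrives after `(1500, "b")` but is still counted because the watermark has not yet passed the end of its window, while `(1800, "a")` arrives after its window was emitted and is reported as late. A larger `max_delay_ms` tolerates more disorder at the cost of emitting results later.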

Social-Behavioral Modeling for Complex Systems

This volume describes frontiers in social-behavioral modeling for contexts as diverse as national security, health, and online social gaming. Recent scientific and technological advances have created exciting opportunities for such improvements. However, the book also identifies crucial scientific, ethical, and cultural challenges to be met if social-behavioral modeling is to achieve its potential. Doing so will require new methods, data sources, and technology. The volume discusses these, including those needed to achieve and maintain high standards of ethics and privacy. The result should be a new generation of modeling that will advance science and, separately, aid decision-making on major social and security-related subjects despite the myriad uncertainties and complexities of social phenomena. Intended to be relatively comprehensive in scope, the volume balances theory-driven, data-driven, and hybrid approaches. Hybrid approaches may be rapidly iterative, as when artificial-intelligence methods are coupled with theory-driven insights to build models that are sound, comprehensible, and usable in new situations. With the intent of being a milestone document that sketches a research agenda for the next decade, the volume draws on the wisdom, ideas, and suggestions of many noted researchers who draw in turn from anthropology, communications, complexity science, computer science, defense planning, economics, engineering, health systems, medicine, neuroscience, physics, political science, psychology, public policy, and sociology. In brief, the volume discusses: cutting-edge challenges and opportunities in modeling for social and behavioral science; special requirements for achieving high standards of privacy and ethics; new approaches for developing theory while exploiting both empirical and computational data; issues of reproducibility, communication, explanation, and validation; and special requirements for models intended to inform decision making about complex social systems.

Getting Started with Linux on Z Encryption for Data At-Rest

This IBM® Redbooks® publication provides a general explanation of data protection through encryption and IBM Z® pervasive encryption, with a focus on Linux on IBM Z encryption for data at-rest. It also describes how the various hardware and software components interact in a Linux on Z encryption environment for . In addition, this book concentrates on planning and preparing the environment. It offers implementation, configuration, and operational examples that can be used in Linux on Z volume encryption environments. This publication is intended for IT architects, system administrators, and security administrators who plan for, deploy, and manage security on the Z platform. The reader is expected to have a basic understanding of IBM Z security concepts.

IBM FlashCore Module Cryptographic Erase

IBM® FlashCore Modules (FCMs) are storage devices that are available in 4.8 TB, 9.6 TB, and 19.2 TB capacities. They are a 2.5-inch drive form factor device and use second-generation 3D triple-level cell (TLC) flash memory on which to store data. This paper describes the cryptographic erasure of data that is stored on these devices when used in an IBM FlashSystem® 9100 (9846-AF7, 9846-AF8, 9848-AF7, 9848-AF8, 9848-UF7, and 9848-UF8), or IBM Storwize® V5100 (2077-424, 2077-A4F, 2078-424, and 2078-A4F).
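Cryptographic erase generally works because a self-encrypting device stores data only in encrypted form: replacing or destroying the media encryption key makes the existing ciphertext unreadable without overwriting it. The toy model below illustrates only that principle. Its SHAKE-256 keystream is a stand-in chosen because it is in Python's standard library; it is not secure encryption, the class and method names are hypothetical, and none of it reflects how FlashCore Modules are actually implemented.

```python
import hashlib
import secrets


def _keystream(key: bytes, lba: int, length: int) -> bytes:
    # Toy keystream: SHAKE-256 over key + block address. NOT real encryption.
    return hashlib.shake_256(key + lba.to_bytes(8, "big")).digest(length)


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class ToySelfEncryptingDrive:
    """Hypothetical drive model that only ever stores ciphertext."""

    def __init__(self) -> None:
        self._media_key = secrets.token_bytes(32)  # never leaves the "drive"
        self.blocks: dict[int, bytes] = {}         # what is physically on the media

    def write(self, lba: int, plaintext: bytes) -> None:
        stream = _keystream(self._media_key, lba, len(plaintext))
        self.blocks[lba] = _xor(plaintext, stream)

    def read(self, lba: int) -> bytes:
        ciphertext = self.blocks[lba]
        return _xor(ciphertext, _keystream(self._media_key, lba, len(ciphertext)))

    def cryptographic_erase(self) -> None:
        # Replace the media key and discard the old one: existing ciphertext
        # can no longer be decrypted even though it was never overwritten.
        self._media_key = secrets.token_bytes(32)


if __name__ == "__main__":
    drive = ToySelfEncryptingDrive()
    drive.write(0, b"customer records")
    print(drive.read(0))              # b'customer records'
    before = drive.blocks[0]
    drive.cryptographic_erase()
    print(drive.blocks[0] == before)  # True: the stored bits are unchanged
    print(drive.read(0) == b"customer records")  # False, with overwhelming probability
```

The design point is that erasure cost no longer depends on capacity: only a small key changes, instead of every block being rewritten.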

Data-at-rest Encryption for the IBM Spectrum Accelerate Family

With the ever-growing landscape of national, state, and local regulations, industry requirements, and increased security threats, ensuring the protection of an organization's information is a key part of operating a successful business. Encrypting data at rest is a key element in addressing these concerns. Most storage products offer encryption at an additional cost. The IBM® Spectrum Accelerate family, which includes IBM XIV® Storage System, IBM FlashSystem® A9000, IBM FlashSystem A9000R, and IBM Spectrum™ Accelerate software, provides data-at-rest encryption at no charge. Clients can take advantage of encryption and still benefit from the lower total cost of ownership (TCO) that the IBM Spectrum Accelerate™ family offers. For IBM FlashSystem A9000 and A9000R, clients now have a choice between an encryption implementation based on an external key manager and one based on a local key. The local key solution simplifies the deployment of data-at-rest encryption. This IBM Redpaper™ publication explains the architecture and design of the XIV and IBM FlashSystem A9000 and A9000R encryption solutions, and provides details for configuring and implementing both solutions.

IBM DS8880 and IBM Z Synergy

IBM® Z has a close and unique relationship to its storage. Over the years, improvements to the Z processors and storage software, the disk storage systems, and their communication architecture have consistently reinforced this synergy. This IBM Redpaper™ publication summarizes and highlights the various aspects, advanced functions, and technologies, often pioneered by IBM, that make IBM Z® and the IBM DS8880 products an ideal combination. This paper is intended for users who have some familiarity with IBM Z and the IBM DS8000® series and want a condensed but comprehensive overview of the synergy items up to the IBM z14™ server and the IBM DS8880 Release 8.51 firmware.

My Online Privacy for Seniors, First Edition

My Online Privacy for Seniors is an exceptionally easy and complete guide to protecting your privacy while you take advantage of the extraordinary resources available to you through the Internet and your mobile devices. It approaches every topic from a senior’s point of view, using meaningful examples, step-by-step tasks, large text, close-up screenshots, and a custom full-color interior designed for comfortable reading. Full-color, step-by-step tasks in legible print walk you through how to keep your personal information and content secure on computers and mobile devices. Learn how to: strengthen your web browser’s privacy in just a few steps; make it harder to track and target you with personalized ads; protect against dangerous fake emails and ransomware; securely bank and shop online; control who sees the Facebook or Instagram posts and photos you share; securely use cloud services for backups or shared projects; protect private data on your mobile device, even if it’s stolen; block most unwanted calls on your smartphone; improve your home’s Internet security quickly and inexpensively; and get straight answers to online privacy questions, in steps that are simple to follow and easy to understand. You don’t have to avoid today’s amazing digital world: you can enrich your life, deepen your connections, and still keep yourself safe.

IBM Spectrum Archive Enterprise Edition V1.3.0: Installation and Configuration Guide

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive Enterprise Edition v1.3.0 for the IBM TS3310, IBM TS3500, IBM TS4300, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the Linear Tape File System (LTFS) for the policy management of tape as a storage tier in an IBM Spectrum Scale™ based environment. It helps encourage the use of tape as a critical tier in the storage environment. This is the seventh edition of the IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 8, 7, 6, and 5 tape drives in IBM TS3310, TS3500, TS4300, and TS4500 tape libraries. In addition, IBM TS1160, TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. Using IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Mastering MongoDB 4.x - Second Edition

This book, Mastering MongoDB 4.x, provides an in-depth exploration of MongoDB's features and capabilities, empowering readers to create high-performance and fault-tolerant database solutions. Through practical examples and clear explanations, you will learn how to implement complex queries, optimize database performance, manage large-scale clusters, and ensure robust failover and backup strategies. What this book will help me do: understand advanced querying techniques and best practices in data indexing and management; effectively configure and monitor MongoDB instances for scalability and optimized performance; master replication and sharding techniques to support high-availability systems; deploy MongoDB-based applications seamlessly across on-premises and cloud environments; and integrate MongoDB with modern technologies such as big data platforms, containers, and IoT applications. Author(s): Alex Giamas is a seasoned database administrator and developer with significant experience working with both relational and non-relational databases. Having authored numerous articles and given lectures on MongoDB and other data management technologies, Alex brings practical insights to his writing, emphasizing real-world applications with examples drawn from his extensive career. Who is it for? This book is designed for developers and database administrators already familiar with MongoDB and basic database concepts who want to deepen their expertise in implementing advanced MongoDB solutions. It is also suitable for professionals aspiring to earn MongoDB certifications and to expand their skills in managing large, high-performance database systems efficiently.
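Sharding, listed above, spreads documents across shards according to a shard key. As a minimal sketch of hash-based routing in general (not MongoDB's actual hash function or balancer), the plain-Python function below maps a hypothetical string shard key to one of several hypothetical shards.

```python
import hashlib
from collections import Counter


def shard_for(shard_key: str, shards: list[str]) -> str:
    """Map a shard-key value to a shard by hashing it (generic illustration)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned 64-bit integer.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


if __name__ == "__main__":
    shards = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names
    placement = Counter(shard_for(f"user-{i}", shards) for i in range(3000))
    print(placement)  # roughly 1,000 keys per shard
    # The same key always routes to the same shard.
    print(shard_for("user-42", shards) == shard_for("user-42", shards))  # True
```

Because a plain modulo remaps most keys whenever the shard count changes, production systems, MongoDB's hashed sharding included, instead assign ranges of hashed values (chunks) to shards so that data can be rebalanced incrementally.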

Hands-On Big Data Analytics with PySpark

Dive into the exciting world of big data analytics with 'Hands-On Big Data Analytics with PySpark'. This practical guide offers you the tools and knowledge to tackle massive datasets using PySpark. By exploring real-world examples, you'll learn to harness distributed systems to analyze and manipulate data at scale. What this book will help me do: master using PySpark to handle large and complex datasets efficiently and effectively; develop skills to optimize Spark programs using best practices such as reducing shuffle operations; learn to set up a PySpark environment and process data from platforms such as HDFS, Hive, and S3; enhance your data analytics capabilities by implementing powerful SQL queries and data visualizations; and understand testing and debugging techniques to build reliable, production-quality data pipelines. Author(s): The book is written by Rudy Lai and Bartłomiej Potaczek, both seasoned data engineers and authors in the big data field, who bring their extensive experience with distributed systems and scalable data architectures to this book. Their approach is hands-on, focusing on real-world applications and best practices. Who is it for? This book is tailored for data scientists, engineers, and developers eager to advance their big data analytics capabilities. Whether you're new to big data or experienced with other analytics frameworks, this book will equip you with practical knowledge to use PySpark for scalable data solutions.
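One standard way to reduce shuffle volume is to pre-aggregate inside each partition before data crosses the network; in Spark, `reduceByKey` performs this map-side combine, while `groupByKey` ships every record. The sketch below simulates both strategies in plain Python on hypothetical in-memory partitions and reports how many records each would shuffle; it does not call PySpark.

```python
from collections import defaultdict

Partition = list[tuple[str, int]]  # hypothetical (key, value) records in one partition


def _sum_by_key(records) -> dict[str, int]:
    totals: dict[str, int] = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def aggregate_without_combine(partitions: list[Partition]) -> tuple[dict[str, int], int]:
    """Ship every record to the reducers (no map-side combine, as with groupByKey)."""
    shuffled = [record for part in partitions for record in part]
    return _sum_by_key(shuffled), len(shuffled)


def aggregate_with_combine(partitions: list[Partition]) -> tuple[dict[str, int], int]:
    """Pre-sum within each partition, then ship one record per key per partition."""
    shuffled = [item for part in partitions for item in _sum_by_key(part).items()]
    return _sum_by_key(shuffled), len(shuffled)


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 1), ("a", 1), ("a", 1)],
             [("b", 1), ("b", 1), ("a", 1)]]
    print(aggregate_without_combine(parts))  # ({'a': 4, 'b': 3}, 7)
    print(aggregate_with_combine(parts))     # ({'a': 4, 'b': 3}, 4)
```

Both strategies produce the same totals, but the combined version shuffles at most one record per distinct key per partition, so the savings grow with the number of repeated keys in each partition.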

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally

Economic globalization requires data to be available globally. With most data stored in file systems, solutions that make this data globally available become more important. Files in file systems can be protected or shared by replicating them to another file system in a remote location. The remote location might be just around the corner or in a different country. Therefore, the techniques that are used to protect and share files must account for long distances and slow, unreliable wide area network (WAN) connections. IBM® Spectrum Scale is a scalable clustered file system that can be used to store all kinds of unstructured data. It provides open data access through the Network File System (NFS); Server Message Block (SMB); POSIX; object storage APIs, such as S3 and OpenStack Swift; and the Hadoop Distributed File System (HDFS) for accessing and sharing data. The IBM Aspera® file transfer solution (IBM Aspera Sync) provides predictable and reliable data transfer across large distances for small and large files. The combination of the two can be used for global sharing and protection of data. This IBM Redpaper™ publication describes how IBM Aspera Sync can be used to protect and share data that is stored in IBM Spectrum™ Scale file systems across large distances of several hundred to thousands of miles. We also explain the integration of IBM Aspera Sync with IBM Spectrum Scale™ and differentiate it from solutions that are built into IBM Spectrum Scale for protection and sharing. We also describe different use cases for IBM Aspera Sync with IBM Spectrum Scale.

Oracle DBA Mentor: Succeeding as an Oracle Database Administrator

New Oracle database administrators can hit the ground running. This book helps you develop the ability to think on your feet and shift focus in an instant from arcane syntax details to broad, corporate issues. Along the way, you will see how to create your first database and implement best practices to ensure a well-running database system. What makes Oracle DBA Mentor different is that it also teaches you how to obtain answers that are not found in this or other books. Focus is given to creating a test bed and running test cases to examine hypotheses and prove out solutions so you can be sure they work in production. Attention is given to navigating product documentation and networking in forums and social media to build your skills and a network to draw on when solving problems under pressure. There are chapters of step-by-step technical content as well as coverage of essential skills to succeed as a DBA no matter which database engine you administer. By the time you are done reading this book, you will have the confidence to face many of the situations thrown in your direction. You will know where to go for the answers you don’t yet know that you need. You’ll be able to work and troubleshoot under pressure. You’ll know how to create a database, institute backup and recovery procedures, secure the database and its valuable corporate data, and acquire more knowledge as needed so you can run a database to meet the needs of your organization. What You'll Learn: install Oracle Database with best practices; implement backup and recovery procedures; understand the fundamentals of databases and data security; find answers to technical problems using Oracle documentation, Oracle Support, and other resources; and patch and upgrade an Oracle database. Who This Book Is For: the novice database administrator who wants help getting off the ground with their DBA career and building the skills to let that career flourish in the long term.
Mid-level DBAs will also find the book helpful as they try to grow their career to the next level. While the book is geared toward the Oracle platform, database administrators from other platforms can benefit from the soft skills covered in this book.

Mastering Geospatial Development with QGIS 3.x - Third Edition

This book, Mastering Geospatial Development with QGIS 3.x, is your comprehensive guide to becoming skilled in QGIS, an open-source GIS application. Covering the functionality of QGIS 3.4 and 3.6, it advances your knowledge of spatial data analysis, styling, and spatial database management through practical examples and in-depth discussions. What this book will help me do: understand the latest features and updates in QGIS 3.6; master spatial data styling for impactful geographic visualizations; learn to create and manage spatial databases and GeoPackages; automate workflows using QGIS's graphical modeler and Python scripting; and develop custom QGIS plugins to extend its capabilities. Author(s): This book is written by a team of GIS experts with extensive experience in spatial data analysis and QGIS. The authors include professionals with GISP credentials who have taught GIS at various levels. With their deep understanding of QGIS and practical teaching approach, they aim to make premium GIS knowledge accessible to all. Who is it for? The book is ideal for GIS professionals seeking to enhance their QGIS expertise. Beginners looking to establish a firm foundation in GIS and QGIS will also benefit. Developers interested in extending QGIS capabilities using Python will find invaluable guidance here. Whether for career growth, project management, or academic purposes, this book suits users aspiring to excel in geospatial development.

Data Lake Maturity Model

Data is changing everything. Many industries today are being fundamentally transformed through the accumulation and analysis of large quantities of data, stored in diversified but flexible repositories known as data lakes. Whether your company has just begun to think about big data or has already initiated a strategy for handling it, this practical ebook shows you how to plan a successful data lake migration. You’ll learn the value of data lakes, their structure, and the problems they attempt to solve. Using Zaloni’s data lake maturity model, you’ll then explore your organization’s readiness for putting a data lake into action. Do you have the tools and data architectures to support big data analysis? Are your people and processes prepared? The data lake maturity model will help you rate your organization’s readiness. This report includes: the structure and purpose of a data lake; descriptive, predictive, and prescriptive analytics; data lake curation, self-service, and the use of data lake zones; how to rate your organization using the data lake maturity model; and a complete checklist to help you determine your strategic path forward.

AI and Big Data on IBM Power Systems Servers

As big data becomes more ubiquitous, businesses are wondering how they can best leverage it to gain insight into their most important business questions. Using machine learning (ML) and deep learning (DL) in big data environments can identify historical patterns and build artificial intelligence (AI) models that can help businesses improve customer experience, add services and offerings, identify new revenue streams or lines of business (LOBs), and optimize business or manufacturing operations. The power of AI for predictive analytics is being harnessed across all industries, so it is important that businesses familiarize themselves with all of the tools and techniques that are available for integration with their data lake environments. In this IBM® Redbooks® publication, we cover the best practices for deploying and integrating some of the best AI solutions on the market, including IBM Watson Machine Learning Accelerator (see the note on product naming), IBM Watson Studio Local, IBM Power Systems™, IBM Spectrum™ Scale, IBM Data Science Experience (IBM DSX), IBM Elastic Storage™ Server, Hortonworks Data Platform (HDP), Hortonworks DataFlow (HDF), and H2O Driverless AI. We map out all the possible integrations of our different AI solutions and how they can integrate with your existing or new data lake. We also walk you through some of our client use cases and show you how some of the industry leaders are using Hortonworks, IBM PowerAI, and IBM Watson Studio Local to drive decision making. We also advise you on your deployment options, when to use a GPU, and why you should use the IBM Elastic Storage Server (IBM ESS) to improve storage management. Lastly, we describe how to integrate IBM Watson Machine Learning Accelerator and Hortonworks with or without IBM Watson Studio Local, how to access real-time data, and how to address security. Note: IBM Watson Machine Learning Accelerator is the new product name for IBM PowerAI Enterprise.
Note: Hortonworks merged with Cloudera in January 2019. The new company is called Cloudera. References to Hortonworks as a business entity in this publication now refer to the merged company. Product names beginning with Hortonworks continue to be marketed and sold under their original names.

PROC SQL, 3rd Edition

PROC SQL: Beyond the Basics Using SAS®, Third Edition, is a step-by-step, example-driven guide that helps readers master the language of PROC SQL. Packed with analysis and examples illustrating an assortment of PROC SQL options, statements, and clauses, this book not only covers all the basics but also offers extensive guidance on complex topics such as set operators and correlated subqueries. Programmers at all levels will appreciate Kirk Lafler’s easy-to-follow examples, clear explanations, and handy tips to extend their knowledge of PROC SQL. This third edition explores new and powerful features in SAS® 9.4, including IFC and IFN functions, nearest neighbor processing, the HAVING clause, and indexes. It also features two completely new chapters on fuzzy matching and data-driven programming. Delving into the workings of PROC SQL with greater analysis and discussion, PROC SQL: Beyond the Basics Using SAS®, Third Edition, explores this powerful database language through discussion and numerous real-world examples.

PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes

Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation and summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code. PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using graphframes. On completing this book, you’ll have ready-made code for all your PySpark SQL tasks, including creating dataframes from different file formats as well as from SQL or NoSQL databases. What You Will Learn: understand PySpark SQL and its advanced features; use SQL and HiveQL with PySpark SQL; work with structured streaming; optimize PySpark SQL; and master graphframes and graph processing. Who This Book Is For: data scientists, Python programmers, and SQL programmers.

The Enterprise Big Data Lake

The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book. Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries. With this book, you will: get a succinct introduction to data warehousing, big data, and data science; learn various paths enterprises take to build a data lake; explore how to build a self-service model and best practices for providing analysts access to the data; use different methods for architecting your data lake; and discover ways to implement a data lake from experts in different industries.

Mastering Ceph - Second Edition

Mastering Ceph is your comprehensive guide to understanding and deploying Ceph for scalable storage solutions. From planning and design to advanced disaster recovery practices, this book equips you with practical knowledge and hands-on techniques to harness the power of Ceph effectively. What this book will help you do: design and deploy scalable Ceph clusters tailored to your needs; optimize Ceph's performance with state-of-the-art tuning techniques; implement effective disaster recovery strategies for robust storage systems; extend Ceph's functionality by programming with librados; and troubleshoot and maintain Ceph to ensure reliability and performance. About the author: Fisk is a recognized expert in storage infrastructure. With years of hands-on experience with Ceph and storage systems, Fisk has been involved in numerous successful deployments and performance optimizations. Drawing from real-world scenarios, the author's insights make this guide invaluable for professionals. Who is it for? This book is tailored for storage administrators, cloud engineers, and system administrators aiming to enhance their expertise in storage technologies. Whether you're new to Ceph or looking to deepen your knowledge, the clear examples and practical advice make it a perfect pick.