talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.

Filtering by: data

Sessions & talks

Showing 326–350 of 3406 · Newest first

Serverless ETL and Analytics with AWS Glue

Discover how to harness AWS Glue for your ETL and data analysis workflows with "Serverless ETL and Analytics with AWS Glue." This comprehensive guide introduces readers to the capabilities of AWS Glue, from building data lakes to performing advanced ETL tasks, allowing you to create efficient, secure, and scalable data pipelines with serverless technology.

What this Book will help me do:
- Understand and utilize various AWS Glue features for data lake and ETL pipeline creation.
- Leverage AWS Glue Studio and DataBrew for intuitive data preparation workflows.
- Implement effective storage optimization techniques for enhanced data analytics.
- Apply robust data security measures, including encryption and access control, to protect data.
- Integrate AWS Glue with machine learning tools like SageMaker to build intelligent models.

Author(s): The authors of this book include experts across the fields of data engineering and AWS technologies. With backgrounds in data analytics, software development, and cloud architecture, they bring a depth of practical experience. Their approach combines hands-on tutorials with conceptual clarity, ensuring a blend of foundational knowledge and actionable insights.

Who is it for? This book is designed for ETL developers, data engineers, and data analysts who are familiar with data management concepts and want to extend their skills into serverless cloud solutions. If you're looking to master AWS Glue for building scalable and efficient ETL pipelines or are transitioning existing systems to the cloud, this book is ideal for you.

Building the Snowflake Data Cloud: Monetizing and Democratizing Your Data

Implement the Snowflake Data Cloud using best practices and reap the benefits of scalability and low cost from the industry-leading, cloud-based data warehousing platform. This book provides a detailed how-to explanation and assumes familiarity with Snowflake core concepts and principles. It is a project-oriented book with a hands-on approach to designing, developing, and implementing your Data Cloud with security at the center. As you work through the examples, you will develop the skill, knowledge, and expertise to expand your capability by incorporating additional Snowflake features, tools, and techniques. Your Snowflake Data Cloud will be fit for purpose, extensible, and at the forefront of Direct Share, Data Exchange, and Snowflake Marketplace. Building the Snowflake Data Cloud helps your organization monetize the value locked up within its data. As the digital economy takes hold, with data volume, velocity, and variety growing at exponential rates, you need tools and techniques to quickly categorize, collate, summarize, and aggregate data. You also need the means to distribute data seamlessly to release its value. This book shows how Snowflake provides all these things and how to use them to your advantage. The book helps you succeed by delivering faster than you can with legacy products and techniques. You will learn how to leverage what you already know, and what you don't, all applied in a Snowflake Data Cloud context. After reading this book, you will discover and embrace a future where the Data Cloud is central. You will be able to position your organization to take advantage by identifying, adopting, and preparing your tooling for the coming wave of opportunity around sharing and monetizing valuable corporate data.
What You Will Learn:
- Understand why the Data Cloud is important to the success of your organization
- Up-skill and adopt Snowflake, leveraging the benefits of cloud platforms
- Articulate the Snowflake Marketplace and identify opportunities to monetize data
- Identify tools and techniques to accelerate integration with the Data Cloud
- Manage data consumption by monitoring and controlling access to datasets
- Develop data load and transform capabilities for use in future projects

Who This Book Is For: Solution architects seeking implementation patterns to integrate with a Data Cloud; data warehouse developers looking for tips, tools, and techniques to rapidly deliver data pipelines; sales managers who want to monetize their datasets and understand the opportunities that the Data Cloud presents; and anyone who wishes to unlock value contained within their data silos.

IBM Power Systems S922, S914, and S924 Technical Overview and Introduction

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System S922 (9009-22A), IBM Power System S914 (9009-41A), and IBM Power System S924 (9009-42A) servers that support the IBM AIX®, IBM i, and Linux operating systems. The objective of this paper is to introduce the major innovative Power S914, Power S922, and Power S924 offerings and their relevant functions:
- The new IBM POWER9™ processor, which is available at frequencies of 2.3 - 3.8 GHz, 2.8 - 3.8 GHz, 2.9 - 3.8 GHz, 3.4 - 3.9 GHz, 3.5 - 3.9 GHz, and 3.8 - 4.0 GHz
- Significantly strengthened cores and larger caches
- Two integrated memory controllers that double the memory footprint of IBM POWER8® servers
- An integrated I/O subsystem and hot-pluggable Peripheral Component Interconnect Express (PCIe) Gen4 and Gen3 I/O slots
- I/O drawer expansion options that offer greater flexibility
- Support for Coherent Accelerator Processor Interface (CAPI) 2.0
- New IBM EnergyScale™ technology that offers new variable processor frequency modes, which provide a significant performance boost beyond the static nominal frequency

This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles:
- Clients
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power S914, Power S922, and Power S924 systems. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

VSAM Demystified

Virtual Storage Access Method (VSAM) is one of the access methods used to process data. Many of us have used VSAM and work with VSAM data sets daily, but exactly how it works and why we use it instead of another access method is a mystery. This book helps to demystify VSAM and gives you the information necessary to understand, evaluate, and use VSAM properly. This book also builds upon the subject of Record Level Sharing and DFSMStvs. It clarifies VSAM functions for application programmers who work with VSAM. The practical, straightforward approach should dispel much of the complexity associated with VSAM. Wherever possible an example is used to reinforce a description of a VSAM function. This IBM® Redbooks® publication is intended as a supplement to existing product manuals. It is intended to be used as an initial point of reference for VSAM functions.

Vertical Growth

Learn the secrets to self-awareness, life-changing growth and happy, high-performing teams—from the bestselling author of The Mindful Leader. Great leaders and teams don't know everything, and they don't get it right every time. What sets them apart is their commitment to continual learning and vertical growth. Vertical growth is about cultivating the self-awareness to see our self-defeating thoughts, assumptions and behaviours, and then consciously creating new behaviours that are aligned with our best intentions and aspirations. By embracing the deliberate practices and processes for vertical growth laid out in this book, you'll not only radically improve your leadership and personal wellbeing—you'll also foster the highest levels of trust, psychological safety, motivation, and creativity in the teams and groups you work with. You'll discover how to:
- Identify when, where and how to develop new leadership behaviours to get better results
- Regulate your emotional responses in real time and handle the most difficult challenges with balance, wisdom and accountability
- Cultivate practices for self-awareness that foster lifelong internal growth and personal happiness
- Uncover and change the limiting assumptions and beliefs that keep you, your team and organisation locked in unproductive habits and behaviours
- Create practices and rituals that enable the highest levels of psychological safety, innovation and growth

Filled with fascinating real-life case studies as well as practical tools and strategies, this is your handbook for mastering vertical growth in yourself, your team and your organisation.

Python for Data Analysis, 3rd Edition

Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub. With this book, you will:
- Use the Jupyter notebook and IPython shell for exploratory computing
- Learn basic and advanced features in NumPy
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples
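The groupby facility mentioned above follows a split-apply-combine pattern that a few lines of pandas can illustrate. This is a minimal sketch with invented toy data, not an excerpt from the book:

```python
import pandas as pd

# Toy sales data; the column names here are invented for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "units":  [10, 15, 7, 3, 5],
})

# Split rows by region, apply a sum to each group, combine into one Series.
totals = df.groupby("region")["units"].sum()
print(totals["east"], totals["west"])  # 25 15
```

The same `.groupby(...)` chain accepts other aggregations (`mean`, `agg`, custom functions), which is what makes it suitable for the slicing and summarizing the blurb describes.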

Snowflake: The Definitive Guide

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily. You'll be able to:
- Efficiently capture, store, and process large amounts of data at an amazing speed
- Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
- Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
- Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace
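To give a flavor of the Time Travel and zero-copy cloning features named above: both boil down to short SQL statements (`AT (TIMESTAMP => ...)` and `CREATE TABLE ... CLONE ...`). The sketch below only assembles those statements as strings; the table names are made up, and you would execute the SQL through your own Snowflake connection:

```python
def time_travel_query(table: str, timestamp: str) -> str:
    """Build a Snowflake Time Travel query that reads a table as of a past moment."""
    return f"SELECT * FROM {table} AT (TIMESTAMP => '{timestamp}'::timestamp)"

def zero_copy_clone(source: str, clone: str) -> str:
    """Build a zero-copy clone statement; no data is physically copied at clone time."""
    return f"CREATE TABLE {clone} CLONE {source}"

print(time_travel_query("orders", "2023-01-01 00:00:00"))
print(zero_copy_clone("orders", "orders_backup"))
```

A recovery strategy along the book's lines would combine the two: query the pre-incident state with Time Travel, then clone it so the recovered data persists beyond the retention window.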

Building a Red Hat OpenShift Environment on IBM Z

Cybersecurity is the most important arm of defense against cyberattacks. With the recent increase in cyberattacks, corporations must focus on how they are combating these new high-tech threats. When establishing best practices, a corporation must focus on employees' access to specific workspaces and information. IBM Z® focuses on enabling highly virtualized processing environments while maintaining a high level of security in each workspace. Organizations not only need to adjust their approach to security, but also their approach to IT environments. To meet new customer needs and expectations, organizations must take a more agile approach to their business. IBM® Z allows companies to work with hybrid and multi-cloud environments that offer more ease of use for the user and greater efficiency overall. Working with IBM Z, organizations can also work with the many databases that are included in IBM Cloud Pak® for Data. IBM Cloud Pak for Data allows organizations to make more informed decisions through improved data usage. Along with the improved data usage, organizations can see the benefits of working in a Red Hat OpenShift environment. Red Hat OpenShift is compatible across many hardware services and allows the user to run applications in the most efficient manner. The purpose of this IBM Redbooks® publication is to:
- Introduce the IBM Z and LinuxONE platforms and how they work with the Red Hat OpenShift environment and IBM Cloud Pak for Data
- Provide examples and uses of IBM Z with Cloud Pak for Data that show data gravity, a consistent development experience, and consolidation and business resiliency

The target audience for this book is IBM Z Technical Specialists, IT Architects, and System Administrators.

Proactive Early Threat Detection and Securing Oracle Database with IBM QRadar, IBM Security Guardium Data Protection, and IBM Copy Services Manager by using IBM FlashSystem Safeguarded Copy

This IBM® blueprint publication focuses on early threat detection within a database environment by using IBM Security Guardium® Data Protection and IBM QRadar®. It also highlights how to proactively start a cyber resilience workflow in response to a cyberattack or potential malicious user actions. The workflow that is presented here uses IBM Copy Services Manager as orchestration software to start IBM FlashSystem® Safeguarded Copy functions. The Safeguarded Copy creates an immutable copy of the data in an air-gapped form on the same IBM FlashSystem for isolation and eventual quick recovery. This document describes how to enable and forward Oracle database user activities (by using IBM Security Guardium Data Protection) and IBM FlashSystem audit logs to IBM QRadar. This document also describes how to create various rules to determine a threat, and how to configure and launch a suitable response to the detected threat in IBM QRadar. The document also outlines the steps that are involved in creating a Scheduled Task by using IBM Copy Services Manager with various actions.

Pro Database Migration to Azure: Data Modernization for the Enterprise

Migrate your existing, on-premises applications into the Microsoft Azure cloud platform. This book covers the best practices to plan, implement, and operationalize the migration of a database application from your organization's data center to Microsoft's Azure cloud platform. Data modernization and migration is a technologically complex endeavor that can also be taxing from a leadership and operational standpoint. This book covers not only the technology, but also the most important aspects of organization culture, communication, and politics that so frequently derail such projects. You will learn the most important steps to ensuring a successful migration and see battle-tested wisdom from industry veterans. From executive sponsorship, to executing the migration, to the important steps following migration, you will learn how to effectively conduct future migrations and ensure that your team and your database application delivers on the expected business value of the project. This book is unlike any other currently in the market. It takes you through the most critical business and technical considerations and workflows for moving your data and databases into the cloud, with special attention paid to those who are deploying to the Microsoft Data Platform in Azure, especially SQL Server. Although this book focuses on migrating on-premises SQL Server enterprises to hybrid or fully cloud-based Azure SQL Database and Azure SQL Managed Instances, it also covers topics involving migrating non-SQL Server database platforms such as Oracle, MySQL, and PostgreSQL applications to Microsoft Azure.
What You Will Learn:
- Plan a database migration that ensures smooth project progress, optimal performance, low operating cost, and minimal downtime
- Properly analyze and manage non-technical considerations, such as legal compliance, privacy, and team execution
- Perform a thorough architectural analysis to select the best Azure services, performance tiers, and cost-containment features
- Avoid pitfalls and common reasons for failure relating to corporate culture, intra-office politics, and poor communications
- Secure the proper executive champions who can execute the business planning needed for success
- Apply proven criteria to determine your future-state architecture and your migration method
- Execute your migration using a process proven by the authors over years of successful projects

Who This Book Is For: IT leadership, strategic IT decision makers, project owners and managers, and enterprise and application architects. For anyone looking toward cloud migration projects as the next stage of growth in their careers. Also useful for enterprise DBAs and consultants who might be involved in such projects. Readers should have experience and be competent in designing, coding, implementing, and supporting database applications in an on-premises environment.

IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power Enterprise Pools 2.0

This IBM® Redbooks® publication is a guide to IBM Power Systems Private Cloud with Shared Utility Capacity featuring Power Enterprise Pools (PEP) 2.0. This technology enables multiple servers in an enterprise pool to share base processor and memory resources and draw on pre-paid credits when the base is exceeded. Previously, the Shared Utility Capacity feature supported the IBM Power E950 (9040-MR9) and IBM Power E980 (9080-M9S). The feature was extended in August 2020 to include the scale-out IBM Power servers that were announced on 14 July 2020, and it received dedicated processor support later in the year. The IBM Power S922 (9009-22G) and IBM Power S924 (9009-42G) servers, which use the latest IBM POWER9™ processor-based technology and support the IBM AIX®, IBM i, and Linux operating systems (OSs), are now supported. The previous scale-out models of the Power S922 (9009-22A) and Power S924 (9009-42A) servers cannot be added to an enterprise pool. With the availability of the IBM Power E1080 (9080-HEX) in September 2021, support for this system as part of a Shared Utility Pool has become available. The goal of this book is to provide an overview of the solution's environment and guidance for planning a deployment of it. The book also covers how to configure IBM Power Systems Private Cloud with Shared Utility Capacity. There are also chapters about migrating from PEP 1.0 to PEP 2.0 and various use cases. This publication is for professionals who want to acquire a better understanding of IBM Power Systems Private Cloud and Shared Utility Capacity. The intended audience includes:
- Clients
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners

This book expands the set of IBM Power documentation by providing a desktop reference that offers a detailed technical description of IBM Power Systems Private Cloud with Shared Utility Capacity.

MySQL Cookbook, 4th Edition

For MySQL, the price of popularity comes with a flood of questions from users on how to solve specific data-related issues. That's where this cookbook comes in. When you need quick solutions or techniques, this handy resource provides scores of short, focused pieces of code, hundreds of worked-out examples, and clear, concise explanations for programmers who don't have the time (or expertise) to resolve MySQL problems from scratch. In this updated fourth edition, authors Sveta Smirnova and Alkin Tezuysal provide more than 200 recipes that cover powerful features in both MySQL 5.7 and 8.0. Beginners as well as professional database and web developers will dive into topics such as MySQL Shell, MySQL replication, and working with JSON. You'll learn how to:
- Connect to a server, issue queries, and retrieve results
- Retrieve data from the MySQL Server
- Store, retrieve, and manipulate strings
- Work with dates and times
- Sort query results and generate summaries
- Assess the characteristics of a dataset
- Write stored functions and procedures
- Use stored routines, triggers, and scheduled events
- Perform basic MySQL administration tasks
- Understand MySQL monitoring fundamentals
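The connect, query, and fetch cycle that the opening recipes cover looks much the same across Python database drivers. The sketch below uses the standard-library sqlite3 module purely as a runnable stand-in (for MySQL itself you would swap in a driver such as mysql-connector-python, whose placeholder style is `%s` rather than `?`); the table and rows here are invented for the example:

```python
import sqlite3

# In-memory database as a stand-in for a MySQL server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE books (title TEXT, year INTEGER)")
cur.executemany("INSERT INTO books VALUES (?, ?)",
                [("MySQL Cookbook", 2022), ("Tidy Modeling with R", 2022)])

# Issue a parameterized query and retrieve the results.
cur.execute("SELECT title FROM books WHERE year = ? ORDER BY title", (2022,))
rows = [r[0] for r in cur.fetchall()]
print(rows)  # ['MySQL Cookbook', 'Tidy Modeling with R']
conn.close()
```

Parameterized queries (the `?` placeholders) are the habit worth forming here, since they avoid SQL injection regardless of which driver or server sits behind the connection.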

Simplifying Data Engineering and Analytics with Delta

This book will guide you through mastering Delta, a robust and versatile protocol for data engineering and analytics. You'll discover how Delta simplifies data workflows, supports both batch and streaming data, and is optimized for analytics applications in various industries. By the end, you will know how to create high-performing, analytics-ready data pipelines.

What this Book will help me do:
- Understand Delta's unique offering for unifying batch and streaming data processing.
- Learn approaches to address data governance, reliability, and scalability challenges.
- Gain technical expertise in building data pipelines optimized for analytics and machine learning use.
- Master core concepts like data modeling, distributed computing, and Delta's schema evolution features.
- Develop and deploy production-grade data engineering solutions leveraging Delta for business intelligence.

Author(s): Anindita Mahapatra is an experienced data engineer and author with years of expertise in working on Delta and data-driven solutions. Her hands-on approach to explaining complex data concepts makes this book an invaluable resource for professionals in data engineering and analytics.

Who is it for? Ideal for data engineers, data analysts, and anyone involved in AI/BI workflows, this book suits learners with some basic knowledge of SQL and Python. Whether you're an experienced professional or looking to upgrade your skills with Delta, this book will provide practical insights and actionable knowledge.

Unlock Complex and Streaming Data with Declarative Data Pipelines

Unlocking the value of modern data is critical for data-driven companies. This report provides a concise, practical guide to building a data architecture that efficiently delivers big, complex, and streaming data to both internal users and customers. Authors Ori Rafael, Roy Hasson, and Rick Bilodeau from Upsolver examine how modern data pipelines can improve business outcomes. Tech leaders and data engineers will explore the role these pipelines play in the data architecture and learn how to intelligently consider tradeoffs between different data architecture patterns and data pipeline development approaches. You will: Examine how recent changes in data, data management systems, and data consumption patterns have made data pipelines challenging to engineer Learn how three data architecture patterns (event sourcing, stateful streaming, and declarative data pipelines) can help you upgrade your practices to address modern data Compare five approaches for building modern data pipelines, including pure data replication, ELT over a data warehouse, Apache Spark over data lakes, declarative pipelines over data lakes, and declarative data lake staging to a data warehouse

IBM FlashSystem 5200 Product Guide

This IBM® Redbooks® Product Guide publication describes the IBM FlashSystem® 5200 solution, which is a next-generation IBM FlashSystem control enclosure. It is an NVMe end-to-end platform that is targeted at the entry and midrange market and delivers the full capabilities of IBM FlashCore® technology. It also provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum® Virtualize, including the following features:
- Data reduction and deduplication
- Dynamic tiering
- Thin provisioning
- Snapshots
- Cloning
- Replication
- Data copy services
- Transparent Cloud Tiering
- IBM HyperSwap®, including 3-site replication for high availability (HA)

Scale-out and scale-up configurations further enhance capacity and throughput for better availability. The IBM FlashSystem 5200 is a high-performance storage solution that is based on a revolutionary 1U form factor. It consists of 12 NVMe flash devices in a 1U storage enclosure drawer with fully redundant canister components and no single point of failure. It is designed for businesses of all sizes, including small, remote, branch offices and regional clients. It is a smarter, self-optimizing solution that requires less management, which enables organizations to overcome their storage challenges. Flash has come of age, and price point reductions mean that lower parts of the storage market are seeing the value of moving over to flash and NVMe-based solutions. The IBM FlashSystem 5200 advances this transition by providing incredibly dense tiers of flash in a more affordable package. With the benefit of IBM FlashCore Module compression and new QLC flash-based technology becoming available, a compelling argument exists to move away from Nearline SAS storage and on to NVMe.
With the release of IBM FlashSystem 5200 Software V8.4, extra functions and features are available, including support for new Distributed RAID1 (DRAID1) features, GUI enhancements, Redirect-on-write for Data Reduction Pool (DRP) snapshots, and 3-site replication capabilities. This book is aimed at pre-sales and post-sales technical support and marketing and storage administrators.

Learn dbatools in a Month of Lunches

If you work with SQL Server, dbatools is a lifesaver. This book will show you how to use this free and open source PowerShell module to automate just about every SQL Server task you can imagine—all in just one month! In Learn dbatools in a Month of Lunches you will learn how to:
- Perform instance-to-instance and customized migrations
- Automate security audits, tempdb configuration, alerting, and reporting
- Schedule and monitor PowerShell tasks in SQL Server Agent
- Bulk-import any type of data into SQL Server
- Install dbatools in secure environments

Written by a group of expert authors including dbatools creator Chrissy LeMaire, Learn dbatools in a Month of Lunches teaches you techniques that will make you more effective—and efficient—than you ever thought possible. In twenty-eight lunchbreak lessons, you'll learn the most important use cases of dbatools and the favorite functions of its core developers. Stabilize and standardize your SQL Server environment, and simplify your tasks by building automation, alerting, and reporting with this powerful tool. About the Technology: For SQL Server DBAs, automation is the key to efficiency. Using the open-source dbatools PowerShell module, you can easily execute tasks on thousands of database servers at once—all from the command line. dbatools gives you over 500 pre-built commands, with countless new options for managing SQL Server at scale. There's nothing else like it. About the Book: Learn dbatools in a Month of Lunches teaches you how to automate SQL Server using the dbatools PowerShell module. Each 30-minute lesson introduces a new automation that will make your daily duties easier. Following the expert advice of dbatools creator Chrissy LeMaire and other top community contributors, you'll learn to script everything from backups to disaster recovery.
What's Inside:
- Performing instance-to-instance and customized migrations
- Automating security audits, best practices, and standardized configurations
- Administering SQL Server Agent, including running PowerShell scripts effectively
- Bulk-importing many types of data into SQL Server
- Executing advanced tasks and increasing efficiency for everyday administration

About the Reader: For DBAs, accidental DBAs, and systems engineers who manage SQL Server.

About the Authors: Chrissy LeMaire is a GitHub Star and the creator of dbatools. Rob Sewell is a data engineer and a passionate automator. Jess Pomfret and Cláudio Silva are data platform architects. All are Microsoft MVPs.

Quotes:
"All SQL Server professionals should learn dbatools. With its combination of knowledge transfer, anecdotes, and hands-on labs, this book is the perfect way." - From the Foreword by Anna Hoffman, Databases Product Management, Microsoft
"Excellent guide for dbatools with lots of practical tips! Required reading for anyone interested in dbatools." - Ruben Vandeginste, PeopleWare
"A must-have for any SQL server developer." - Raushan Kumar Jha, Microsoft
"If you want to automate all vital aspects of SQL Server, wait no more! Learn dbatools in a month, with guidance from the best minds in the business." - Ranjit Sahai, RAM Consulting

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities using Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform. 
What You Will Learn:
- Implement the Data Lakehouse Paradigm on Microsoft's Azure cloud platform
- Benefit from the new Delta Lake open-source storage layer for data lakehouses
- Take advantage of schema evolution, change feeds, live tables, and more
- Write functional PySpark code for data lakehouse ELT jobs
- Optimize Apache Spark performance through partitioning, indexing, and other tuning options
- Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake

Who This Book Is For: Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.

Tidy Modeling with R

Get going with tidymodels, a collection of R packages for modeling and machine learning. Whether you're just starting out or have years of experience with modeling, this practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work. RStudio engineers Max Kuhn and Julia Silge demonstrate ways to create models by focusing on an R dialect called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one piece of the ecosystem makes it easier to learn the next. You'll understand why the tidymodels framework has been built to be used by a broad range of people. With this book, you will:
- Learn the steps necessary to build a model from beginning to end
- Understand how to use different modeling and feature engineering approaches fluently
- Examine the options for avoiding common pitfalls of modeling, such as overfitting
- Learn practical methods to prepare your data for modeling
- Tune models for optimal performance
- Use good statistical practices to compare, evaluate, and choose among models

IBM TS7700 Release 5.2.2 Guide

This IBM® Redbooks® publication covers IBM TS7700 R5.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on 25 years of experience, the R5.2 release includes many features that enable improved performance, usability, and security. Highlights include IBM TS7700 Advanced Object Store, an all-flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also offload to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to offload to IBM Cloud® Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.2. The R5.2 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.2 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. Note: Release 5.2 was split into two phases, R5.2 Phase 1 and R5.2 Phase 2. TS7700 provides tape virtualization for the IBM Z environment. Offloading to physical tape behind a TS7700 is used by hundreds of organizations around the world. Tape virtualization can help satisfy the following requirements in a data processing environment.
New and existing capabilities of the TS7700 5.2.2 release include the following highlights:
- Eight-way Grid Cloud consisting of up to three generations of TS7700
- Synchronous and asynchronous replication of virtual tape and TCT objects
- Grid access to all logical volume and object data, independent of where it exists
- An all-flash TS7770 option for improved performance
- Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier
- Full AES256 encryption for data in-flight and at-rest
- Tight integration with IBM Z® and DFSMS policy management
- DS8000 Object Store AES256 in-flight encryption and compression
- Regulatory compliance through Logical WORM and LWORM Retention support
- Cloud Storage Tier support for archive, logical volume versioning, and disaster recovery
- Optional integration with physical tape
- 16 Gb IBM FICON® throughput that exceeds 5 GBps per TS7700 cluster
- Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR)
- IBM Z hosts view of up to 3,968 common devices per TS7700 grid
- TS7770 Cache On-demand feature with capacity-based licensing
- TS7770 support of SSD within the VED server

The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM DS8900F and IBM Z Synergy DS8900F: Release 9.3 and z/OS 2.5

IBM Z® has a close and unique relationship to its storage. Over the years, improvements to the IBM zSystems® processors and storage software, the disk storage systems, and their communication architecture consistently reinforced this synergy. This IBM® Redpaper™ publication summarizes and highlights the various aspects, advanced functions, and technologies that are often pioneered by IBM and make IBM Z and IBM DS8000® products an ideal combination. This paper is intended for users who have some familiarity with IBM Z and the IBM DS8000 series and want a condensed but comprehensive overview of the synergy items up to the IBM z16 server with IBM z/OS® V2.5 and the IBM DS8900 Release 9.3 firmware.

Data Engineering with Alteryx

Dive into 'Data Engineering with Alteryx' to master the principles of DataOps while learning to build robust data pipelines using Alteryx. This book guides you through key practices to enhance data pipeline reliability, efficiency, and accessibility, making it an essential resource for modern data professionals. What this Book will help me do Understand and implement DataOps practices within Alteryx workflows. Design and develop data pipelines with Alteryx Designer for efficient data processing. Learn to manage and publish pipelines using Alteryx Server and Alteryx Connect. Gain advanced skills in Alteryx for handling spatial analytics and machine learning. Master techniques to monitor, secure, and optimize data workflows and access. Author(s) Paul Houghton is an experienced data engineer and author specializing in data engineering and DataOps. With extensive experience using Alteryx tools and workflows, Paul has a passion for teaching and sharing his knowledge through clear and practical guidance. His hands-on approach ensures readers successfully navigate and apply technical concepts to real-world projects. Who is it for? This book is ideal for data engineers, data scientists, and data analysts aiming to build reliable data pipelines with Alteryx. You do not need prior experience with Alteryx, but familiarity with data workflows will enhance your learning experience. If you're focused on aligning with DataOps methodologies, this book is tailored for you.

Ten Things to Know About ModelOps

The past few years have seen significant developments in data science, AI, machine learning, and advanced analytics. But the wider adoption of these technologies has also brought greater cost, risk, regulation, and demands on organizational processes, tasks, and teams. This report explains how ModelOps can provide both technical and operational solutions to these problems. Thomas Hill, Mark Palmer, and Larry Derany summarize important considerations, caveats, choices, and best practices to help you be successful with operationalizing AI/ML and analytics in general. Whether your organization is already working with teams on AI and ML, or just getting started, this report presents ten important dimensions of analytic practice and ModelOps that are not widely discussed, or perhaps even known. In part, this report examines: Why ModelOps is the enterprise "operating system" for AI/ML algorithms How to build your organization's IP secret sauce through repeatable processing steps How to anticipate risks rather than react to damage done How ModelOps can help you deliver the many algorithms and model formats available How to plan for success and monitor for value, not just accuracy Why AI will soon be regulated and how ModelOps helps ensure compliance

In-Memory Analytics with Apache Arrow

Discover the power of in-memory data analytics with "In-Memory Analytics with Apache Arrow." This book delves into Apache Arrow's unique capabilities, enabling you to handle vast amounts of data efficiently and effectively. Learn how Arrow improves performance, offers seamless integration, and simplifies data analysis in diverse computing environments. What this Book will help me do Gain proficiency with the datastore facilities and data types defined by Apache Arrow. Master the Arrow Flight APIs to efficiently transfer data between systems. Learn to leverage in-memory processing advantages offered by Arrow for state-of-the-art analytics. Understand how Arrow interoperates with popular tools like Pandas, Parquet, and Spark. Develop and deploy high-performance data analysis pipelines with Apache Arrow. Author(s) Matthew Topol, the author of the book, is an experienced practitioner in data analytics and Apache Arrow technology. Having contributed to the development and implementation of Arrow-powered systems, he brings a wealth of knowledge to readers. His ability to delve deep into technical concepts while keeping explanations practical makes this book an excellent guide for learners of the subject. Who is it for? This book is ideal for professionals in the data domain including developers, data analysts, and data scientists aiming to enhance their data manipulation capabilities. Beginners with some familiarity with data analysis concepts will find it beneficial, as well as engineers designing analytics utilities. Programming examples accommodate users of C, Go, and Python, making it broadly accessible.
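The columnar idea at the heart of Arrow can be sketched in plain Python. The real library is pyarrow; this toy dict-of-lists merely illustrates why a per-column memory layout makes analytic scans cheap:

```python
# Row-oriented vs columnar layout: a toy sketch of the idea behind
# Arrow's in-memory format (plain Python, not the pyarrow library).
rows = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": 4.50},
    {"id": 3, "price": 12.00},
]

# Columnar: one contiguous sequence per field, which is what makes
# vectorized scans and zero-copy sharing between engines practical.
columns = {name: [r[name] for r in rows] for name in rows[0]}

total = sum(columns["price"])          # the scan touches only one column
print(columns["id"], round(total, 2))
```

In Arrow proper the columns are typed, contiguous buffers rather than Python lists, so the same column can be handed to Pandas, Spark, or a Flight server without copying or re-serializing.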

Fundamentals of Data Engineering

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you: Get a concise overview of the entire data engineering landscape Assess data engineering problems using an end-to-end framework of best practices Cut through marketing hype when choosing data technologies, architecture, and processes Use the data engineering lifecycle to design and build a robust architecture Incorporate data governance and security across the data engineering lifecycle

Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis. Familiarize yourself with Spark's programming model and ecosystem Learn general approaches in data science Examine complete implementations that analyze large public datasets Discover which machine learning tools make sense for particular problems Explore code that can be adapted to many uses
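Of the techniques listed, anomaly detection is the easiest to miniaturize. The z-score filter below is plain Python and purely illustrative: the book itself applies such techniques at scale with PySpark, and the readings and threshold here are made up:

```python
import statistics

# Toy z-score anomaly detector (plain Python, not PySpark): flag values
# sitting more than `threshold` standard deviations from the mean.
def zscore_anomalies(values, threshold):
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10.1, 9.8, 10.0, 10.2, 9.9, 25.0, 10.05]
anomalies = zscore_anomalies(readings, threshold=2.0)
print(anomalies)  # the 25.0 spike is flagged; normal readings are not
```

The same logic distributed over a cluster, with robust statistics and streaming inputs, is where PySpark earns its keep.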