talk-data.com

Topic: data-engineering (3395 tagged)

Activity Trend: 1 peak/qtr, 2020-Q1 to 2026-Q1

Activities

3395 activities · Newest first

Practical Data Privacy

Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems. Practical Data Privacy answers important questions such as: What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases? What does "anonymized data" really mean? How do I actually anonymize data? How do federated learning and analysis work? Homomorphic encryption sounds great, but is it ready for use? How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help? How do I ensure that my data science projects are secure by default and private by design? How do I work with governance and infosec teams to implement internal policies appropriately?
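To make the differential-privacy building block mentioned above concrete, here is a minimal sketch of a noisy counting query (the function name, data, and epsilon values are illustrative, not taken from the book): a count changes by at most 1 when a single record is added or removed, so Laplace noise with scale 1/ε is enough for ε-differential privacy.

```python
import math
import random

def dp_count(values, predicate, epsilon=1.0):
    """Epsilon-differentially private count (illustrative sketch).

    A counting query has sensitivity 1 (one record changes the count
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) noise via the inverse CDF.
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

ages = [25, 34, 51, 62, 41]
# True count of people aged 40+ is 3; the released value is 3 plus noise.
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; the analyst trades accuracy for protection of any individual record.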

IBM DS8910F Model 993 Rack-Mounted Storage System Release 9.1

This IBM Redpaper publication presents and positions the DS8910F Model 993 storage system. This modular system can be integrated into a 16U contiguous space of an IBM z15™ model T02 or IBM z14® Model ZR1 with Feature Code 0937 and IBM LinuxONE III model LT2 or LinuxONE Rockhopper II model LR1 with Feature Code 0938. The DS8910F Model 993 allows you to take advantage of the performance boost of all-flash systems and advanced features while limiting data center footprint and power infrastructure requirements.

Automating Data Transformations

The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow--but it's also the most time-consuming, expensive, and error-prone part of the process. In this report, Satish Jayanthi and Armon Petrossian examine key concepts that will enable you to automate data transformation at scale. IT decision makers, CTOs, and data team leaders will explore ways to democratize data transformation by shifting from activity-oriented to outcome-oriented teams--from manufacturing-line assembly to an approach that lets even junior analysts implement data transformations with only a brief code review. With this insightful report, you will: Learn how successful data systems rely on simplicity, flexibility, user-friendliness, and a metadata-first approach Adopt a product-first mindset (data as a product, or DaaP) for developing data resources that focus on discoverability, understanding, trust, and exploration Build a transformation platform that delivers the most value, using a column-first approach Use data architecture as a service (DAaaS) to help teams build and maintain their own data infrastructure as they work collaboratively About the authors: Armon Petrossian is CEO and cofounder of Coalesce. Previously, he was part of the founding team at WhereScape in North America, where he served as national sales manager for almost a decade. Satish Jayanthi is CTO and cofounder of Coalesce. Prior to that, he was senior solutions architect at WhereScape, where he met his cofounder Armon.

IBM FlashSystem 7300 Product Guide

This IBM® Redpaper Product Guide describes the IBM FlashSystem® 7300 solution, which is a next-generation IBM FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Spectrum® Virtualize. To take advantage of artificial intelligence (AI)-enhanced applications, real-time big data analytics, and cloud architectures that require higher levels of system performance and storage capacity, enterprises around the globe are rapidly moving to modernize established IT infrastructures. However, for many organizations, staff resources and expertise are limited, and cost-efficiency is a top priority. These organizations have important investments in existing infrastructure that they want to maximize. They need enterprise-grade solutions that optimize cost-efficiency while simplifying the pathway to modernization. IBM FlashSystem 7300 is designed specifically for these requirements and use cases. It also delivers cyber resilience without compromising application performance.
IBM FlashSystem 7300 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including data reduction and deduplication, dynamic tiering, thin provisioning, snapshots, cloning, replication and data copy services, cyber resilience, Transparent Cloud Tiering (TCT), and IBM HyperSwap® with 3-site replication for high availability. Scale-out and scale-up configurations further enhance capacity and throughput for better availability. With the release of IBM Spectrum Virtualize V8.5, extra functions and features are available, including support for new third-generation IBM FlashCore Modules (Non-Volatile Memory Express (NVMe)-type drives) within the control enclosure, and 100 Gbps Ethernet adapters that provide NVMe Remote Direct Memory Access (RDMA) options. New software features include GUI enhancements, security enhancements such as multifactor authentication and single sign-on, and Fibre Channel (FC) portsets.

Reachable Sets of Dynamic Systems

Reachable Sets of Dynamic Systems: Uncertainty, Sensitivity, and Complex Dynamics introduces differential inclusions, providing an overview as well as multiple examples of their interdisciplinary applications. The design of dynamic systems of any type is an important issue, as is the influence of uncertainty in model parameters and model sensitivity. The possibility of calculating the reachable sets may be a powerful additional tool in such tasks. This book can help graduate students, researchers, and engineers working in the field of computer simulation and model building, in the calculation of reachable sets of dynamic models. Introduces methodologies and approaches to the modeling and simulation of dynamic systems Presents uncertainty treatment and model sensitivity, with interdisciplinary examples Explores applications of differential inclusions in modeling and simulation
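For orientation, the central object of the book can be written down in one common formulation (the notation here is illustrative, not quoted from the text): the reachable set at time t collects every state attainable by some solution of a differential inclusion started from an initial set.

```latex
R(t) = \{\, x(t) \;:\; \dot{x}(s) \in F\big(s, x(s)\big) \ \text{for a.e. } s \in [0, t],\ \ x(0) \in X_0 \,\}
```

Uncertainty in model parameters enters through the set-valued right-hand side F, which is why reachable sets are a natural tool for uncertainty and sensitivity analysis.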

Snowflake SnowPro™ Advanced Architect Certification Companion: Hands-on Preparation and Practice

Master the intricacies of Snowflake and prepare for the SnowPro Advanced Architect Certification exam with this comprehensive study companion. This book provides robust and effective study tools to help you prepare for the exam and is also designed for those who are interested in learning the advanced features of Snowflake. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system. The best practices demonstrated in the book help you use Snowflake more powerfully and effectively as a data warehousing and analytics platform. Reading this book and reviewing the concepts will help you gain the knowledge you need to take the exam. The book guides you through a study of the different domains covered on the exam: Accounts and Security, Snowflake Architecture, Data Engineering, and Performance Optimization. You’ll also be well positioned to apply your newly acquired practical skills to real-world Snowflake solutions. You will have a deep understanding of Snowflake to help you take full advantage of Snowflake’s architecture to deliver valuable analytics insights to your business.
What You Will Learn

Gain the knowledge you need to prepare for the exam
Review in-depth theory on Snowflake to help you build high-performance systems
Broaden your skills as a data warehouse designer to cover the Snowflake ecosystem
Optimize performance and costs associated with your use of the Snowflake data platform
Share data securely both inside your organization and with external partners
Apply your practical skills to real-world Snowflake solutions

Who This Book Is For

Anyone who is planning to take the SnowPro Advanced Architect Certification exam, those who want to move beyond traditional database technologies and build their skills to design and architect solutions using Snowflake services, and veteran database professionals seeking an on-the-job reference to understand one of the newest and fastest-growing technologies in data

Building an Event-Driven Data Mesh

The exponential growth of data combined with the need to derive real-time business value is a critical issue today. An event-driven data mesh can power real-time operational and analytical workloads, all from a single set of data product streams. With practical real-world examples, this book shows you how to successfully design and build an event-driven data mesh. Building an Event-Driven Data Mesh provides: Practical tips for iteratively building your own event-driven data mesh, including hurdles you'll experience, possible solutions, and how to obtain real value as soon as possible Solutions to pitfalls you may encounter when moving your organization from monoliths to event-driven architectures A clear understanding of how events relate to systems and other events in the same stream and across streams A realistic look at event modeling options, such as fact, delta, and command type events, including how these choices will impact your data products Best practices for handling events at scale, privacy, and regulatory compliance Advice on asynchronous communication and handling eventual consistency
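The fact/delta/command distinction in the event-modeling discussion above can be sketched with simple event records (the entity and field names here are invented for illustration, not taken from the book): a fact event carries the full current state of an entity, a delta describes a change relative to prior state, and a command asks a consumer to perform an action.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemFact:
    """Fact event: the complete current state; consumers need no history."""
    item_id: str
    name: str
    price: float

@dataclass(frozen=True)
class PriceChanged:
    """Delta event: describes a change relative to the previous state."""
    item_id: str
    old_price: float
    new_price: float

@dataclass(frozen=True)
class ApplyDiscount:
    """Command event: requests that a consumer perform an action."""
    item_id: str
    percent: float

def apply_delta(fact: ItemFact, delta: PriceChanged) -> ItemFact:
    """Rebuild current state by applying a delta to the last known fact."""
    return ItemFact(fact.item_id, fact.name, delta.new_price)
```

The trade-off the blurb hints at: fact streams are self-contained but larger, while delta streams are compact but force every consumer to reconstruct state, as `apply_delta` does here.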

Principles of Data Fabric

In "Principles of Data Fabric," you will gain a comprehensive understanding of Data Fabric solutions and architectures. This book provides a clear picture of how to design, implement, and optimize Data Fabric solutions to tackle complex data challenges. By the end, you'll be equipped with the knowledge to unify and leverage your organizational data efficiently. What this Book will help me do Design and architect Data Fabric solutions tailored to specific organizational needs. Learn to integrate Data Fabric with DataOps and Data Mesh for holistic data management. Master the principles of Data Governance and Self-Service analytics within the Data Fabric. Implement best practices for distributed data management and regulatory compliance. Apply industry insights and frameworks to optimize Data Fabric deployment. Author(s) Sonia Mezzetta, the author of "Principles of Data Fabric," is an experienced data professional with a deep understanding of data management frameworks and architectures like Data Fabric, Data Mesh, and DataOps. With years of industry expertise, Sonia has helped organizations implement effective data strategies. Her writing combines technical know-how with an approachable style to enlighten and guide readers on their data journey. Who is it for? This book is ideal for data engineers, data architects, and business analysts who seek to understand and implement Data Fabric solutions. It will also appeal to senior data professionals like Chief Data Officers aiming to integrate Data Fabric into their enterprises. Novice to intermediate knowledge of data management would be beneficial for readers. The content provides clear pathways to achieve actionable results in data strategies.

Beginning Database Design Solutions, 2nd Edition

A concise introduction to database design concepts, methods, and techniques in and out of the cloud In the newly revised second edition of Beginning Database Design Solutions: Understanding and Implementing Database Design Concepts for the Cloud and Beyond, Second Edition, award-winning programming instructor and mathematician Rod Stephens delivers an easy-to-understand guide to designing and implementing databases both in and out of the cloud. Without assuming any prior database design knowledge, the author walks you through the steps you’ll need to take to understand, analyze, design, and build databases. In the book, you’ll find clear coverage of foundational database concepts along with hands-on examples that help you practice important techniques so you can apply them to your own database designs, as well as: Downloadable source code that illustrates the concepts discussed in the book Best practices for reliable, platform-agnostic database design Strategies for digital transformation driven by universally accessible database design An essential resource for database administrators, data management specialists, and database developers seeking expertise in relational, NoSQL, and hybrid database design both in and out of the cloud, Beginning Database Design Solutions is a hands-on guide ideal for students and practicing professionals alike.

Data Fabric and Data Mesh Approaches with AI: A Guide to AI-based Data Cataloging, Governance, Integration, Orchestration, and Consumption

Understand modern data fabric and data mesh concepts using AI-based self-service data discovery and delivery capabilities, a range of intelligent data integration styles, and automated unified data governance—all designed to deliver "data as a product" within hybrid cloud landscapes. This book teaches you how to successfully deploy state-of-the-art data mesh solutions and gain a comprehensive overview on how a data fabric architecture uses artificial intelligence (AI) and machine learning (ML) for automated metadata management and self-service data discovery and consumption. You will learn how data fabric and data mesh relate to other concepts such as DataOps, MLOps, AIDevOps, and more. Many examples are included to demonstrate how to modernize the consumption of data to enable a shopping-for-data (data as a product) experience. By the end of this book, you will understand the data fabric concept and architecture as it relates to themes such as automated unified data governance and compliance, enterprise information architecture, AI and hybrid cloud landscapes, and intelligent cataloging and metadata management.
What You Will Learn

Discover best practices and methods to successfully implement a data fabric architecture and data mesh solution
Understand key data fabric capabilities, e.g., self-service data discovery, intelligent data integration techniques, intelligent cataloging and metadata management, and trustworthy AI
Recognize the importance of data fabric to accelerate digital transformation and democratize data access
Dive into important data fabric topics, addressing current data fabric challenges
Conceive data fabric and data mesh concepts holistically within an enterprise context
Become acquainted with the business benefits of data fabric and data mesh

Who This Book Is For

Anyone who is interested in deploying modern data fabric architectures and data mesh solutions within an enterprise, including IT and business leaders, data governance and data office professionals, data stewards and engineers, data scientists, and information and data architects. Readers should have a basic understanding of enterprise information architecture.

IBM Storage DS8900F Product Guide Release 9.3.2

This IBM® Redbooks Product Guide provides an overview of the features and functions that are available with the IBM Storage DS8900F models that run microcode Release 9.3.2 (Bundle 89.32/Licensed Machine Code 7.9.32). As of February 2023, the DS8900F with DS8000 Release 9.3.2 is the latest addition. The DS8900F is an all-flash system exclusively, and it offers three classes:

IBM DS8980F, Analytic Class: The DS8980F Analytic Class offers best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), Business Intelligence, and Machine Learning.

IBM DS8950F, Agility Class: The agility class is efficiently designed to consolidate all your mission-critical workloads for IBM zSystems, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution.

IBM DS8910F, Flexibility Class: The flexibility class delivers significant performance for midrange organizations that are looking to meet storage challenges with advanced functionality delivered as a single rack solution.

Azure SQL Hyperscale Revealed: High-performance Scalable Solutions for Critical Data Workloads

Take a deep dive into the Azure SQL Database Hyperscale Service Tier and discover a new form of cloud architecture from Microsoft that supports massive databases. The new horizontally scalable architecture, formerly code-named Socrates, allows you to decouple compute nodes from storage layers. This radically different approach dramatically increases the scalability of the service. This book shows you how to leverage Hyperscale to provide next-level scalability, high throughput, and fast performance from large databases in your environment. The book begins by showing how Hyperscale helps you eliminate many of the problems of traditional high-availability and disaster recovery architecture. You’ll learn how Hyperscale overcomes storage capacity limitations and issues with scale-up times and costs. With Hyperscale, your costs do not increase linearly with database size and you can manage more data than ever at a lower cost. The book teaches you how to deploy, configure, and monitor an Azure SQL Hyperscale database in a production environment. The book also covers migrating your current workloads from traditional architecture to Azure SQL Hyperscale.

What You Will Learn

Understand the advantages of Hyperscale over traditional architecture
Deploy a Hyperscale database on the Azure cloud (interactively and with code)
Configure the advanced features of the Hyperscale database tier
Monitor and scale database performance to suit your needs
Back up and restore your Azure SQL Hyperscale databases
Implement disaster recovery and failover capability
Compare performance of Hyperscale vs traditional architecture
Migrate existing databases to the Hyperscale service tier

Who This Book Is For

SQL architects, data engineers, and DBAs who want the most efficient and cost-effective cloud technologies to run their critical data workloads, and those seeking rapid scalability and high performance and throughput while utilizing large databases

Introduction to IBM PowerVM

Virtualization plays an important role in resource efficiency by optimizing performance, reducing costs, and improving business continuity. IBM PowerVM® provides a secure and scalable server virtualization environment for IBM AIX®, IBM® i, and Linux applications. PowerVM is built on the advanced reliability, availability, and serviceability (RAS) features and leading performance of IBM Power servers. This IBM Redbooks® publication introduces PowerVM virtualization technologies on Power servers. This publication targets clients who are new to Power servers and introduces the available capabilities of the PowerVM platform. This publication includes the following chapters:

Chapter 1, "IBM PowerVM overview" introduces PowerVM and provides a high-level overview of the capabilities and benefits of the platform.
Chapter 2, "IBM PowerVM features in details" provides a more in-depth review of PowerVM capabilities for system administrators and architects to familiarize themselves with its features.
Chapter 3, "Planning for IBM PowerVM" provides planning guidance about PowerVM to prepare for the implementation of the solution.
Chapter 4, "Implementing IBM PowerVM" describes and details configuration steps to implement PowerVM, starting from implementing the Virtual I/O Server (VIOS) to storage and network I/O virtualization configurations.
Chapter 5, "Managing the PowerVM environment" focuses on systems management, day-to-day operations, monitoring, and maintenance.
Chapter 6, "Automation on IBM Power servers" explains available techniques, utilities, and benefits of modern automation solutions.

Proactive Early Threat Detection and Securing Oracle Database with IBM QRadar, IBM Security Guardium Database Protection, and IBM Copy Services Manager by using IBM FlashSystem Safeguarded Copy

This IBM® blueprint publication focuses on early threat detection within a database environment by using IBM Security® Guardium® Data Protection and IBM QRadar®. It also highlights how to proactively start a cyber resilience workflow in response to a cyberattack or potential malicious user actions. The workflow that is presented here uses IBM Copy Services Manager as orchestration software to start IBM FlashSystem® Safeguarded Copy functions. The Safeguarded Copy creates an immutable copy of the data in an air-gapped form on the same IBM FlashSystem for isolation and eventual quick recovery. This document describes how to enable and forward Oracle database user activities (by using IBM Security Guardium Data Protection) and IBM FlashSystem audit logs to IBM QRadar. This document also describes how to create various rules to determine a threat, and configure and launch a suitable response to the detected threat in IBM QRadar. The document also outlines the steps that are involved to create a Scheduled Task by using IBM Copy Services Manager with various actions.

Scaling Machine Learning with Spark

Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals--allowing data and ML practitioners to collaborate and understand each other better. Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology. You will: Explore machine learning, including distributed computing concepts and terminology Manage the ML lifecycle with MLflow Ingest data and perform basic preprocessing with Spark Explore feature engineering, and use Spark to extract features Train a model with MLlib and build a pipeline to reproduce it Build a data system to combine the power of Spark with deep learning Get a step-by-step example of working with distributed TensorFlow Use PyTorch to scale machine learning and its internal architecture
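The core idea behind scaling the kind of workflows described above — compute partial aggregates per data partition, then merge them with an associative operation — can be sketched in plain Python. This is a toy stand-in for what Spark does across executors, not actual Spark API code; the function names and data are invented for illustration.

```python
from functools import reduce

def partition(data, n):
    """Split data into n roughly equal chunks, as a cluster would across executors."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_stats(part):
    """Per-partition partial aggregate (count, sum) — runs independently on each chunk."""
    return (len(part), sum(part))

def merge(a, b):
    """Associative merge of partial aggregates — combinable in any order or tree shape."""
    return (a[0] + b[0], a[1] + b[1])

values = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
count, total = reduce(merge, (partial_stats(p) for p in partition(values, 3)))
mean = total / count  # identical to a single-machine mean over all values
```

Because `merge` is associative, the partial results can be combined in any order, which is what lets frameworks like Spark parallelize feature statistics without coordinating between partitions.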

SnowPro™ Core Certification Companion: Hands-on Preparation and Practice

This study companion helps you prepare for the SnowPro Core Certification exam. The author guides your studies so you will not have to tackle the exam by yourself. To help you track your progress, chapters in this book correspond to the exam domains as described on Snowflake’s website. Upon studying the material in this book, you will have solid knowledge that should give you the best shot possible at taking and passing the exam and earning the certification you deserve. Each chapter provides explanations, instructions, guidance, tips, and other information with the level of detail that you need to prepare for the exam. You will not waste your time with unneeded detail and advanced content which is out of scope of the exam. Focus is kept on reviewing the materials and helping you become familiar with the content of the exam that is recommended by Snowflake.

This Book Helps You

Review the domains that Snowflake specifically recommends you study in preparation for Exam COF-C02
Identify gaps in your knowledge that you can study and fill in to increase your chances of passing Exam COF-C02
Level up your knowledge even if not taking the exam, so you know the same material as someone who has taken the exam
Learn how to set up a Snowflake account and configure access according to recommended security best practices
Be capable of loading structured and unstructured data into Snowflake as well as unloading data from Snowflake
Understand how to apply Snowflake data protection features such as cloning, time travel, and fail-safe
Review Snowflake’s data sharing capabilities, including data marketplace and data exchange

Who This Book Is For

Those who are planning to take the SnowPro Core Certification COF-C02 exam, and anyone who wishes to gain core expertise in implementing and migrating to the Snowflake Data Cloud

Data Mesh in Action

Revolutionize the way your organization approaches data with a data mesh! This new decentralized architecture outpaces monolithic lakes and warehouses and can work for a company of any size. In Data Mesh in Action you will learn how to: Implement a data mesh in your organization Turn data into a data product Move from your current data architecture to a data mesh Identify data domains, and decompose an organization into smaller, manageable domains Set up the central governance and local governance levels over data Balance responsibilities between the two levels of governance Establish a platform that allows efficient connection of distributed data products and automated governance Data Mesh in Action reveals how this groundbreaking architecture looks for both startups and large enterprises. You won’t need any new technology—this book shows you how to start implementing a data mesh with flexible processes and organizational change. You’ll explore both an extended case study and real-world examples. As you go, you’ll be expertly guided through discussions around Socio-Technical Architecture and Domain-Driven Design with the goal of building a sleek data-as-a-product system. Plus, dozens of workshop techniques for both in-person and remote meetings help you onboard colleagues and drive a successful transition. About the Technology Business increasingly relies on efficiently storing and accessing large volumes of data. The data mesh is a new way to decentralize data management that radically improves security and discoverability. A well-designed data mesh simplifies self-service data consumption and reduces the bottlenecks created by monolithic data architectures. About the Book Data Mesh in Action teaches you pragmatic ways to decentralize your data and organize it into an effective data mesh. You’ll start by building a minimum viable data product, which you’ll expand into a self-service data platform, chapter-by-chapter. 
You’ll love the book’s unique “sliders” that adjust the mesh to meet your specific needs. You’ll also learn processes and leadership techniques that will change the way you and your colleagues think about data. What's Inside Decompose an organization into manageable domains Turn data into a data product Set up central and local governance levels Build a fit-for-purpose data platform Improve management, initiation, and support techniques About the Reader For data professionals. Requires no specific programming stack or data platform. About the Authors Jacek Majchrzak is a hands-on lead data architect. Dr. Sven Balnojan manages data products and teams. Dr. Marian Siwiak is a data scientist and a management consultant for IT, scientific, and technical projects. Quotes This book teleports you into the seat of the chief architect on a data mesh project. - From the Foreword by Jean-Georges Perrin, PayPal A must-read for anyone who works in data. - Prukalpa Sankar, Co-Founder of Atlan Satisfies all those ‘what’, ‘why’, and ‘how’ questions. A unique blend of process and technology, and an excellent, example-driven resource. - Shiroshica Kulatilake, WSO2 The starting point for your journey in the new generation of data platforms. - Arnaud Castelltort, University of Montpellier