talk-data.com

Topic: data-engineering (3395 tagged activities)

Activity trend: 2020-Q1 to 2026-Q1

Activities: 3395 · Newest first

Protocol Buffers Handbook

The "Protocol Buffers Handbook" by Clément Jean offers an in-depth exploration of Protocol Buffers (Protobuf), a powerful data serialization format. Learn everything from syntax and schema evolution to custom validations and cross-language integrations. With practical examples in Go and Python, this guide empowers you to efficiently serialize and manage structured data across platforms.

What this Book will help me do:
- Develop advanced skills in using Protocol Buffers (Protobuf) for efficient data serialization.
- Master the key concepts of Protobuf syntax and schema evolution for compatibility.
- Learn to create custom validation plugins and tailor Protobuf processes.
- Integrate Protobuf with multiple programming environments, including Go and Python.
- Automate Protobuf projects using tools like Buf and Bazel to streamline workflows.

Author(s): Clément Jean is a skilled programmer and technical writer specializing in data serialization and distributed systems. With substantial experience in developing scalable microservices, he shares valuable insights into using Protocol Buffers effectively. Through this book, Clément offers a hands-on approach to Protobuf, blending theory with practical examples derived from real-world scenarios.

Who is it for? This book is perfect for software engineers, system integrators, and data architects who aim to optimize data serialization and APIs, regardless of their programming language expertise. Beginners will grasp foundational Protobuf concepts, while experienced developers will extend their knowledge to advanced, practical applications. Those working with microservices and heavily data-dependent systems will find this book especially relevant.
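As a small taste of the wire format this book covers, here is a minimal Python sketch of Protobuf's base-128 varint encoding, the building block of every encoded message. The function names are illustrative, not from the book or any Protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a Protobuf base-128 varint.

    Each byte carries 7 bits of the value, least-significant group first;
    the high bit of a byte is set when more bytes follow.
    """
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a varint back into an int (inverse of encode_varint)."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


# 300 = 0b1_0010_1100 becomes two bytes on the wire: 0xAC 0x02
assert encode_varint(300) == bytes([0xAC, 0x02])
```

This compactness for small integers is one reason Protobuf messages are so much smaller than equivalent JSON.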

IBM Storage FlashSystem 5200 Product Guide for IBM Storage Virtualize 8.6

This IBM® Redpaper® Product Guide publication describes the IBM Storage FlashSystem® 5200 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It is an NVMe end-to-end platform that is targeted at the entry and midrange market and delivers the full capabilities of IBM FlashCore® technology. It also provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following features:
- Data reduction and deduplication
- Dynamic tiering
- Thin provisioning
- Snapshots
- Cloning
- Replication
- Data copy services
- Transparent Cloud Tiering
- IBM HyperSwap®, including 3-site replication for high availability (HA)

Scale-out and scale-up configurations further enhance capacity and throughput for better availability. The IBM Storage FlashSystem 5200 is a high-performance storage solution that is based on a revolutionary 1U form factor. It consists of 12 NVMe flash devices in a 1U storage enclosure drawer with fully redundant canister components and no single point of failure. It is designed for businesses of all sizes, including small businesses, remote and branch offices, and regional clients. It is a smarter, self-optimizing solution that requires less management, which enables organizations to overcome their storage challenges. Flash has come of age, and price reductions mean that lower parts of the storage market are seeing the value of moving over to flash and NVMe-based solutions. The IBM Storage FlashSystem 5200 advances this transition by providing incredibly dense tiers of flash in a more affordable package. With the benefit of IBM FlashCore Module compression and new QLC flash-based technology becoming available, a compelling argument exists to move away from Nearline SAS storage and on to NVMe. This Product Guide is aimed at pre-sales and post-sales technical support, marketing, and storage administrators.

IBM Storage FlashSystem 9500 Product Guide for IBM Storage Virtualize 8.6

This IBM® Redpaper® Product Guide describes the IBM Storage FlashSystem® 9500 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Storage Virtualize. Often, applications exist that are foundational to the operations and success of an enterprise. These applications might function as prime revenue generators, guide or control important tasks, or provide crucial business intelligence, among many other jobs. Whatever their purpose, they are mission critical to the organization. They demand the highest levels of performance, functionality, security, and availability. They also must be protected against the newer threat of cyberattacks. To support such mission-critical applications, enterprises of all types and sizes turn to the IBM Storage FlashSystem 9500.

IBM Storage FlashSystem 9500 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples:
- Data reduction and deduplication
- Dynamic tiering
- Thin provisioning
- Snapshots
- Cloning
- Replication and data copy services
- Cyber resilience
- Transparent Cloud Tiering
- IBM HyperSwap®, including 3-site replication for HA
- Scale-out and scale-up configurations that further enhance capacity and throughput for better availability

This Redpaper applies to IBM Storage Virtualize V8.6.

Learn SQL using MySQL in One Day and Learn It Well

"Learn SQL using MySQL in One Day and Learn It Well" is your hands-on guide to mastering SQL efficiently using MySQL. This book takes you from understanding basic database concepts to executing advanced queries and implementing essential features like triggers and routines. With a project-based approach, you will confidently manage databases and unlock the potential of data.

What this Book will help me do:
- Understand database concepts and relational data architecture.
- Design and define tables to organize and store data effectively.
- Perform advanced SQL queries to manipulate and analyze data efficiently.
- Implement database triggers, views, and routines for advanced management.
- Apply practical skills in SQL through a comprehensive hands-on project.

Author(s): Jamie Chan is a professional instructor and technical writer with extensive experience in database management and software development. Known for a clear and engaging teaching style, Jamie has authored numerous books focusing on hands-on learning. Jamie approaches pedagogy with the goal of making technical subjects accessible and practical for all learners.

Who is it for? This book is designed for beginners eager to learn SQL and MySQL from scratch. It is perfect for professionals or students who want relevant and actionable skills in database management. Whether you're looking to enhance career prospects or leverage database tools for personal projects, this book is your practical starting point. Basic computer literacy is all that's needed.
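The table-definition-to-aggregate-query loop this book drills can be sketched with Python's built-in sqlite3 module, standing in for a MySQL server (which this sketch does not assume you have running); the table and column names are made up for illustration.

```python
import sqlite3

# In-memory database standing in for a MySQL server, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define a table and insert a few rows.
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.0), ("alice", 20.0)],
)

# GROUP BY with an aggregate and ORDER BY: the core of analytical querying.
rows = cur.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # [('alice', 50.0), ('bob', 15.0)]
```

The same SQL runs essentially unchanged against MySQL once you swap the connection for a MySQL client library.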

Natural Language and Search

When you look at operational analytics and business data analysis activities—such as log analytics, real-time application monitoring, website search, observability, and more—effective search functionality is key to identifying issues, improving customer experience, and increasing operational effectiveness. How can you support your business needs by leveraging ML-driven advancements in search relevance? In this report, authors Jon Handler, Milind Shyani, and Karen Kilroy help executives and data scientists explore how ML can enable ecommerce firms to generate more pertinent search results to drive better sales. You'll learn how personalized search helps you quickly find relevant data within applications, websites, and data lake catalogs. You'll also discover how to locate the content available in CRM systems and document stores.

This report helps you:
- Address the challenges of traditional document search, including data preparation and ingestion
- Leverage ML techniques to improve search outcomes and the relevance of documents you retrieve
- Discover what makes a good search solution that's reliable, scalable, and can drive your business forward
- Learn how to choose a search solution to improve your decision-making process

With advancements in ML-driven search, businesses can realize even more benefits and improvements in their data and document search capabilities to better support their own business needs and the needs of their customers.

About the authors: Jon Handler is a senior principal solutions architect at Amazon Web Services. Milind Shyani is an applied scientist at Amazon Web Services working on large language models, information retrieval, and machine learning algorithms. Karen Kilroy, CEO of Kilroy Blockchain, is a lifelong technologist, full stack software engineer, speaker, and author living in Northwest Arkansas.

Bio-Inspired Strategies for Modeling and Detection in Diabetes Mellitus Treatment

Bio-Inspired Strategies for Modeling and Detection in Diabetes Mellitus Treatment focuses on bio-inspired techniques, such as modelling, used to generate control algorithms for the treatment of diabetes mellitus. The book addresses the identification of diabetes mellitus using a high-order recurrent neural network trained by the extended Kalman filter. The authors also describe the use of metaheuristic algorithms for the parametric identification of compartmental models of diabetes mellitus that are widely used in research, such as the Sorensen model and the Dalla Man model. In addition, the book addresses the modelling of time series for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia using deep neural networks. It also proposes the detection of diabetes mellitus in its early stages, or when current diagnostic techniques cannot detect glucose intolerance or prediabetes, carried out by means of deep neural networks established in the literature.

Readers will find leading-edge research in diabetes identification based on discrete high-order neural networks trained with the extended Kalman filter; parametric identification of compartmental models used to describe diabetes mellitus; modelling of data obtained by continuous glucose monitoring sensors for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia; and screening for glucose intolerance using glucose tolerance test data and deep neural networks. Application of the proposed approaches is illustrated via simulation and real-time implementations for modelling, prediction, and classification.

- Addresses the online identification of diabetes mellitus using a high-order recurrent neural network trained online by an extended Kalman filter.
- Covers parametric identification of compartmental models used to describe diabetes mellitus.
- Provides modelling of data obtained by continuous glucose-monitoring sensors for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia.

Engineering Data Mesh in Azure Cloud

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn the strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms.

What this Book will help me do:
- Learn to decentralize data governance and integrate data domains effectively.
- Master strategies for building and implementing data contracts suited to your organization's needs.
- Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework.
- Understand how to apply key architecture patterns for analytics, including AI and machine learning.
- Gain the knowledge to scale analytics frameworks using modern cloud-based platforms.

Author(s): Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, the author brings real-world customer experiences into practical guidance. This book reflects a dedication to helping organizations achieve their data goals with clarity and effectiveness.

Who is it for? This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. Especially useful for professionals aiming to implement cloud-based data mesh solutions, it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

The Definitive Guide to Data Integration

Master the modern data stack with 'The Definitive Guide to Data Integration.' This comprehensive book covers the key aspects of data integration, including data sources, storage, transformation, governance, and more. Equip yourself with the knowledge and hands-on skills to manage complex datasets and unlock your data's full potential.

What this Book will help me do:
- Understand how to integrate diverse datasets efficiently using modern tools.
- Develop expertise in designing and implementing robust data integration workflows.
- Gain insights into real-time data processing and cloud-based data architectures.
- Learn best practices for data quality, governance, and compliance in integration.
- Master the use of APIs, workflows, and transformation patterns in practice.

Author(s): The authors, Bonnefoy, Chaize, Raphaël Mansuy, and Mehdi Tazi, are seasoned experts in data engineering and integration. They bring years of experience in modern data technologies and consulting. Their approachable writing style ensures that readers at various skill levels can grasp complex concepts effectively.

Who is it for? This book is ideal for data engineers, architects, analysts, and IT professionals. Whether you're new to data integration or looking to deepen your expertise, this guide caters to individuals seeking to navigate the challenges of the modern data stack.
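To make "transformation patterns" concrete, here is a minimal, library-free Python sketch of the clean-and-validate step found in most integration workflows; the record fields and data-quality rules are invented for illustration, not taken from the book.

```python
def transform(records):
    """Normalize raw records: trim names, coerce amounts, drop invalid rows."""
    out = []
    for rec in records:
        name = str(rec.get("name", "")).strip().lower()
        try:
            amount = float(rec.get("amount"))
        except (TypeError, ValueError):
            continue  # data-quality rule: reject rows without a numeric amount
        if name:
            out.append({"name": name, "amount": round(amount, 2)})
    return out


raw = [
    {"name": "  Alice ", "amount": "19.991"},
    {"name": "Bob", "amount": None},  # rejected: no numeric amount
    {"name": "", "amount": 5},        # rejected: empty name
]
clean = transform(raw)
print(clean)  # [{'name': 'alice', 'amount': 19.99}]
```

Real pipelines express the same shape through dedicated tooling, but the pattern of explicit, testable per-record rules stays the same.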

Azure Data Factory by Example: Practical Implementation for Data Engineers

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components.

This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
- Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics.
- Improvements to flow control provided by activity deactivation and the Fail activity.
- The introduction of reusable data flow components such as user-defined functions and flowlets.
- Extensions to integration runtime capabilities, including Managed VNet support.
- The ability to trigger pipelines in response to custom events.
- Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying.

What You Will Learn:
- Create pipelines, activities, datasets, and linked services
- Build reusable components using variables, parameters, and expressions
- Move data into and around Azure services automatically
- Transform data natively using ADF data flows and Power Query data wrangling
- Master flow-of-control and triggers for tightly orchestrated pipeline execution
- Publish and monitor pipelines easily and with confidence

Who This Book Is For: Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations
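Under the hood, an ADF pipeline is a JSON document. The Python snippet below sketches the general shape of a single-Copy-activity pipeline definition; the property names follow the ADF JSON style, but the dataset names and parameter are invented, and this is an illustrative sketch, not a complete, deployable definition.

```python
import json

# Simplified sketch of an ADF pipeline definition with one Copy activity.
# "SalesBlobDataset", "SalesSqlDataset", and "RunDate" are hypothetical names.
pipeline = {
    "name": "CopySalesData",
    "properties": {
        "parameters": {"RunDate": {"type": "String"}},
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "SalesBlobDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SalesSqlDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ],
    },
}
print(json.dumps(pipeline, indent=2))
```

In practice you rarely hand-write this JSON; the ADF authoring UI generates it, but reading it is essential for source control, code review, and the metadata-driven patterns the book builds up to.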

IBM GDPS: An Introduction to Concepts and Capabilities

This IBM Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery (DR), along with issues that are related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for high availability and disaster recovery (HADR). Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM® also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

The Complete Developer

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack—from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins.

You’ll also learn how to:
- Work with modern JavaScript syntax, TypeScript, and the Next.js framework
- Simplify UI development with the React library
- Extend your application with REST and GraphQL APIs
- Manage your data with the MongoDB NoSQL database
- Use OAuth to simplify user management, authentication, and authorization
- Automate testing with Jest, test-driven development, stubs, mocks, and fakes

Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters.

Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

Practical MongoDB Aggregations

Dive into the capabilities of the MongoDB aggregation framework with this official guide, "Practical MongoDB Aggregations". You'll learn how to design and optimize efficient aggregation pipelines for MongoDB 7.0, empowering you to handle complex data analysis and processing tasks directly within the database.

What this Book will help me do:
- Gain expertise in crafting advanced MongoDB aggregation pipelines for custom data workflows.
- Learn to perform time series analysis for financial datasets and IoT applications.
- Discover optimization techniques for working with sharded clusters and large datasets.
- Master array manipulation and other specific operations essential for MongoDB data models.
- Build pipelines that ensure data security and distribution while maintaining performance.

Author(s): Paul Done, a recognized expert in MongoDB, brings his extensive experience in database technologies to this book. With years of practice in helping companies leverage MongoDB for big data solutions, Paul shares his deep knowledge in an accessible and logical manner. His approach to writing is hands-on, focusing on practical insights and clear explanations.

Who is it for? This book is tailored for intermediate-level developers, database architects, data analysts, engineers, and scientists who use MongoDB. If you are familiar with MongoDB and looking to expand your understanding specifically around its aggregation capabilities, this guide is for you. Whether you're analyzing time series data or need to optimize pipelines for performance, you'll find actionable tips and examples here to suit your needs.
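An aggregation pipeline is an ordered list of stage documents. The sketch below shows such a pipeline as Python dicts (the shape you would pass to a PyMongo `collection.aggregate()` call), then evaluates the same three stages over an in-memory list to show their semantics; the collection and field names are invented, and the real work normally happens server-side inside MongoDB.

```python
# The pipeline itself: three stages applied in order.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

# Tiny in-memory evaluation of the same stages, for intuition only.
orders = [
    {"customer": "alice", "status": "complete", "amount": 30},
    {"customer": "bob", "status": "pending", "amount": 99},
    {"customer": "alice", "status": "complete", "amount": 20},
    {"customer": "bob", "status": "complete", "amount": 15},
]
matched = [o for o in orders if o["status"] == "complete"]  # $match
totals = {}
for o in matched:                                           # $group with $sum
    totals[o["customer"]] = totals.get(o["customer"], 0) + o["amount"]
result = sorted(                                            # $sort, descending
    ({"_id": k, "total": v} for k, v in totals.items()),
    key=lambda d: d["total"],
    reverse=True,
)
print(result)  # [{'_id': 'alice', 'total': 50}, {'_id': 'bob', 'total': 15}]
```

Thinking of each stage as a function from a document stream to a document stream is the mental model the book builds on when composing and optimizing longer pipelines.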

Learn T-SQL Querying - Second Edition

Troubleshoot query performance issues, identify anti-patterns in your code, and write efficient T-SQL queries with this guide for T-SQL developers.

Key Features:
- A definitive guide to mastering the techniques of writing efficient T-SQL code
- Learn query optimization fundamentals, query analysis, and how query structure impacts performance
- Discover insightful solutions to detect, analyze, and tune query performance issues
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description: Data professionals seeking to excel in Transact-SQL for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. The second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, enabling data professionals to gain mastery of modern SQL Server versions (up to 2022) and Azure SQL Database. The book covers new topics like logical statement processing flow, data access using indexes, and best practices for tuning T-SQL queries. Starting with query processing fundamentals, the book lays a foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning to analyze execution plans for insights into current performance and scalability. Using dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. The book covers indexing and delves into SQL Server’s built-in tools to expedite the resolution of T-SQL query performance and scalability issues. Hands-on examples will guide you to avoid UDF pitfalls and understand features like predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you’ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and avoid pitfalls.

What you will learn:
- Identify opportunities to write well-formed T-SQL statements
- Familiarize yourself with the Cardinality Estimator for query optimization
- Create efficient indexes for your existing workloads
- Implement best practices for T-SQL querying
- Explore Query Execution Dynamic Management Views
- Utilize the latest performance optimization features in SQL Server 2017, 2019, and 2022
- Safeguard query performance during upgrades to newer versions of SQL Server

Who this book is for: This book is for database administrators, database developers, data analysts, data scientists, and T-SQL practitioners who want to master the art of writing efficient T-SQL code and troubleshooting query performance issues through practical examples. A basic understanding of T-SQL syntax, writing queries in SQL Server, and using the SQL Server Management Studio tool will be helpful to get started.
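The predicate SARGability idea mentioned above applies to any indexed engine, not only SQL Server. This Python/sqlite3 sketch (sqlite standing in for SQL Server, with an invented table) shows the core effect: wrapping an indexed column in a function forces a scan, while an equivalent bare range predicate lets the engine seek the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
cur.execute("CREATE INDEX idx_date ON orders (order_date)")


def plan(sql):
    """Return the access-path details sqlite chose for a query."""
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql)]


# Non-SARGable: the function on the column hides it from the index.
scan = plan(
    "SELECT id FROM orders WHERE substr(order_date, 1, 4) = '2023'"
)
# SARGable rewrite: a bare range predicate can seek the index instead.
seek = plan(
    "SELECT id FROM orders WHERE order_date >= '2023-01-01'"
    " AND order_date < '2024-01-01'"
)
print(scan)  # a SCAN: every row is examined
print(seek)  # a SEARCH using idx_date: only the matching range is touched
```

SQL Server's execution plans expose the same distinction (Index Scan versus Index Seek), which is why the book spends time on rewriting predicates rather than just adding indexes.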

Azure Data Factory Cookbook - Second Edition

This comprehensive guide to Azure Data Factory shows you how to create robust data pipelines and workflows to handle both cloud and on-premises data solutions. Through practical recipes, you will learn to build, manage, and optimize ETL, hybrid ETL, and ELT processes. The book offers detailed explanations to help you integrate technologies like Azure Synapse, Data Lake, and Databricks into your projects.

What this Book will help me do:
- Master building and managing data pipelines using Azure Data Factory's latest versions and features.
- Leverage Azure Synapse and Azure Data Lake for streamlined data integration and analytics workflows.
- Enhance your ETL/ELT solutions with Microsoft Fabric, Databricks, and Delta tables.
- Employ debugging tools and workflows in Azure Data Factory to identify and solve data processing issues efficiently.
- Implement industry-grade best practices for reliable and efficient data orchestration and integration pipelines.

Author(s): Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, and Xenia Ireton collectively bring years of expertise in data engineering and cloud-based solutions. They are recognized professionals in the Azure ecosystem, dedicated to sharing their knowledge through detailed and actionable content. Their collaborative approach ensures that this book provides practical insights for technical audiences.

Who is it for? This book is ideal for data engineers, ETL developers, and professional architects who work with cloud and hybrid environments. If you're looking to upskill in Azure Data Factory or expand your knowledge into related technologies like Synapse Analytics or Databricks, this is for you. Readers should have a foundational understanding of data warehousing concepts to fully benefit from the material.

Big Data Computing

This book primarily aims to provide an in-depth understanding of recent advances in big data computing technologies, methodologies, and applications, along with introductory details of big data computing models such as Apache Hadoop, MapReduce, Hive, Pig, and Mahout, in-memory storage systems, NoSQL databases, and big data streaming services.
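The MapReduce model mentioned above can be illustrated without a cluster. This Python sketch walks the canonical word count through the map, shuffle, and reduce phases; Hadoop runs the same logic, but distributed across many nodes with the shuffle performed by the framework.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each word's counts into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


docs = ["big data computing", "big data streaming"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'computing': 1, 'streaming': 1}
```

Because the map and reduce functions are pure and per-key, the framework can parallelize them freely, which is the insight the whole Hadoop ecosystem builds on.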

IBM FlashSystem and VMware Implementation and Best Practices Guide

This IBM® Redbooks® publication details the configuration and best practices for using the IBM FlashSystem® family of storage products within a VMware environment. The first version of this book was published in 2021 and specifically addressed IBM Spectrum® Virtualize Version 8.4 with VMware vSphere 7.0. This second version includes all the enhancements that are available with IBM Spectrum Virtualize 8.5. Topics illustrate planning, configuration, operations, and preferred practices, including integration of IBM FlashSystem storage systems with the VMware vCloud suite of applications:
- VMware vSphere Web Client (vWC)
- vSphere Storage APIs - Storage Awareness (VASA)
- vSphere Storage APIs – Array Integration (VAAI)
- VMware Site Recovery Manager (SRM)
- VMware vSphere Metro Storage Cluster (vMSC)
- Embedded VASA Provider for VMware vSphere Virtual Volumes (vVols)

This book is intended for presales consulting engineers, sales engineers, and IBM clients who want to deploy IBM FlashSystem storage systems in virtualized data centers that are based on VMware vSphere.

Note: There is a newer version of this book: "IBM Storage Virtualize and VMware: Integrations, Implementation and Best Practices, SG24-8549". That newer book addresses IBM Storage Virtualize Version 8.6 with VMware vSphere 8 and covers the new IBM Storage plugin for vSphere.

IBM TS7700 Release 5.3 Guide

This IBM Redbooks® publication covers IBM TS7700 R5.3. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on over 25 years of experience, the R5.3 release includes many features that enable improved performance, usability, and security. Highlights include the IBM TS7700 Advanced Object Store, an all flash TS7770, grid resiliency enhancements, and Logical WORM retention. By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports the ability to off load to IBM Cloud Object Storage, Amazon S3, and RSTOR. This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.3. The R5.3 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000 Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.3 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. TS7700 provides tape virtualization for the IBM Z® environment. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. 
New and existing capabilities of the TS7700 R5.3 release include the following highlights:
- Support for IBM TS1160 Tape Drives and JE/JM media
- Eight-way Grid Cloud, which consists of up to three generations of TS7700
- Synchronous and asynchronous replication of virtual tape and TCT objects
- Grid access to all logical volume and object data independent of where it resides
- An all-flash TS7770 option for improved performance
- Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier
- Full AES256 encryption for data that is in-flight and at-rest
- Tight integration with IBM Z and DFSMS policy management
- DS8000 Object Store with AES256 in-flight encryption and compression
- Regulatory compliance through Logical WORM and LWORM Retention support
- Cloud Storage Tier support for archive, logical volume versions, and disaster recovery
- Optional integration with physical tape
- 16 Gb IBM FICON® throughput that exceeds 4 GBps per TS7700 cluster
- Grid Resiliency Support with Control Unit Initiated Reconfiguration (CUIR) support
- IBM Z hosts view up to 3,968 3490 devices per TS7700 grid
- TS7770 Cache On Demand feature that uses capacity-based licensing
- TS7770 support of SSD within the VED server

The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM Power9® technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.