talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 151–175 of 3432 · Newest first

Hands-On Differential Privacy

Many organizations today analyze and share large, sensitive datasets about individuals. Whether these datasets cover healthcare details, financial records, or exam scores, it's become more difficult for organizations to protect an individual's information through deidentification, anonymization, and other traditional statistical disclosure limitation techniques. This practical book explains how differential privacy (DP) can help. Authors Ethan Cowan, Michael Shoemate, and Mayana Pereira explain how these techniques enable data scientists, researchers, and programmers to run statistical analyses that hide the contribution of any single individual. You'll dive into basic DP concepts, learn how to use open source tools to create differentially private statistics, explore how to assess utility/privacy trade-offs, and learn how to integrate differential privacy into workflows. With this book, you'll learn:

- How DP guarantees privacy when other data anonymization methods don't
- What preserving individual privacy in a dataset entails
- How to apply DP in several real-world scenarios and datasets
- Potential privacy attack methods, including what it means to perform a reidentification attack
- How to use the OpenDP library in privacy-preserving data releases
- How to interpret guarantees provided by specific DP data releases
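The mechanism at the heart of many DP releases can be sketched in a few lines of plain Python. The example below adds Laplace noise calibrated to a counting query's sensitivity of 1; it is a generic illustration of the idea, not the OpenDP API the book teaches, and the dataset is invented:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # is a Laplace(0, scale) sample.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(values, predicate, epsilon: float) -> float:
    # A counting query changes by at most 1 when one person is added or
    # removed (sensitivity 1), so Laplace(1/epsilon) noise gives epsilon-DP.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 45, 29, 61, 52, 38, 47]  # toy dataset
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; the noisy answer hovers around the true count but never reveals whether any one individual was included.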

Prompt Engineering for Generative AI

Large language models (LLMs) and diffusion models such as ChatGPT and Stable Diffusion have unprecedented potential. Because they have been trained on all the public text and images on the internet, they can make useful contributions to a wide variety of tasks. And with the barrier to entry greatly reduced today, practically any developer can harness LLMs and diffusion models to tackle problems previously unsuitable for automation. With this book, you'll gain a solid foundation in generative AI, including how to apply these models in practice. When first integrating LLMs and diffusion models into their workflows, most developers struggle to coax reliable enough results from them to use in automated systems. Authors James Phoenix and Mike Taylor show you how a set of principles called prompt engineering can enable you to work effectively with AI. Learn how to empower AI to work for you. This book explains:

- The structure of the interaction chain of your program's AI model and the fine-grained steps in between
- How AI model requests arise from transforming the application problem into a document completion problem in the model training domain
- The influence of LLM and diffusion model architecture, and how best to interact with it
- How these principles apply in practice in the domains of natural language processing, text and image generation, and code
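The "document completion" framing described above is easy to make concrete: an application problem becomes a text document whose ending the model is asked to complete. A minimal sketch, with the helper and the example task invented for illustration:

```python
def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Frame a task as document completion: instructions, a few input/output
    demonstrations, then the new input left incomplete for the model."""
    lines = [task, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great value, works perfectly.", "positive"),
     ("Broke after two days.", "negative")],
    "Exactly what I was looking for.",
)
```

The model sees instructions, demonstrations, and an incomplete final record; completing the document solves the task.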

XML and Related Technologies

About the Author – Atul Kahate has over 13 years of experience in Information Technology in India and abroad in various capacities. He holds a Bachelor of Science degree in Statistics and a Master of Business Administration in Computer Systems. He has authored 17 highly acclaimed books on various areas of Information Technology, several of which are used as course textbooks or sources of reference in universities, colleges, and IT companies all over the world. Atul has been writing newspaper articles about cricket since the age of 12. He has also authored two books on cricket and has written over 2,000 articles on IT and cricket. Beyond technology, he has a deep interest in teaching, music, and cricket. He has conducted several training programs on a wide range of technologies in a number of educational institutions and IT organisations, including IIT, Symbiosis, I2IT, MET, Indira Institute of Management, Fergusson College, MIT, VIIT, and Walchand Government Engineering College, besides numerous other colleges in India.

Book Content – 1. Introduction to XML, 2. XML Syntaxes, 3. Document Type Definitions, 4. XML Schemas, 5. Cascading Style Sheets, 6. Extensible Stylesheet Language, 7. XML and Java, 8. XML and ASP.NET, 9. Web Services and AJAX, 10. XML Security, Appendix – Miscellaneous Topics

IBM Storage DS8900F Architecture and Implementation: Updated for Release 9.3.2

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM Storage DS8900F family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8900F systems. This edition applies to DS8900F systems with IBM Storage DS8000® Licensed Machine Code (LMC) 7.9.30 (bundle version 89.30.xx.x), referred to as Release 9.3. The DS8900F systems are exclusively all-flash, and they are offered in three classes:

- DS8980F Analytic Class: offers the best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), business intelligence (BI), and machine learning (ML).
- IBM DS8950F Agility Class: consolidates all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power, and distributed environments under a single all-flash storage solution.
- IBM DS8910F Flexibility Class: reduces complexity while addressing various workloads at the lowest DS8900F family entry cost.

The DS8900F architecture relies on powerful IBM POWER9™ processor-based servers that manage the cache to streamline disk input/output (I/O), which maximizes performance and throughput. These capabilities are further enhanced by High-Performance Flash Enclosures (HPFE) Gen2. Like its predecessors, the DS8900F supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning.

Supervised and Unsupervised Data Engineering for Multimedia Data

Explore the cutting-edge realms of data engineering in multimedia with Supervised and Unsupervised Data Engineering for Multimedia Data, where expert contributors delve into innovative methodologies, offering invaluable insights to empower both novices and seasoned professionals in mastering the art of manipulating multimedia data with precision and efficiency. The book presents a groundbreaking exploration into the intricacies of handling multimedia data through the lenses of both supervised and unsupervised data engineering. Authored by a team of accomplished experts in the field, this comprehensive volume serves as a go-to resource for data scientists, computer scientists, and researchers seeking a profound understanding of cutting-edge methodologies. It seamlessly integrates theoretical foundations with practical applications, offering a cohesive framework for navigating the complexities of multimedia data. Readers will delve into a spectrum of topics, including artificial intelligence, machine learning, and data analysis, all tailored to the challenges and opportunities presented by multimedia datasets. From foundational principles to advanced techniques, each chapter provides valuable insights, making this book an essential guide for academia and industry professionals alike. Whether you're a seasoned practitioner or a newcomer to the field, this book illuminates the path toward mastery in manipulating and extracting meaningful insights from multimedia data in the modern age.

Digital Transformation of SAP Supply Chain Processes: Build Mobile Apps Using SAP BTP and SAP Mobile Services

Take a high-level tour of SAP OData integrations with frontend technologies like Angular using the SAP Mobile Services platform. This book will give you a different perspective on executing SAP transactions on iOS using Angular instead of SAP-provided Fiori-based applications. You'll start by learning about SAP supply chain processes such as Goods Receipt, Transfer Posting, Goods Issue, and Inventory Search. You'll then move on to understanding the thought process involved in integrating SAP's backend (SAP ECC) with an Angular iOS app using SAP Mobile Services running on SAP BTP. All this will serve as a guide tailored to SAP functional and technical consultants actively engaged in client-facing roles. You'll follow a roadmap for modernizing and streamlining supply chain operations by leveraging Angular iOS apps. Digital Transformation of SAP Supply Chain Processes provides the essential tools for businesses looking to stay competitive in today's technology-driven landscape.

What You Will Learn:
- Set up the Authorization Endpoint, Token Endpoint, and base URL within SAP Mobile Services
- Manage attachments in mobile applications and store them in an external content repository
- Test OData services using the Postman API client with the OAuth protocol
- Understand JSON messages, the CORS protocol, and X-CSRF token exchange
- Link Zebra printers through the Zebra Native Printing app on iOS to print SAP forms on mobile printers

Who This Book Is For: SAP consultants with an interest in the digital transformation of SAP supply chain processes to iOS-based SAP transactions.

ArcGIS Pro 3.x Cookbook - Second Edition

ArcGIS Pro 3.x Cookbook teaches you to master the powerful tools available in Esri's ArcGIS Pro application for geospatial data management and analysis. You'll discover practical recipes that guide you through creating, editing, visualizing, and analyzing GIS data in 2D and 3D. Whether you are transitioning from ArcMap or starting fresh, this book will empower you to build impressive geospatial projects.

What this book will help me do:
- Navigate and make effective use of the ArcGIS Pro user interface and tools
- Create, edit, and publish detailed 2D and 3D geospatial maps
- Manage data efficiently using geodatabases, relationships, and topology tools
- Perform comprehensive spatial analyses including proximity, clustering, and 3D analysis
- Apply geospatial data validation techniques to ensure data consistency and integrity

Author: Tripp Corbin, GISP, is an experienced Geographic Information Systems professional with extensive expertise in Esri's GIS ecosystem. Tripp has taught numerous colleagues about ArcGIS Pro and its capabilities, bringing clarity and focus to complex GIS concepts. His engaging teaching style and comprehensive technical knowledge make this book both helpful and approachable for readers.

Who is it for? This book is designed for GIS professionals, geospatial analysts, and technicians looking to expand their skills with ArcGIS Pro. It's well-suited for architects and specialists who want to visualize, analyze, and create GIS projects effectively. Beginner GIS users will find clear guidance without needing prior experience, and experienced ArcMap users will learn how to transition smoothly to ArcGIS Pro.

Apache Iceberg: The Definitive Guide

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool—a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg. With this book, you'll learn:

- The architecture of Apache Iceberg tables
- What happens under the hood when you perform operations on Iceberg tables
- How to further optimize Iceberg tables for maximum performance
- How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio

Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
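As a taste of what "operations on Iceberg tables" look like in practice, here is a Spark SQL sketch. The catalog and table names are placeholders, and the syntax assumes a Spark session already configured with an Iceberg catalog:

```sql
-- Create an Iceberg table with a hidden partition transform on event_time.
CREATE TABLE demo.db.events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_time TIMESTAMP,
    payload    STRING
) USING iceberg
PARTITIONED BY (days(event_time));

-- Iceberg records every commit as a snapshot, so past table states
-- remain queryable (time travel); the timestamp here is a placeholder.
SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00';
```

Because partitioning is a transform on a column rather than a separate column, queries that filter on event_time get partition pruning without the reader needing to know the physical layout.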

Mastering MySQL Administration: High Availability, Security, Performance, and Efficiency

This book is your one-stop resource on MySQL database installation and server management for administrators. It covers installation, upgrades, monitoring, high availability, disaster recovery, security, performance, and troubleshooting. You will become fluent in MySQL 8.2, the latest version of the highly scalable and robust relational database system. With a hands-on approach, the book offers step-by-step guidance on installing, upgrading, and establishing robust high availability and disaster recovery capabilities for MySQL databases. It also covers high availability with InnoDB and NDB clusters, MySQL routers, and enterprise MySQL tools, along with robust security design and performance techniques. Throughout, the authors punctuate concepts with examples taken from their experience with large-scale implementations at companies such as Meta and American Airlines, anchoring this practical guide to MySQL 8.2 administration in the real world.

What You Will Learn:
- Understand MySQL architecture and best practices for administration of MySQL server
- Configure high availability, replication, and disaster recovery with InnoDB and NDB engines
- Back up and restore with MySQL utilities and tools, and configure the database for zero data loss
- Troubleshoot with steps for real-world critical errors and detailed solutions

Who This Book Is For: Technical professionals, database administrators, developers, and engineers seeking to optimize MySQL databases for scale, security, and performance
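As a flavor of the high-availability configuration the book walks through, a minimal sketch of GTID-based replication settings for a MySQL 8.x source server might look like the following; the values are illustrative assumptions, not taken from the book:

```ini
# /etc/my.cnf (source server) -- each server in the topology needs a
# unique server_id; replicas use the same GTID settings.
[mysqld]
server_id                  = 1
log_bin                    = binlog
gtid_mode                  = ON
enforce_gtid_consistency   = ON
binlog_expire_logs_seconds = 604800   ; retain binary logs for 7 days
```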

Data Engineering with Google Cloud Platform - Second Edition

Data Engineering with Google Cloud Platform is your ultimate guide to building scalable data platforms using Google Cloud technologies. In this book, you will learn how to leverage products such as BigQuery, Cloud Composer, and Dataplex for efficient data engineering. Expand your expertise and gain practical knowledge to excel in managing data pipelines within the Google Cloud ecosystem.

What this book will help me do:
- Understand foundational data engineering concepts using Google Cloud Platform
- Learn to build and manage scalable data pipelines with tools such as Dataform and Dataflow
- Explore advanced topics like data governance and secure data handling in Google Cloud
- Boost readiness for Google Cloud data engineering certification with real-world exam guidance
- Master cost-effective strategies and CI/CD practices for data engineering on Google Cloud

Author: Adi Wijaya, the author of this book, is a strategic cloud data engineer at Google with extensive experience in data engineering and the Google Cloud ecosystem. With his hands-on expertise, he emphasizes practical solutions and in-depth knowledge sharing, guiding readers through the intricacies of Google Cloud for data engineering success.

Who is it for? This book is ideal for data analysts, IT practitioners, software engineers, and data enthusiasts aiming to excel in data engineering. Whether you're a beginner tackling fundamental concepts or an experienced professional exploring Google Cloud's advanced capabilities, this book is designed for you. It bridges your current skills with modern data engineering practices on Google Cloud, making it a valuable resource at any stage of your career.

Protocol Buffers Handbook

The "Protocol Buffers Handbook" by Clément Jean offers an in-depth exploration of Protocol Buffers (Protobuf), a powerful data serialization format. Learn everything from syntax and schema evolution to custom validations and cross-language integrations. With practical examples in Go and Python, this guide empowers you to efficiently serialize and manage structured data across platforms.

What this book will help me do:
- Develop advanced skills in using Protocol Buffers (Protobuf) for efficient data serialization
- Master the key concepts of Protobuf syntax and schema evolution for compatibility
- Learn to create custom validation plugins and tailor Protobuf processes
- Integrate Protobuf with multiple programming environments, including Go and Python
- Automate Protobuf projects using tools like Buf and Bazel to streamline workflows

Author: Clément Jean is a skilled programmer and technical writer specializing in data serialization and distributed systems. With substantial experience in developing scalable microservices, he shares valuable insights into using Protocol Buffers effectively. Through this book, Clément offers a hands-on approach to Protobuf, blending theory with practical examples derived from real-world scenarios.

Who is it for? This book is perfect for software engineers, system integrators, and data architects who aim to optimize data serialization and APIs, regardless of their programming language expertise. Beginners will grasp foundational Protobuf concepts, while experienced developers will extend their knowledge to advanced, practical applications. Those working with microservices and heavily data-dependent systems will find this book especially relevant.
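For readers new to Protobuf, a small proto3 schema shows the syntax and the schema-evolution discipline the book covers: field numbers, not names, identify fields on the wire. The message definitions below are invented for illustration:

```protobuf
syntax = "proto3";

package shop.v1;

message Order {
  uint64 id           = 1;
  string customer_id  = 2;
  repeated Item items = 3;
  // A removed field's number is reserved so it is never reused,
  // which keeps old serialized messages readable.
  reserved 4;
}

message Item {
  string sku         = 1;
  uint32 quantity    = 2;
  int64  price_cents = 3;  // minor units avoid floating-point rounding
}
```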

IBM Storage FlashSystem 5200 Product Guide for IBM Storage Virtualize 8.6

This IBM® Redpaper® Product Guide publication describes the IBM Storage FlashSystem® 5200 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It is an NVMe end-to-end platform that is targeted at the entry and midrange market and delivers the full capabilities of IBM FlashCore® technology. It also provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following:

- Data reduction and deduplication
- Dynamic tiering
- Thin provisioning
- Snapshots
- Cloning
- Replication
- Data copy services
- Transparent Cloud Tiering
- IBM HyperSwap®, including 3-site replication for high availability (HA)

Scale-out and scale-up configurations further enhance capacity and throughput for better availability. The IBM Storage FlashSystem 5200 is a high-performance storage solution that is based on a revolutionary 1U form factor. It consists of 12 NVMe flash devices in a 1U storage enclosure drawer with fully redundant canister components and no single point of failure. It is designed for businesses of all sizes, including small and remote branch offices and regional clients. It is a smarter, self-optimizing solution that requires less management, which enables organizations to overcome their storage challenges. Flash has come of age, and price point reductions mean that lower parts of the storage market are seeing the value of moving over to flash and NVMe-based solutions. The IBM Storage FlashSystem 5200 advances this transition by providing incredibly dense tiers of flash in a more affordable package. With the benefit of IBM FlashCore Module compression and new QLC flash-based technology becoming available, a compelling argument exists to move away from Nearline SAS storage and on to NVMe. This Product Guide is aimed at pre-sales and post-sales technical support, marketing, and storage administrators.

IBM Storage FlashSystem 9500 Product Guide for IBM Storage Virtualize 8.6

This IBM® Redpaper® Product Guide describes the IBM Storage FlashSystem® 9500 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Storage Virtualize. Often, applications exist that are foundational to the operations and success of an enterprise. These applications might function as prime revenue generators, guide or control important tasks, or provide crucial business intelligence, among many other jobs. Whatever their purpose, they are mission critical to the organization. They demand the highest levels of performance, functionality, security, and availability. They also must be protected against the newer threat of cyberattacks. To support such mission-critical applications, enterprises of all types and sizes turn to the IBM Storage FlashSystem 9500. IBM Storage FlashSystem 9500 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples:

- Data reduction and deduplication
- Dynamic tiering
- Thin provisioning
- Snapshots
- Cloning
- Replication and data copy services
- Cyber resilience
- Transparent Cloud Tiering
- IBM HyperSwap®, including 3-site replication for HA
- Scale-out and scale-up configurations that further enhance capacity and throughput for better availability

This Redpaper applies to IBM Storage Virtualize V8.6.

Learn SQL using MySQL in One Day and Learn It Well

"Learn SQL using MySQL in One Day and Learn It Well" is your hands-on guide to mastering SQL efficiently using MySQL. This book takes you from understanding basic database concepts to executing advanced queries and implementing essential features like triggers and routines. With a project-based approach, you will confidently manage databases and unlock the potential of data.

What this book will help me do:
- Understand database concepts and relational data architecture
- Design and define tables to organize and store data effectively
- Perform advanced SQL queries to manipulate and analyze data efficiently
- Implement database triggers, views, and routines for advanced management
- Apply practical skills in SQL through a comprehensive hands-on project

Author: Jamie Chan is a professional instructor and technical writer with extensive experience in database management and software development. Known for a clear and engaging teaching style, Jamie has authored numerous books focusing on hands-on learning. Jamie approaches pedagogy with the goal of making technical subjects accessible and practical for all learners.

Who is it for? This book is designed for beginners eager to learn SQL and MySQL from scratch. It is perfect for professionals or students who want relevant and actionable skills in database management. Whether you're looking to enhance career prospects or leverage database tools for personal projects, this book is your practical starting point. Basic computer literacy is all that's needed.
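The basics the book starts from (creating tables, inserting rows, aggregate queries) can be previewed with Python's built-in sqlite3 module standing in for MySQL, since this core SQL is the same in both; the schema is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 40.0), ("bob", 15.5), ("alice", 22.5)],
)
# Total spend per customer, alphabetically.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
# rows == [('alice', 62.5), ('bob', 15.5)]
```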

Natural Language and Search

When you look at operational analytics and business data analysis activities—such as log analytics, real-time application monitoring, website search, observability, and more—effective search functionality is key to identifying issues, improving customer experience, and increasing operational effectiveness. How can you support your business needs by leveraging ML-driven advancements in search relevance? In this report, authors Jon Handler, Milind Shyani, and Karen Kilroy help executives and data scientists explore how ML can enable ecommerce firms to generate more pertinent search results to drive better sales. You'll learn how personalized search helps you quickly find relevant data within applications, websites, and data lake catalogs. You'll also discover how to locate the content available in CRM systems and document stores.

This report helps you:
- Address the challenges of traditional document search, including data preparation and ingestion
- Leverage ML techniques to improve search outcomes and the relevance of documents you retrieve
- Discover what makes a good search solution that's reliable, scalable, and can drive your business forward
- Learn how to choose a search solution to improve your decision-making process

With advancements in ML-driven search, businesses can realize even more benefits and improvements in their data and document search capabilities to better support their own business needs and the needs of their customers.

About the authors: Jon Handler is a senior principal solutions architect at Amazon Web Services. Milind Shyani is an applied scientist at Amazon Web Services working on large language models, information retrieval, and machine learning algorithms. Karen Kilroy, CEO of Kilroy Blockchain, is a lifelong technologist, full stack software engineer, speaker, and author living in Northwest Arkansas.

Bio-Inspired Strategies for Modeling and Detection in Diabetes Mellitus Treatment

Bio-Inspired Strategies for Modeling and Detection in Diabetes Mellitus Treatment focuses on bio-inspired techniques, such as modelling, used to generate control algorithms for the treatment of diabetes mellitus. The book addresses the identification of diabetes mellitus using a high-order recurrent neural network trained by the extended Kalman filter. The authors also describe the use of metaheuristic algorithms for the parametric identification of compartmental models of diabetes mellitus widely used in research, such as the Sorensen model and the Dalla Man model. In addition, the book addresses the modelling of time series for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia using deep neural networks. It also proposes detecting diabetes mellitus in its early stages, when current diagnostic techniques cannot yet detect glucose intolerance or prediabetes, by means of deep neural networks drawn from the current literature. Readers will find leading-edge research in diabetes identification based on discrete high-order neural networks trained with the extended Kalman filter; parametric identification of compartmental models used to describe diabetes mellitus; modelling of data obtained by continuous glucose monitoring sensors for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia; and screening for glucose intolerance using glucose tolerance test data and deep neural networks. Application of the proposed approaches is illustrated via simulation and real-time implementations for modelling, prediction, and classification. The book:

- Addresses the online identification of diabetes mellitus using a high-order recurrent neural network trained online by an extended Kalman filter
- Covers parametric identification of compartmental models used to describe diabetes mellitus
- Provides modelling of data obtained by continuous glucose-monitoring sensors for the prediction of risk scenarios such as hyperglycaemia and hypoglycaemia
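For orientation, the extended Kalman filter weight update that this style of neural identification builds on treats the network weights w as the state to estimate; the equations below are stated from the standard EKF literature rather than taken from the book itself:

```latex
K_k = P_k H_k^{\top}\left(H_k P_k H_k^{\top} + R_k\right)^{-1} \\
\hat{w}_{k+1} = \hat{w}_k + K_k\left(y_k - \hat{y}_k\right) \\
P_{k+1} = P_k - K_k H_k P_k + Q_k
```

Here \(\hat{y}_k\) is the network output, \(H_k\) the Jacobian of that output with respect to the weights, \(P_k\) the weight-error covariance, and \(Q_k\), \(R_k\) the process and measurement noise covariances.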

Software Engineering for Data Scientists

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success—and is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, and clearly explains how to apply the best practices from software engineering to data science. Examples are provided in Python, drawn from popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:

- Understand data structures and object-oriented programming
- Clearly and skillfully document your code
- Package and share your code
- Integrate data science code with a larger code base
- Write APIs
- Create secure code
- Apply best practices to common tasks such as testing, error handling, and logging
- Work more effectively with software engineers
- Write more efficient, maintainable, and robust code in Python
- Put your data science projects into production
- And more
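A toy example of two of the practices listed above, clear documentation and deliberate error handling, applied to a typical data science helper (invented for illustration, not taken from the book):

```python
def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit variance.

    Args:
        values: at least two numeric observations.

    Raises:
        ValueError: if there are fewer than two values, or all values
            are identical (zero variance), rather than dividing by zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        raise ValueError("zero variance: all values are identical")
    std = var ** 0.5
    return [(v - mean) / std for v in values]
```

Explicit failure modes documented in the docstring, and enforced in the body, are what make a helper like this safe to reuse across a larger code base.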

Engineering Data Mesh in Azure Cloud

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn the strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms.

What this book will help me do:
- Learn to decentralize data governance and integrate data domains effectively
- Master strategies for building and implementing data contracts suited to your organization's needs
- Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework
- Understand how to apply key architecture patterns for analytics, including AI and machine learning
- Gain the knowledge to scale analytics frameworks using modern cloud-based platforms

Author: Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, the author brings real-world customer experiences into practical guidance. This book reflects a dedication to helping organizations achieve their data goals with clarity and effectiveness.

Who is it for? This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. Especially useful for professionals aiming to implement cloud-based data mesh solutions, it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

The Definitive Guide to Data Integration

Master the modern data stack with 'The Definitive Guide to Data Integration.' This comprehensive book covers the key aspects of data integration, including data sources, storage, transformation, governance, and more. Equip yourself with the knowledge and hands-on skills to manage complex datasets and unlock your data's full potential.

What this book will help me do:
- Understand how to integrate diverse datasets efficiently using modern tools
- Develop expertise in designing and implementing robust data integration workflows
- Gain insights into real-time data processing and cloud-based data architectures
- Learn best practices for data quality, governance, and compliance in integration
- Master the use of APIs, workflows, and transformation patterns in practice

Authors: Bonnefoy, Chaize, Raphaël Mansuy, and Mehdi Tazi are seasoned experts in data engineering and integration. They bring years of experience in modern data technologies and consulting. Their approachable writing style ensures that readers at various skill levels can grasp complex concepts effectively.

Who is it for? This book is ideal for data engineers, architects, analysts, and IT professionals. Whether you're new to data integration or looking to deepen your expertise, this guide caters to individuals seeking to navigate the challenges of the modern data stack.

Azure Data Factory by Example: Practical Implementation for Data Engineers

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. This edition, updated for 2024, includes the latest developments to the Azure Data Factory service: Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics. Improvements to flow control provided by activity deactivation and the Fail activity. The introduction of reusable data flow components such as user-defined functions and flowlets. Extensions to integration runtime capabilities including Managed VNet support. The ability to trigger pipelines in response to custom events. Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying. 
What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations
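The metadata-driven copy pattern mentioned above can be sketched in plain Python: a lookup step reads a table of source/sink pairs, and each pair drives one parameterized copy run, mirroring ADF's Lookup activity feeding a ForEach of Copy activities. This is a conceptual stand-in, not the ADF SDK; all names are illustrative.

```python
# Conceptual sketch of ADF's metadata-driven copy pattern:
# a Lookup-style metadata table drives parameterized copy runs,
# as a ForEach of Copy activities would. Names are illustrative.

copy_metadata = [
    {"source": "sales.orders", "sink": "staging.orders"},
    {"source": "sales.customers", "sink": "staging.customers"},
]

def copy_activity(source, sink):
    """Stand-in for one parameterized Copy activity run."""
    return f"copied {source} -> {sink}"

def run_pipeline(metadata):
    """Stand-in for Lookup + ForEach orchestration over the metadata."""
    return [copy_activity(m["source"], m["sink"]) for m in metadata]

for result in run_pipeline(copy_metadata):
    print(result)
```

The appeal of the pattern is that adding a new table to copy means adding a metadata row, not a new pipeline.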

IBM GDPS: An Introduction to Concepts and Capabilities

This IBM Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery (DR), along with issues that are related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for high availability and disaster recovery (HADR). Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM® are also explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently by readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

The Complete Developer

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack—from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins. You’ll also learn how to: Work with modern JavaScript syntax, TypeScript, and the Next.js framework Simplify UI development with the React library Extend your application with REST and GraphQL APIs Manage your data with the MongoDB NoSQL database Use OAuth to simplify user management, authentication, and authorization Automate testing with Jest, test-driven development, stubs, mocks, and fakes Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters. Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript