O'Reilly Data Engineering Books

Practical Lakehouse Architecture

2024-07-31 O'Reilly Amazon

book

Gaurav Ashok Thalpati

data data-engineering storage-repositories data-lake AI/ML BI

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures. Practical Lakehouse Architecture shows you how to: understand key lakehouse concepts and features like transaction support, time travel, and schema evolution; understand the differences between traditional and lakehouse data architectures; differentiate between various file formats and table formats; design lakehouse architecture layers for storage, compute, metadata management, and data consumption; implement data governance and data security within the platform; evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case; make critical design decisions and address practical challenges to build a future-ready data platform; and start your lakehouse implementation journey and migrate data from existing systems to the lakehouse.

IBM FlashCore Module (FCM) Product Guide: Features the newly available FCM4 with AI-powered ransomware detection

2024-07-03 O'Reilly Amazon

book

Vasfi Gucer, Jon Herd, Hartmut Lonzer

data data-engineering IBM AI/ML

This IBM® Redpaper® Product Guide describes the history of the IBM FlashCore Module (FCM), gives a general overview, and then takes a deeper dive into the way IBM leads the field in the adoption of high-speed, low-latency storage. The IBM FlashCore Module is used in the latest IBM FlashSystem® solutions, which are next-generation IBM FlashSystem control enclosures. The IBM FlashCore Module combines the performance of flash and a Non-Volatile Memory Express (NVMe) optimized architecture with the reliability and innovation of IBM FlashCore® technology, and with the rich feature set and high availability (HA) of IBM Storage Virtualize software.

Elastic Stack 8.x Cookbook

2024-06-28 O'Reilly Amazon

book

Yazid Akadiri, Huage Chen

data data-engineering search elasticsearch elastic-stack-elk-stack elastic stack (elk stack)

Unlock the potential of the Elastic Stack with the "Elastic Stack 8.x Cookbook." This book provides over 80 hands-on recipes, guiding you through ingesting, processing, and visualizing data using Elasticsearch, Logstash, Kibana, and more. You'll also explore advanced features like machine learning and observability to create data-driven applications with ease.
What this book will help me do: Implement a robust workflow for ingesting, transforming, and visualizing diverse datasets. Utilize Kibana to create insightful dashboards and visual analytics. Leverage the Elastic Stack's AI capabilities, such as natural language processing and machine learning. Develop search solutions and integrate advanced features like vector search. Monitor and optimize your Elastic Stack deployments for performance and security.
Author(s): Huage Chen and Yazid Akadiri are experienced professionals in the field of the Elastic Stack. They bring years of practical experience in data engineering, observability, and software development. Huage and Yazid aim to provide a clear, practical pathway for both beginners and experienced users to get the most out of the Elastic Stack's capabilities.
Who is it for? This book is perfect for developers, data engineers, and observability practitioners looking to harness the power of the Elastic Stack. It caters to both beginners and experts, providing clear instructions to help readers understand and implement powerful data solutions. If you're working with search applications, data analysis, or system observability, this book is an ideal resource.

The Ultimate Guide to Snowpark

2024-05-30 O'Reilly Amazon

book

Vivekanandan SS, Shankar Narayanan SGS

data data-engineering Snowflake AI/ML Cloud Computing Data Engineering

The Ultimate Guide to Snowpark serves as a comprehensive resource to help you master the Snowflake Snowpark framework using Python. You'll learn how to manage data engineering, data science, and data applications in Snowpark, coupled with practical implementations and examples. By following this guide, you'll gain the skills needed to efficiently process and analyze data in the Snowflake Data Cloud.
What this book will help me do: Master Snowpark with Python for data engineering, data science, and data application workloads. Develop and deploy robust data pipelines using Snowpark in Python. Design, implement, and produce machine learning models using Snowpark. Learn to monetize and operationalize Snowflake-native applications. Effectively adopt Snowpark in production for scalable, efficient data solutions.
Author(s): Shankar Narayanan SGS and Vivekanandan SS are experienced professionals in data engineering and Snowflake technologies. Shankar has extensive experience in utilizing Snowflake Snowpark to manage and enhance data solutions. Vivekanandan brings expertise in the intersection of Python programming and cloud-based data processing. Together, their combined knowledge and approachable writing style make this book an invaluable resource to readers.
Who is it for? This book is designed for data engineers, data scientists, developers, and seasoned data practitioners. Ideal candidates are those looking to expand their skills in implementing Snowpark solutions using Python. A prior understanding of SQL and Python programming, along with familiarity with Snowflake, will help readers fully leverage the techniques presented.

Databricks ML in Action

2024-05-17 O'Reilly Amazon

book

Amanda Baker, Stephanie Rivera, Hayley Horn, Anastasia Prokaieva

data data-engineering Databricks AI/ML Big Data Data Lakehouse

Dive into the Databricks Data Intelligence Platform and learn how to harness its full potential for creating, deploying, and maintaining machine learning solutions. This book covers everything from setting up your workspace to integrating state-of-the-art tools such as AutoML and Vector Search, imparting practical skills through detailed examples and code.
What this book will help me do: Set up and manage a Databricks workspace tailored for effective data science workflows. Implement monitoring to ensure data quality and detect drift efficiently. Build, fine-tune, and deploy machine learning models seamlessly using Databricks tools. Operationalize AI projects including feature engineering, data pipelines, and workflows on the Databricks Lakehouse architecture. Leverage integrations with popular tools like OpenAI's ChatGPT to expand your AI project capabilities.
Author(s): This book is authored by Stephanie Rivera, Anastasia Prokaieva, Amanda Baker, and Hayley Horn, seasoned experts in data science and machine learning from Databricks. Their collective years of expertise in big data and AI technologies ensure a rich and insightful perspective. Through their work, they strive to make complex concepts accessible and actionable.
Who is it for? This book serves as an ideal guide for machine learning engineers, data scientists, and technically inclined managers. It's well-suited for those transitioning to the Databricks environment or seeking to deepen their Databricks-based machine learning implementation skills. Whether you're an ambitious beginner or an experienced professional, this book provides clear pathways to success.

Prompt Engineering for Generative AI

2024-05-16 O'Reilly Amazon

book

Mike Taylor, James Phoenix

data ai-ml artificial-intelligence-ai generative-ai prompt-engineering AI/ML

Large language models (LLMs) and diffusion models such as ChatGPT and Stable Diffusion have unprecedented potential. Because they have been trained on vast amounts of public text and images from the internet, they can make useful contributions to a wide variety of tasks. And with the barrier to entry greatly reduced today, practically any developer can harness LLMs and diffusion models to tackle problems previously unsuitable for automation. With this book, you'll gain a solid foundation in generative AI, including how to apply these models in practice. When first integrating LLMs and diffusion models into their workflows, most developers struggle to coax results from them that are reliable enough to use in automated systems. Authors James Phoenix and Mike Taylor show you how a set of principles called prompt engineering can enable you to work effectively with AI. Learn how to empower AI to work for you. This book explains: the structure of the interaction chain of your program's AI model and the fine-grained steps in between; how AI model requests arise from transforming the application problem into a document completion problem in the model training domain; the influence of LLM and diffusion model architecture, and how to best interact with it; and how these principles apply in practice in the domains of natural language processing, text and image generation, and code.

IBM Storage DS8900F Architecture and Implementation: Updated for Release 9.3.2

2024-05-07 O'Reilly Amazon

book

Daniel Beukers, Connie Riggins, Jörg Klemm, Peter Kimmel, Bozhidar Feraliev, Gauurav Sabharwal, Jeff Cook

data data-engineering IBM AI/ML BI

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM Storage DS8900F family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8900F systems. This edition applies to DS8900F systems with IBM Storage DS8000® Licensed Machine Code (LMC) 7.9.30 (bundle version 89.30.xx.x), referred to as Release 9.3. The DS8900F systems are exclusively all-flash and are offered in three classes. The IBM DS8980F (Analytic Class) offers the best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), business intelligence (BI), and machine learning (ML). The IBM DS8950F (Agility Class) consolidates all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power, and distributed environments under a single all-flash storage solution. The IBM DS8910F (Flexibility Class) reduces complexity while addressing various workloads at the lowest DS8900F family entry cost. The DS8900F architecture relies on powerful IBM POWER9™ processor-based servers that manage the cache to streamline disk input/output (I/O), which maximizes performance and throughput. These capabilities are further enhanced by High-Performance Flash Enclosures (HPFE) Gen2. Like its predecessors, the DS8900F supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning.

Supervised and Unsupervised Data Engineering for Multimedia Data

2024-05-07 O'Reilly Amazon

book

Suman Kumar Swarnkar, J. P. Patra, Tien Anh Tran, Yogesh Kumar Rathore, Sapna Singh Kshatri

data data-engineering AI/ML Data Engineering

Explore the cutting-edge realms of data engineering in multimedia with Supervised and Unsupervised Data Engineering for Multimedia Data, where expert contributors delve into innovative methodologies, offering invaluable insights to empower both novices and seasoned professionals in mastering the art of manipulating multimedia data with precision and efficiency. Supervised and Unsupervised Data Engineering for Multimedia Data presents a groundbreaking exploration into the intricacies of handling multimedia data through the lenses of both supervised and unsupervised data engineering. Authored by a team of accomplished experts in the field, this comprehensive volume serves as a go-to resource for data scientists, computer scientists, and researchers seeking a profound understanding of cutting-edge methodologies. The book seamlessly integrates theoretical foundations with practical applications, offering a cohesive framework for navigating the complexities of multimedia data. Readers will delve into a spectrum of topics, including artificial intelligence, machine learning, and data analysis, all tailored to the challenges and opportunities presented by multimedia datasets. From foundational principles to advanced techniques, each chapter provides valuable insights, making this book an essential guide for academia and industry professionals alike. Whether you’re a seasoned practitioner or a newcomer to the field, Supervised and Unsupervised Data Engineering for Multimedia Data illuminates the path toward mastery in manipulating and extracting meaningful insights from multimedia data in the modern age.

Apache Iceberg: The Definitive Guide

2024-05-02 O'Reilly Amazon

book

Tomer Shiran, Jason Hughes, Alex Merced

data data-engineering storage-repositories data-lake apache-iceberg AI/ML

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of proprietary tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg. With this book, you'll learn: the architecture of Apache Iceberg tables; what happens under the hood when you perform operations on Iceberg tables; how to further optimize Iceberg tables for maximum performance; and how to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio. Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.

Natural Language and Search

2024-04-25 O'Reilly Amazon

book

Milind Shyani, Jon Handler, Karen Kilroy

data data-engineering search AI/ML Analytics AWS

When you look at operational analytics and business data analysis activities, such as log analytics, real-time application monitoring, website search, observability, and more, effective search functionality is key to identifying issues, improving customer experience, and increasing operational effectiveness. How can you support your business needs by leveraging ML-driven advancements in search relevance? In this report, authors Jon Handler, Milind Shyani, and Karen Kilroy help executives and data scientists explore how ML can enable ecommerce firms to generate more pertinent search results to drive better sales. You'll learn how personalized search helps you quickly find relevant data within applications, websites, and data lake catalogs. You'll also discover how to locate the content available in CRM systems and document stores. This report helps you: address the challenges of traditional document search, including data preparation and ingestion; leverage ML techniques to improve search outcomes and the relevance of documents you retrieve; discover what makes a good search solution that's reliable, scalable, and can drive your business forward; and learn how to choose a search solution to improve your decision-making process. With advancements in ML-driven search, businesses can realize even more benefits and improvements in their data and document search capabilities to better support their own business needs and the needs of their customers. About the authors: Jon Handler is a senior principal solutions architect at Amazon Web Services. Milind Shyani is an applied scientist at Amazon Web Services working on large language models, information retrieval, and machine learning algorithms. Karen Kilroy, CEO of Kilroy Blockchain, is a lifelong technologist, full stack software engineer, speaker, and author living in Northwest Arkansas.

Engineering Data Mesh in Azure Cloud

2024-03-29 O'Reilly Amazon

book

Aniruddha Deswandikar

data data-engineering database-architecture data-mesh AI/ML Analytics

Discover how to implement a modern data mesh architecture using Microsoft Azure's Cloud Adoption Framework. In this book, you'll learn the strategies to decentralize data while maintaining strong governance, turning your current analytics struggles into scalable and streamlined processes. Unlock the potential of data mesh to achieve advanced and democratized analytics platforms.
What this book will help me do: Learn to decentralize data governance and integrate data domains effectively. Master strategies for building and implementing data contracts suited to your organization's needs. Explore how to design a landing zone for a data mesh using Azure's Cloud Adoption Framework. Understand how to apply key architecture patterns for analytics, including AI and machine learning. Gain the knowledge to scale analytics frameworks using modern cloud-based platforms.
Author(s): Aniruddha Deswandikar is a seasoned data architect with extensive experience in implementing cutting-edge data solutions in the cloud. With a passion for simplifying complex data strategies, Aniruddha brings real-world customer experiences into practical guidance. This book reflects Aniruddha's dedication to helping organizations achieve their data goals with clarity and effectiveness.
Who is it for? This book is ideal for chief data officers, data architects, and engineers seeking to transform data analytics frameworks to accommodate advanced workloads. It is especially useful for professionals aiming to implement cloud-based data mesh solutions, and it assumes familiarity with centralized data systems, data lakes, and data integration techniques. If modernizing your organization's data strategy appeals to you, this book is for you.

Data Science and Machine Learning Applications in Subsurface Engineering

2024-02-06 O'Reilly Amazon

book

Daniel Asante Otchere

data ai-ml machine-learning AI/ML Data Science

This book provides comprehensive research and explores the different applications of data science and machine learning in subsurface engineering.

Handbook of Geospatial Artificial Intelligence

2023-12-29 O'Reilly Amazon

book

Wenwen Li, Yingjie Hu, Song Gao

data data-engineering location-data geographic-information-system-gis geographic information system (gis) AI/ML

Geospatial Artificial Intelligence (GeoAI) is the integration of geospatial studies and AI using machine learning and deep learning technologies. This comprehensive handbook explains and discusses the key fundamental concepts, methods, models, and technologies of GeoAI, as well as recent advances, research tools, and applications in different fields.

Elasticsearch in Action, Second Edition

2023-12-10 O'Reilly Amazon

book

Madhusudhan Konda

data data-engineering search elasticsearch AI/ML Analytics

Build powerful, production-ready search applications using the incredible features of Elasticsearch. In Elasticsearch in Action, Second Edition you will discover: the architecture, concepts, and fundamentals of Elasticsearch; installing, configuring, and running Elasticsearch and Kibana; creating an index with custom settings; data types, mapping fundamentals, and templates; fundamentals of text analysis and working with text analyzers; indexing, deleting, and updating documents; indexing data in bulk, and reindexing and aliasing operations; and search concepts, relevancy scores, and similarity algorithms. Elasticsearch in Action, Second Edition teaches you to build scalable search applications using Elasticsearch. This completely new edition explores Elasticsearch fundamentals from the ground up. You’ll deep dive into design principles, search architectures, and Elasticsearch’s essential APIs. Every chapter is clearly illustrated with diagrams and hands-on examples. You’ll even explore real-world use cases for full text search, data visualizations, and machine learning. Plus, its comprehensive nature means you’ll keep coming back to the book as a handy reference!
About the Technology: Create professional-grade search engines with Elasticsearch and Kibana! Rewritten for the latest version of Elasticsearch, this practical book explores Elasticsearch’s high-level architecture, reveals infrastructure patterns, and walks through the search and analytics capabilities of numerous Elasticsearch APIs.
About the Book: Elasticsearch in Action, Second Edition teaches you how to add modern search features to websites and applications using Elasticsearch 8. In it, you’ll quickly progress from the basics of installing and configuring clusters to indexing documents, advanced aggregations, and putting your servers into production. You’ll especially appreciate the mix of technical detail with techniques for designing great search experiences.
What's Inside: understanding search architecture; full text and term-level search queries; analytics and aggregations; high-level visualizations in Kibana; configuring, scaling, and tuning clusters.
About the Reader: For application developers comfortable with scripting and command-line applications.
About the Author: Madhusudhan Konda is a full-stack lead engineer, architect, mentor, and conference speaker. He delivers live online training on Elasticsearch and the Elastic Stack.
Quotes:
Madhu’s passion comes across in the depth and breadth of this book, the enthusiastic tone, and the hands-on examples. I hope you will take what you have read and put it ‘in action’. - From the Foreword by Shay Banon, Founder of Elasticsearch
Practical and well-written. A great starting point for beginners and a comprehensive guide for more experienced professionals. - Simona Russo, Serendipity
The author’s excitement is evident from the first few paragraphs. Couple that with extensive experience and technical prowess, and you have an instant classic. - Herodotos Koukkides and Semi Koen, Global Japanese Financial Institution

Vector Search for Practitioners with Elastic

2023-11-30 O'Reilly Amazon

book

Bahaaldine Azarmi, Jeff Vestal

data data-engineering search AI/ML Data Management ELK

The book "Vector Search for Practitioners with Elastic" provides a comprehensive guide to leveraging vector search technology within Elastic for applications in NLP, cybersecurity, and observability. By exploring practical examples and advanced techniques, this book teaches you how to optimize and implement vector search to address complex challenges in modern data management.
What this book will help me do: Gain a deep understanding of implementing vector search with Elastic. Learn techniques to optimize vector data storage and retrieval for practical applications. Understand how to apply vector search for image similarity in Elastic. Discover methods for utilizing vector search for security and observability enhancements. Develop skills to integrate modern NLP tools with vector databases and Elastic.
Author(s): Bahaaldine Azarmi, with his extensive experience in Elastic and NLP technologies, brings a practitioner's insight into the world of vector search. Co-author Jeff Vestal contributes expertise in observability and system optimization. Together, they deliver practical and actionable knowledge in a clear and approachable manner.
Who is it for? This book is designed for data professionals seeking to deepen their expertise in vector search and Elastic technologies. It is ideal for individuals in observability, search technology, or cybersecurity roles. If you have foundational knowledge in machine learning models, Python, and Elastic, this book will enable you to effectively utilize vector search in your projects.

Cyber Resiliency with IBM Storage Sentinel and IBM Storage Safeguarded Copy

2023-10-23 O'Reilly Amazon

book

David Green, Vasfi Gucer, Thomas Gerisch, Axel Westphal, Nezih Boyacioglu, Gerd Franke, Daniel Thompson, Guillaume Legmar, Christopher Vollmar, Markus Standau

data data-engineering IBM AI/ML Oracle SAP

IBM Storage Sentinel is a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems, designed to help organizations enhance ransomware detection and incident recovery. IBM Storage Sentinel automates the creation of immutable backup copies of your data, then uses machine learning to detect signs of possible corruption and generate forensic reports that help you quickly diagnose and identify the source of the attack. Because IBM Storage Sentinel can intelligently isolate infected backups, your organization can identify the most recent verified and validated backup copies, greatly accelerating your time to recovery. This IBM Redbooks publication explains how to implement a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems using IBM Storage Sentinel and IBM Storage Safeguarded Copy. The target audience of this document is cybersecurity and storage specialists.

Delta Lake: Up and Running

2023-10-17 O'Reilly Amazon

book

Dan Davis, Bennie Haelen

data data-engineering storage-repositories delta-lake AI/ML Analytics

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights. You'll learn how to: use modern data management and data engineering techniques; understand how ACID transactions bring reliability to data lakes at scale; run streaming and batch jobs against your data lake concurrently; execute update, delete, and merge commands against your data lake; use time travel to roll back and examine previous data versions; and build a streaming data quality pipeline following the medallion architecture.

Amazon Redshift: The Definitive Guide

2023-10-03 O'Reilly Amazon

book

Rajesh Francis, Rajiv Gupta, Milind Oke

data data-engineering relational-databases amazon-redshift AI/ML Analytics

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you: build a cloud data strategy around Amazon Redshift as a foundational data warehouse; get started with Amazon Redshift with simple-to-use data models and design best practices; understand how and when to use Redshift Serverless and Redshift provisioned clusters; take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options; transform your data platform for predictive analytics using Redshift ML and break silos using data sharing; learn best practices for security, monitoring, resilience, and disaster recovery; and leverage Amazon Redshift integration with other AWS services to unlock additional value.

Practical Implementation of a Data Lake: Translating Customer Expectations into Tangible Technical Goals

2023-10-03 O'Reilly Amazon

book

Nayanjyoti Paul

data data-engineering storage-repositories data-lake AI/ML Data Lake

This book explains how to implement a data lake strategy, covering the technical and business challenges architects commonly face. It also illustrates how and why client requirements should drive architectural decisions. Drawing upon a specific case from his own experience, author Nayanjyoti Paul begins with the consideration from which all subsequent decisions should flow: what does your customer need? He also describes the importance of identifying key stakeholders and the key points to focus on when starting a new project. Next, he takes you through the business and technical requirement-gathering process, and how to translate customer expectations into tangible technical goals. From there, you’ll gain insight into the security model that will allow you to establish security and legal guardrails, as well as different aspects of security from the end user’s perspective. You’ll learn which organizational roles need to be onboarded into the data lake, their responsibilities, the services they need access to, and how the hierarchy of escalations should work. Subsequent chapters explore how to divide your data lakes into zones, organize data for security and access, manage data sensitivity, and apply data obfuscation techniques. Audit and logging capabilities in the data lake are also covered before a deep dive into designing data lakes to handle multiple kinds of data, file formats, and access patterns. The book concludes by focusing on production operationalization and solutions to implement a production setup. After completing this book, you will understand how to implement a data lake and the best practices to employ while doing so, and you will be armed with practical tips to solve business problems.
What You Will Learn: understand the challenges associated with implementing a data lake; explore the architectural patterns and processes used to design a new data lake; design and implement data lake capabilities; and associate business requirements with technical deliverables to drive success.
Who This Book Is For: data scientists and architects, machine learning engineers, and software engineers.

Data Engineering and Data Science

2023-09-26 O'Reilly Amazon

book

Vinay Jha Pillai, M. Niranjanamurthy, Kukatlapalli Pradeep Kumar, Hari Murthy, Aynur Unal

data data-science AI/ML Data Collection Data Engineering Data Science

Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one-stop shop” for the concepts and applications of data science and engineering for data scientists across many industries. The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to be working across the spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, but any individual data engineer doesn’t need to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. In this exciting new volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles. Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

The Unrealized Opportunities with Real-Time Data

2023-09-25 O'Reilly Amazon

book

Federico Castanedo

data data-engineering AI/ML Analytics Data Science Data Streaming

The amount of data generated from various processes and platforms has increased exponentially in the past decade, and the challenges of filtering useful data out of streams of raw data have become even greater. Meanwhile, deriving useful insights from that data has become even more important. In this incisive report, Federico Castanedo examines the challenges companies face when acting on data at rest, as well as the benefits they unlock when acting on data as it's generated. Data engineers, enterprise architects, CTOs, and CIOs will explore the tools, processes, and mindset a company needs to process streaming data in real time, and learn how to make quick data-driven decisions to gain an edge on competitors. This report helps you: explore gaps in today's real-time data architectures, including the limitations of real-time analytics in acting on data immediately; examine use cases that can't be served efficiently with real-time analytics; understand how stream processing engines work with real-time data; learn how distributed data processing architectures, stream processing, streaming analytics, and event-based architectures relate to real-time data; and understand how to transition from traditional batch processing environments to stream processing. Federico Castanedo is an academic director and adjunct professor at IE University in Spain. A data science and AI leader, he has extensive experience in academia, industry, and startups.

Serverless Machine Learning with Amazon Redshift ML

2023-08-30 O'Reilly Amazon

book

Debu Panda, Bhanu Pittampally, Sumeet Joshi, Phil Bates

data data-engineering relational-databases amazon-redshift AI/ML Analytics

Serverless Machine Learning with Amazon Redshift ML provides a hands-on guide to using Amazon Redshift Serverless and Redshift ML for building and deploying machine learning models. Through SQL-focused examples and practical walkthroughs, you will learn efficient techniques for cloud data analytics and serverless machine learning. What this book will help you do: grasp the workflow of building machine learning models with Redshift ML using SQL; learn to handle supervised learning tasks like classification and regression; apply unsupervised learning techniques, such as K-means clustering, in Redshift ML; develop time-series forecasting models within Amazon Redshift; and understand how to operationalize machine learning in a serverless cloud architecture. Author(s): Debu Panda, Phil Bates, Bhanu Pittampally, and Sumeet Joshi are seasoned professionals in cloud computing and machine learning technologies. They combine deep technical knowledge with teaching expertise to guide learners through mastering Amazon Redshift ML. Their collaborative approach ensures that the content is accessible, engaging, and practically applicable. Who is it for? This book is perfect for data scientists, machine learning engineers, and database administrators using or intending to use Amazon Redshift. It's tailored for professionals with basic knowledge of machine learning and SQL who aim to enhance their efficiency and specialize in serverless machine learning within cloud architectures.

Graph-Powered Analytics and Machine Learning with TigerGraph

2023-07-24 O'Reilly Amazon

book

Phuc Kien Nguyen, Alexander Thomas, Victor Lee

data data-engineering graph-databases tigergraph AI/ML Analytics

With the rapid rise of graph databases, organizations are now implementing advanced analytics and machine learning solutions to help drive business outcomes. This practical guide shows data scientists, data engineers, architects, and business analysts how to get started with a graph database using TigerGraph, one of the leading graph databases available. You'll explore a three-stage approach to deriving value from connected data: connect, analyze, and learn. Victor Lee, Phuc Kien Nguyen, and Alexander Thomas present real use cases covering several contemporary business needs. By diving into hands-on exercises using TigerGraph Cloud, you'll quickly become proficient at designing and managing advanced analytics and machine learning solutions for your organization. With this book, you will: use graph thinking to connect, analyze, and learn from data for advanced analytics and machine learning; learn how graph analytics and machine learning can deliver key business insights and outcomes; use five core categories of graph algorithms to drive advanced analytics and machine learning; deliver a real-time 360-degree view of core business entities, including customer, product, service, supplier, and citizen; and discover insights from connected data through machine learning and advanced analytics.

AI for Big Data-Based Engineering Applications from Security Perspectives

2023-06-30 O'Reilly Amazon

book

Balwinder Raj, Shingo Yamaguchi, Brij B. Gupta, Sandeep Singh Gill

Cyber Security security-engineering AI/ML Big Data

This book emphasizes understanding the motivation behind advanced circuit design, both to establish the AI interface and to better mitigate security attacks on big data. It is intended for students, researchers, professionals, faculty members, and software developers who wish to carry out further research.

Geospatial Data Analytics on AWS

2023-06-30 O'Reilly Amazon

book

Jeff DeMuth, Janahan Gnanachandran, Scott Bateman

data data-engineering location-data geographic-information-system-gis geographic information system (gis) AI/ML

In "Geospatial Data Analytics on AWS," you will learn how to store, manage, and analyze geospatial data effectively using various AWS services. This book provides insight into building geospatial data lakes, leveraging AWS databases, and applying best practices to derive insights from spatial data in the cloud. What this book will help you do: design and manage geospatial data lakes on AWS using S3 and other storage solutions; analyze geospatial data using AWS services such as Athena and Redshift; use machine learning models for geospatial data processing and analytics with SageMaker; visualize geospatial data with services like Amazon QuickSight and through OpenStreetMap integration; and avoid common pitfalls when managing geospatial data in the cloud. Author(s): Scott Bateman, Janahan Gnanachandran, and Jeff DeMuth bring their extensive experience in cloud computing and geospatial analytics to this book. With backgrounds in cloud architecture, data science, and geospatial applications, they aim to make complex topics accessible. Their collaborative approach ensures readers can practically apply concepts to real-world challenges. Who is it for? This book is ideal for GIS and data professionals, including developers, analysts, and scientists. It suits readers with a basic understanding of geographic concepts but no prior AWS experience. If you're aiming to enhance your cloud-based geospatial data management and analytics skills, this is the guide for you.

talk-data.com

Practical Lakehouse Architecture

IBM FlashCore Module (FCM) Product Guide: Features the newly available FCM4 with AI-powered ransomware detection

Elastic Stack 8.x Cookbook

The Ultimate Guide to Snowpark

Databricks ML in Action

Prompt Engineering for Generative AI

IBM Storage DS8900F Architecture and Implementation: Updated for Release 9.3.2

Supervised and Unsupervised Data Engineering for Multimedia Data

Apache Iceberg: The Definitive Guide

Natural Language and Search

Engineering Data Mesh in Azure Cloud

Data Science and Machine Learning Applications in Subsurface Engineering

Handbook of Geospatial Artificial Intelligence

Elasticsearch in Action, Second Edition

Vector Search for Practitioners with Elastic

Cyber Resiliency with IBM Storage Sentinel and IBM Storage Safeguarded Copy

Delta Lake: Up and Running

Amazon Redshift: The Definitive Guide

Practical Implementation of a Data Lake: Translating Customer Expectations into Tangible Technical Goals

Data Engineering and Data Science

The Unrealized Opportunities with Real-Time Data

Serverless Machine Learning with Amazon Redshift ML

Graph-Powered Analytics and Machine Learning with TigerGraph

AI for Big Data-Based Engineering Applications from Security Perspectives

Geospatial Data Analytics on AWS