O'Reilly Data Engineering Books

Building Effective Privacy Programs

2025-08-25 O'Reilly Amazon

book

Jason Edwards , Griffin Weaver

data data-engineering data-security-privacy data security & privacy AI/ML Blockchain

Presents a structured approach to privacy management, an indispensable resource for safeguarding data in an ever-evolving digital landscape.
In today’s data-driven world, protecting personal information has become a critical priority for organizations of all sizes. Building Effective Privacy Programs: Cybersecurity from Principles to Practice equips professionals with the tools and knowledge to design, implement, and sustain robust privacy programs. Seamlessly integrating foundational principles, advanced privacy concepts, and actionable strategies, this practical guide serves as a detailed roadmap for navigating the complex landscape of data privacy.
Bridging the gap between theoretical concepts and practical implementation, Building Effective Privacy Programs combines in-depth analysis with practical insights, offering step-by-step instructions on building privacy-by-design frameworks, conducting privacy impact assessments, and managing compliance with global regulations. In-depth chapters feature real-world case studies and examples that illustrate the application of privacy practices in a variety of scenarios, complemented by discussions of emerging trends such as artificial intelligence, blockchain, IoT, and more.
Providing timely and comprehensive coverage of privacy principles, regulatory compliance, and actionable strategies, Building Effective Privacy Programs:
· Addresses all essential areas of cyberprivacy, from foundational principles to advanced topics
· Presents detailed analysis of major laws, such as GDPR, CCPA, and HIPAA, and their practical implications
· Offers strategies to integrate privacy principles into business processes and IT systems
· Covers industry-specific applications for healthcare, finance, and technology sectors
· Highlights successful privacy program implementations and lessons learned from enforcement actions
· Includes glossaries, comparison charts, sample policies, and additional resources for quick reference
Written by seasoned professionals with deep expertise in privacy law, cybersecurity, and data protection, Building Effective Privacy Programs: Cybersecurity from Principles to Practice is a vital reference for privacy officers, legal advisors, IT professionals, and business executives responsible for data governance and regulatory compliance. It is also an excellent textbook for advanced courses in cybersecurity, information systems, business law, and business management.

Fundamentals of Metadata Management

2025-08-05 O'Reilly Amazon

book

Ole Olesen-Bagneux

data data-engineering metadata AI/ML Analytics Data Analytics

Whether it's to adhere to regulations, access markets by meeting specific standards, or devise data analytics and AI strategies, companies today are busy implementing metadata repositories (metadata tools about the IT, data, information, and knowledge in your company). Until now, most of these repositories have been implemented in isolation from one another, but that practice lies at the core of problems with data management in many companies today.
Author Ole Olesen-Bagneux, chief evangelist at Actian, shows you how to masterfully manage your metadata repositories by properly coordinating them. That requires a data discovery team to increase insights for all key players in enterprise data management, from the CIO and CDO to enterprise and data architects. Coordinating these repositories will help you and your organization democratize data and excel at data management. This book shows you how.
· Learn what metadata repositories are and what they do
· Explore which data to represent in these repositories
· Set up a data discovery team to make data searchable
· Learn how to manage and coordinate repositories in a meta grid
· Increase innovation by setting up a functional data marketplace
· Make information security and data protection more robust
· Gain a deeper understanding of your company IT landscape
· Activate real enterprise architecture based on evidence

MongoDB 8.0 in Action, Third Edition

2025-07-10 O'Reilly Amazon

book

Arkadiusz Borucki

data data-engineering nosql-databases MongoDB AI/ML Cloud Computing

Deliver flexible, scalable, and high-performance data storage that's perfect for AI and other modern applications with MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform.
In MongoDB 8.0 in Action, Third Edition you'll find comprehensive coverage of the latest version of MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. Learn to utilize MongoDB’s flexible schema design for data modeling, scale applications effectively using advanced sharding features, integrate full-text and vector-based semantic search, and more. This totally revised new edition delivers engaging hands-on tutorials and examples that put MongoDB into action!
In MongoDB 8.0 in Action, Third Edition you'll:
· Master new features in MongoDB 8.0
· Create your first, free Atlas cluster using the Atlas CLI
· Design scalable NoSQL databases with effective data modeling techniques
· Master Vector Search for building GenAI-driven applications
· Utilize advanced search capabilities in MongoDB Atlas, including full-text search
· Build Event-Driven Applications with Atlas Stream Processing
· Deploy and manage MongoDB Atlas clusters both locally and in the cloud using the Atlas CLI
· Leverage the Atlas SQL interface for familiar SQL querying
· Use MongoDB Atlas Online Archive for efficient data management
· Establish robust security practices including encryption
· Master backup and restore strategies
· Optimize database performance and identify slow queries
MongoDB 8.0 in Action, Third Edition offers a clear, easy-to-understand introduction to everything in MongoDB 8.0 and MongoDB Atlas, including new advanced features such as embedded config servers in sharded clusters, or moving an unsharded collection to a different shard. The book also covers Atlas stream processing, full-text search, and vector search capabilities for generative AI applications. Each chapter is packed with tips, tricks, and practical examples you can quickly apply to your projects, whether you're brand new to MongoDB or looking to get up to speed with the latest version.
About the Technology
MongoDB is the database of choice for storing structured, semi-structured, and unstructured data like business documents and other text and image files. MongoDB 8.0 introduces a range of exciting new features, from sharding improvements that simplify the management of distributed data, to performance enhancements that stay resilient under heavy workloads. Plus, MongoDB Atlas brings vector search and full-text search features that support AI-powered applications.
About the Book
In MongoDB 8.0 in Action, Third Edition you’ll learn how to take advantage of all the new features of MongoDB 8.0, including the powerful MongoDB Atlas multi-cloud data platform. You’ll start with the basics of setting up and managing a document database. Then, you’ll learn how to use MongoDB for AI-driven applications, implement advanced stream processing, and optimize performance with improved indexing and query handling. Hands-on projects like creating a RAG-based chatbot and building an aggregation pipeline mean you’ll really put MongoDB into action!
What's Inside
· The new features in MongoDB 8.0
· Get familiar with MongoDB’s Atlas cloud platform
· Utilizing sharding enhancements
· Using vector-based search technologies
· Full-text search capabilities for efficient text indexing and querying
About the Reader
For developers and DBAs of all levels. No prior experience with MongoDB required.
About the Author
Arek Borucki is a MongoDB Champion, certified MongoDB and MongoDB Atlas administrator with expertise in distributed systems, NoSQL databases, and Kubernetes.
Quotes
An excellent resource with real-world examples and best practices to design, optimize, and scale modern applications. - Advait Patel, Broadcom
Essential MongoDB resource. Covers new features such as full-text search, vector search, AI, and RAG applications. - Juan Roy, Credit Suisse
Reflects author’s practical experience and clear teaching style. It’s packed with real-world examples and up-to-date insights. - Rajesh Nair, MongoDB Champion & community leader
This book will definitely make you a MongoDB star! - Vinicios Wentz, JP Morgan & Chase Co.

Building Neo4j-Powered Applications with LLMs

2025-06-20 O'Reilly Amazon

book

Ravindranatha Anthapu , Siddhant Agarwal

data data-engineering graph-databases Neo4j AI/ML Cloud Computing

Dive into building applications that combine the power of Large Language Models (LLMs) with Neo4j knowledge graphs, Haystack, and Spring AI to deliver intelligent, data-driven recommendations and search outcomes. This book provides actionable insights and techniques to create scalable, robust solutions by leveraging the best-in-class frameworks and a real-world project-oriented approach.
What this Book will help me do
· Understand how to use Neo4j to build knowledge graphs integrated with LLMs for enhanced data insights.
· Develop skills in creating intelligent search functionalities by combining Haystack and vector-based graph techniques.
· Learn to design and implement recommendation systems using LangChain4j and Spring AI frameworks.
· Acquire the ability to optimize graph data architectures for LLM-driven applications.
· Gain proficiency in deploying and managing applications on platforms like Google Cloud for scalability.
Author(s)
Ravindranatha Anthapu, a Principal Consultant at Neo4j, and Siddhant Agarwal, a Google Developer Expert in Generative AI, bring together their vast experience to offer practical implementations and cutting-edge techniques in this book. Their combined expertise in Neo4j, graph technology, and real-world AI applications makes them authoritative voices in the field.
Who is it for?
Designed for database developers and data scientists, this book caters to professionals aiming to leverage the transformational capabilities of knowledge graphs alongside LLMs. Readers should have a working knowledge of Python and Java as well as familiarity with Neo4j and the Cypher query language. If you're looking to enhance search or recommendation functionalities through state-of-the-art AI integrations, this book is for you.

GIS For Dummies, 2nd Edition

2025-05-27 O'Reilly Amazon

book

Michael N. DeMers , Jami Dennis

data data-engineering location-data geographic-information-system-gis geographic information system (gis) AI/ML

A jargon-free primer on GIS concepts and the essential tech tools.
Geographic Information Systems (GIS) is the fascinating technology field that's all about understanding and visualizing our world. GIS For Dummies introduces you to the essential skills you'll need if you want to become a geospatial data guru. You'll learn to read, analyze, and interpret maps, and you'll discover how GIS professionals create digital models of landscapes, cities, weather patterns, and beyond. Understand how advances in technology, including AI, are turning GIS tools into powerful assets for solving real-world problems and protecting the planet. This beginner-friendly book makes it easy to grasp necessary GIS concepts so you can apply GIS in your organization, pursue a career in this dynamic field, or just impress others with your geographic knowledge.
· Learn the basics of data analysis, interpretation, and modeling using Geographic Information Systems
· Gain the skills to read and interpret all types of maps and visual GIS information
· Discover how GIS is used in fields like urban planning, environmental science, business, and disaster management
· Explore whether a career in GIS could be right for you
GIS For Dummies is the perfect starting point for students, professionals, and anyone curious about the potential of GIS as a technology or career choice.

Amazon Redshift Cookbook - Second Edition

2025-04-25 O'Reilly Amazon

book

Anusha Challa , Shruti Worlikar , Harshida Patel

data data-engineering relational-databases amazon-redshift AI/ML Analytics

Amazon Redshift Cookbook provides practical techniques for utilizing AWS's managed data warehousing service effectively. With this book, you'll learn to create scalable and secure data analytics solutions, tackle data integration challenges, and leverage Redshift's advanced features like data sharing and generative AI capabilities.
What this Book will help me do
· Create end-to-end data analytics solutions from ingestion to reporting using Amazon Redshift.
· Optimize the performance and security of Redshift implementations to meet enterprise standards.
· Leverage Amazon Redshift for zero-ETL ingestion and advanced concurrency scaling.
· Integrate Redshift with data lakes for enhanced data processing versatility.
· Implement generative AI and machine learning solutions directly within Redshift environments.
Author(s)
Shruti Worlikar, Harshida Patel, and Anusha Challa are seasoned data experts who bring together years of experience with Amazon Web Services and data analytics. Their combined expertise enables them to offer actionable insights, hands-on recipes, and proven strategies for implementing and optimizing Amazon Redshift-based solutions.
Who is it for?
This book is best suited for data analysts, data engineers, and architects who are keen on mastering modern data warehouse solutions using Redshift. Readers should have some knowledge of data warehousing and familiarity with cloud concepts. Ideal for professionals looking to migrate on-premises systems or build cloud-native analytics pipelines leveraging Redshift.

Unlock Data Agility with Composable Data Architecture

2025-04-25 O'Reilly Amazon

book

Adam Morton

data data-engineering Agile/Scrum AI/ML Data Management

Are your data systems slowing down your AI initiatives? The potential of AI to revolutionize business is undeniable, but many organizations struggle to bridge the gap between ambitious ideas and real-world results. The cause? Traditional data architectures remain too rigid and siloed to support today's dynamic, data-intensive demands. If you're a data leader searching for a solution, composable data architecture is the answer.
This essential guide provides a clear, actionable framework for you to discover how this modular, adaptable approach empowers data teams, streamlines pipelines, and fuels continuous innovation. So, you'll not only keep pace with your most agile competitors; you'll surpass them.
· Understand the fundamental concepts that make composable architecture a game-changer
· Design pipelines that optimize performance and adapt to your organization's unique data needs
· See how composable architecture breaks down silos, enabling faster, more collaborative data processes
· Discover tools to streamline data management of high-volume streams or multicloud environments
· Leverage flexible architecture that simplifies data sharing, enabling easier access to insights

Grokking Relational Database Design

2025-03-10 O'Reilly Amazon

book

Michail Tsikerdekis , Qiang Hao

data data-engineering relational-databases AI/ML Computer Science Data Collection

A friendly illustrated guide to designing and implementing your first database.
Grokking Relational Database Design makes the principles of designing relational databases approachable and engaging. Everything in this book is reinforced by hands-on exercises and examples.
In Grokking Relational Database Design, you’ll learn how to:
· Query and create databases using Structured Query Language (SQL)
· Design databases from scratch
· Implement and optimize database designs
· Take advantage of generative AI when designing databases
A well-constructed database is easy to understand, query, manage, and scale when your app needs to grow. In Grokking Relational Database Design you’ll learn the basics of relational database design including how to name fields and tables, which data to store where, how to eliminate repetition, good practices for data collection and hygiene, and much more. You won’t need a computer science degree or in-depth knowledge of programming: the book’s practical examples and down-to-earth definitions are beginner-friendly.
About the Technology
Almost every business uses a relational database system. Whether you’re a software developer, an analyst creating reports and dashboards, or a business user just trying to pull the latest numbers, it pays to understand how a relational database operates. This friendly, easy-to-follow book guides you from square one through the basics of relational database design.
About the Book
Grokking Relational Database Design introduces the core skills you need to assemble and query tables using SQL. The clear explanations, intuitive illustrations, and hands-on projects make database theory come to life, even if you can’t tell a primary key from an inner join. As you go, you’ll design, implement, and optimize a database for an e-commerce application and explore how generative AI simplifies the mundane tasks of database design.
What's Inside
· Define entities and their relationships
· Minimize anomalies and redundancy
· Use SQL to implement your designs
· Security, scalability, and performance
About the Reader
For self-taught programmers, software engineers, data scientists, and business data users. No previous experience with relational databases assumed.
About the Authors
Dr. Qiang Hao and Dr. Michail Tsikerdekis are both professors of Computer Science at Western Washington University.
Quotes
If anyone is looking to improve their database design skills, they can’t go wrong with this book. - Ben Brumm, DatabaseStar
Goes beyond SQL syntax and explores the core principles. An invaluable resource! - William Jamir Silva, Adjust
Relational database design is best done right the first time. This book is a great help to achieve that! - Maxim Volgin, KLM
Provides necessary notions to design and build databases that can stand the data challenges we face. - Orlando Méndez, Experian

Generative AI with SAP and Amazon Bedrock: Utilizing GenAI with SAP and AWS Business Use Cases

2025-02-19 O'Reilly Amazon

book

Miguel Figueiredo

data data-engineering SAP AI/ML AWS Cloud Computing

Explore Generative AI and understand its key concepts, architecture, and tangible business use cases. This book will help you develop the skills needed to use SAP AI Core service features available in the SAP Business Technology Platform. You’ll examine large language model (LLM) concepts and gain the practical knowledge to make the best use of GenAI. As you progress, you’ll learn how to get started with your own LLMs and work with Generative AI use cases. Additionally, you’ll see how to take advantage of the Amazon Bedrock stack using AWS SDK for ABAP.
To fully leverage your knowledge, Generative AI with SAP and Amazon Bedrock offers practical step-by-step instructions for how to establish a cloud SAP BTP account model and create your first GenAI artifacts. This work is an important prerequisite for those who want to take full advantage of generative AI with SAP.
What You Will Learn
· Master the concepts and terminology of artificial intelligence and GenAI
· Understand opportunities and impacts for different industries with GenAI
· Become familiar with SAP AI Core, Amazon Bedrock, and AWS SDK for ABAP, and develop your first GenAI projects
· Accelerate your development skills
· Gain productivity and save time when implementing GenAI use cases
Who This Book Is For
Anyone who wants to learn about Generative AI for the enterprise, and SAP practitioners who want to take advantage of AI within the SAP ecosystem to support their systems and workflows.

Snowflake Recipes: A Problem-Solution Approach to Implementing Modern Data Pipelines

2024-12-19 O'Reilly Amazon

book

John Eipe , Dillon Dayton

data data-engineering Snowflake Agile/Scrum AI/ML AWS

Explore Snowflake’s core concepts and the unique features that differentiate it from industry competitors such as Azure Synapse and Google BigQuery. This book provides recipes for architecting and developing modern data pipelines on the Snowflake data platform by employing progressive techniques, agile practices, and repeatable strategies.
You’ll walk through step-by-step instructions on ready-to-use recipes covering a wide range of the latest development topics. Then build scalable development pipelines and solve specific scenarios common to all modern data platforms, such as data masking, object tagging, data monetization, and security best practices. Throughout the book you’ll work with code samples for Amazon Web Services, Microsoft Azure, and Google Cloud Platform. There’s also a chapter devoted to solving machine learning problems with Snowflake.
Authors Dillon Dayton and John Eipe are both Snowflake SnowPro Core certified, specializing in data and digital services, and understand the challenges of finding the right solution to complex problems. The recipes in this book are based on real-world use cases and examples designed to help you provide high-quality, performant, and secure data to support business initiatives.
What You’ll Learn
· Handle structured and unstructured data in Snowflake.
· Apply best practices and different options for data transformation.
· Understand data application development.
· Implement data sharing, data governance, and security.
Who This Book Is For
Data engineers, scientists, and analysts moving into Snowflake and looking to build data apps. This book expects basic knowledge of the cloud (AWS, Azure, or GCP), SQL, and Python.

Snowflake Data Engineering

2024-12-06 O'Reilly Amazon

book

Maja Ferle

data data-engineering Snowflake AI/ML Analytics API

A practical introduction to data engineering on the powerful Snowflake cloud data platform.
Data engineers create the pipelines that ingest raw data, transform it, and funnel it to the analysts and professionals who need it. The Snowflake cloud data platform provides a suite of productivity-focused tools and features that simplify building and maintaining data pipelines. In Snowflake Data Engineering, Snowflake Data Superhero Maja Ferle shows you how to get started.
In Snowflake Data Engineering you will learn how to:
· Ingest data into Snowflake from both cloud and local file systems
· Transform data using functions, stored procedures, and SQL
· Orchestrate data pipelines with streams and tasks, and monitor their execution
· Use Snowpark to run Python code in your pipelines
· Deploy Snowflake objects and code using continuous integration principles
· Optimize performance and costs when ingesting data into Snowflake
Snowflake Data Engineering reveals how Snowflake makes it easy to work with unstructured data, set up continuous ingestion with Snowpipe, and keep your data safe and secure with best-in-class data governance features. Along the way, you’ll practice the most important data engineering tasks as you work through relevant hands-on examples. Throughout, author Maja Ferle shares design tips drawn from her years of experience to ensure your pipeline follows the best practices of software engineering, security, and data governance.
About the Technology
Pipelines that ingest and transform raw data are the lifeblood of business analytics, and data engineers rely on Snowflake to help them deliver those pipelines efficiently. Snowflake is a full-service cloud-based platform that handles everything from near-infinite storage and fast elastic compute services to inbuilt AI/ML capabilities like vector search, text-to-SQL, code generation, and more. This book gives you what you need to create effective data pipelines on the Snowflake platform.
About the Book
Snowflake Data Engineering guides you skill-by-skill through accomplishing on-the-job data engineering tasks using Snowflake. You’ll start by building your first simple pipeline and then expand it by adding increasingly powerful features, including data governance and security, adding CI/CD into your pipelines, and even augmenting data with generative AI. You’ll be amazed how far you can go in just a few short chapters!
What's Inside
· Ingest data from the cloud, APIs, or Snowflake Marketplace
· Orchestrate data pipelines with streams and tasks
· Optimize performance and cost
About the Reader
For software developers and data analysts. Readers should know the basics of SQL and the Cloud.
About the Author
Maja Ferle is a Snowflake Subject Matter Expert and a Snowflake Data Superhero who holds the SnowPro Advanced Data Engineer and the SnowPro Advanced Data Analyst certifications.
Quotes
An incredible guide for going from zero to production with Snowflake. - Doyle Turner, Microsoft
A must-have if you’re looking to excel in the field of data engineering. - Isabella Renzetti, Data Analytics Consultant & Trainer
Masterful! Unlocks the true potential of Snowflake for modern data engineers. - Shankar Narayanan, Microsoft
Valuable insights will enhance your data engineering skills and lead to cost-effective solutions. A must read! - Frédéric L’Anglais, Maxa
Comprehensive, up-to-date and packed with real-life code examples. - Albert Nogués, Danone

AI Engineering

2024-12-04 O'Reilly Amazon

book

Chip Huyen

data ai-ml artificial-intelligence-ai artificial intelligence (ai) AI/ML Analytics

Recent breakthroughs in AI have not only increased demand for AI products, they've also lowered the barriers to entry for those who want to build AI products. The model-as-a-service approach has transformed AI from an esoteric discipline into a powerful development tool that anyone can use. Everyone, including those with minimal or no prior AI experience, can now leverage AI models to build applications. In this book, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models.
The book starts with an overview of AI engineering, explaining how it differs from traditional ML engineering and discussing the new AI stack. The more AI is used, the more opportunities there are for catastrophic failures, and therefore, the more important evaluation becomes. This book discusses different approaches to evaluating open-ended models, including the rapidly growing AI-as-a-judge approach.
AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of use cases and application patterns. You'll learn a framework for developing an AI application, starting with simple techniques and progressing toward more sophisticated methods, and discover how to efficiently deploy these applications.
· Understand what AI engineering is and how it differs from traditional machine learning engineering
· Learn the process for developing an AI application, the challenges at each step, and approaches to address them
· Explore various model adaptation techniques, including prompt engineering, RAG, fine-tuning, agents, and dataset engineering, and understand how and why they work
· Examine the bottlenecks for latency and cost when serving foundation models and learn how to overcome them
· Choose the right model, dataset, evaluation benchmarks, and metrics for your needs
Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup, and taught Machine Learning Systems Design at Stanford. She's the author of the book Designing Machine Learning Systems, an Amazon bestseller in AI. AI Engineering builds upon and is complementary to Designing Machine Learning Systems (O'Reilly).

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

2024-12-01 O'Reilly Amazon

book

Venkata Gunnu , Balaji Dhamodharan , Ramcharan Kakarla , Sundar Krishnan

data data-engineering apache-spark PySpark AI/ML API

This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we will begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 will guide you through machine learning pipelines and various methods to operationalize and serve models using Docker/API. Chapter 10 will demonstrate how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data.
In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark.
You will:
· Gain an overview of end-to-end predictive model building
· Understand multiple variable selection techniques and their implementations
· Learn how to operationalize models
· Perform data science experiments and learn useful tips

Managing Data as a Product

2024-11-29 O'Reilly Amazon

book

Andrea Gioia

data data-engineering AI/ML Data Engineering Data Management Data Modelling

Discover how to transform your data architecture with the insights and techniques presented in Managing Data as a Product by Andrea Gioia. In this comprehensive guide, you'll explore how to design, implement, and maintain data-product-centered systems to meet modern demands, achieving scalable and sustainable data management tailored to your organization's needs.
What this Book will help me do
· Understand the principles of data-product-centered architectures and their advantages.
· Learn to design, develop, and operate data products in production settings.
· Explore strategies to manage the lifecycle of data products efficiently.
· Gain insights into team topologies and data ownership for distributed systems.
· Discover data modeling techniques for AI-ready architectures.
Author(s)
Andrea Gioia is a renowned data architect and the creator of the Open Data Mesh Initiative. With over 20 years of experience, Andrea has successfully led complex data projects and is passionate about sharing his expertise. His writing is practical and driven by real-world challenges, aiming to equip engineers with actionable knowledge.
Who is it for?
This book is ideal for data engineers, software architects, and engineering leaders involved in shaping innovative data architectures. If you have foundational knowledge of data engineering and are eager to advance your expertise by adopting data-product principles, this book will suit your needs. It is for professionals aiming to modernize and optimize their approach to organizational data management.

Prompt Engineering for LLMs

2024-11-25 O'Reilly Amazon

book

Albert Ziegler , John Berryman

data ai-ml artificial-intelligence-ai generative-ai prompt-engineering AI/ML

Large language models (LLMs) are revolutionizing the world, promising to automate tasks and solve complex problems. A new generation of software applications is using these models as building blocks to unlock new potential in almost every domain, but reliably accessing these capabilities requires new skills. This book will teach you the art and science of prompt engineering, the key to unlocking the true potential of LLMs.
Industry experts John Berryman and Albert Ziegler share how to communicate effectively with AI, transforming your ideas into a language model-friendly format. By learning both the philosophical foundation and practical techniques, you'll be equipped with the knowledge and confidence to build the next generation of LLM-powered applications.
· Understand LLM architecture and learn how to best interact with it
· Design a complete prompt-crafting strategy for an application
· Gather, triage, and present context elements to make an efficient prompt
· Master specific prompt-crafting techniques like few-shot learning, chain-of-thought prompting, and RAG

Learn FileMaker Pro 2024: The Comprehensive Guide to Building Custom Databases

2024-11-22 O'Reilly Amazon

book

Mark Conway Munro

data data-engineering filemaker AI/ML API LLM

FileMaker Pro is a development platform from Claris International Inc., a subsidiary of Apple Inc. The software makes it easy for everyone to create powerful, multi-user, cross-platform, relational database applications. This book navigates the reader through the software in a clear and logical manner, with each chapter building on the previous one. After an initial review of the user environment and application basics, the book delves into a deep exploration of the integrated development environment, which seamlessly combines the full stack of schema, business logic, and interface layers into a unified visual programming experience. Everything beginners need to get started is covered, along with advanced material that seasoned professionals will appreciate. Written by a professional developer with decades of real-world experience, "Learn FileMaker Pro 2024" is a comprehensive learning and reference guide. Join millions of users and developers worldwide in achieving a new level of workflow efficiency with FileMaker.
For This New Edition: This third edition includes clearer lessons and more examples, making it easier than ever to start planning, building, and deploying a custom database solution. It covers dozens of new and modified features introduced in versions 19.1 to 19.6, as well as the more recent 2023 (v20) and 2024 (v21) releases. Whatever your level of experience, this book has something new for you!
What You’ll Learn · Plan and create custom tables, fields, and relationships · Write calculations using built-in and custom functions · Build layouts with dynamic objects, themes, and custom menus · Automate tasks with scripts and link them to objects and interface events · Keep database files secure and healthy · Integrate with external systems using ODBC, cURL, and the FM API · Deploy solutions to share with desktop, iOS, and web clients · Learn about summary reports, dynamic object references, and transactions · Delve into artificial intelligence with CoreML, OpenAI, and Semantic Finds
Who This Book Is For: Hobbyist developers, professional consultants, IT staff

Apache Spark for Machine Learning

2024-11-01 O'Reilly Amazon

book

Deepak Gowda

data data-engineering apache-spark AI/ML Big Data Computer Science

Dive into the power of Apache Spark as a tool for handling and processing big data required for machine learning. With this book, you will explore how to configure, execute, and deploy machine learning algorithms using Spark's scalable architecture and learn best practices for implementing real-world big data solutions.
What this Book will help me do: Understand the integration of Apache Spark with large-scale infrastructures for machine learning applications. Employ data processing techniques for preprocessing and feature engineering efficiently with Spark. Master the implementation of advanced supervised and unsupervised learning algorithms using Spark. Learn to deploy machine learning models within Spark ecosystems for optimized performance. Discover methods for analyzing big data trends and machine learning model tuning for improved accuracy.
Author(s): The author, Deepak Gowda, is an experienced data scientist with over ten years of expertise in machine learning and big data. His career spans industries such as supply chain, cybersecurity, and more where he has utilized Apache Spark extensively. Deepak's teaching style is marked by clarity and practicality, making complex concepts approachable.
Who is it for? Apache Spark for Machine Learning is tailored for data engineers, machine learning practitioners, and computer science students looking to advance their ability to process, analyze, and model using large datasets. If you're already familiar with basic machine learning and want to scale your solutions using Spark, this book is ideal for your studies and professional growth.

Building Modern Data Applications Using Databricks Lakehouse

2024-10-31 O'Reilly Amazon

book

Will Girten

data data-engineering storage-repositories data-lake AI/ML CI/CD

This book, "Building Modern Data Applications Using Databricks Lakehouse," provides a comprehensive guide for data professionals to master the Databricks platform. You'll learn to effectively build, deploy, and monitor robust data pipelines with Databricks' Delta Live Tables, empowering you to manage and optimize cloud-based data operations effortlessly.
What this Book will help me do: Understand the foundations and concepts of Delta Live Tables and its role in data pipeline development. Learn workflows to process and transform real-time and batch data efficiently using the Databricks lakehouse architecture. Master the implementation of Unity Catalog for governance and secure data access in modern data applications. Deploy and automate data pipeline changes using CI/CD, leveraging tools like Terraform and Databricks Asset Bundles. Gain advanced insights in monitoring data quality and performance, optimizing cloud costs, and managing DataOps tasks effectively.
Author(s): Will Girten, the author, is a seasoned Solutions Architect at Databricks with over a decade of experience in data and AI systems. With a deep expertise in modern data architectures, Will is adept at simplifying complex topics and translating them into actionable knowledge. His books emphasize real-time application and offer clear, hands-on examples, making learning engaging and impactful.
Who is it for? This book is geared towards data engineers, analysts, and DataOps professionals seeking efficient strategies to implement and maintain robust data pipelines. If you have a basic understanding of Python and Apache Spark and wish to delve deeper into the Databricks platform for streamlining workflows, this book is tailored for you.

LLM Engineer's Handbook

2024-10-22 O'Reilly Amazon

book

Paul Iusztin , Maxime Labonne

data ai-ml artificial-intelligence-ai generative-ai prompt-engineering AI/ML

The "LLM Engineer's Handbook" is your comprehensive guide to mastering Large Language Models from concept to deployment. Written by leading experts, it combines theoretical foundations with practical examples to help you build, refine, and deploy LLM-powered solutions that solve real-world problems effectively and efficiently.
What this Book will help me do: Understand the principles and approaches for training and fine-tuning Large Language Models (LLMs). Apply MLOps practices to design, deploy, and monitor your LLM applications effectively. Implement advanced techniques such as retrieval-augmented generation (RAG) and preference alignment. Optimize inference for high performance, addressing low-latency and high availability for production systems. Develop robust data pipelines and scalable architectures for building modular LLM systems.
Author(s): Paul Iusztin and Maxime Labonne are experienced AI professionals specializing in natural language processing and machine learning. With years of industry and academic experience, they are dedicated to making complex AI concepts accessible and actionable. Their collaborative authorship ensures a blend of theoretical rigor and practical insights tailored for modern AI practitioners.
Who is it for? This book is tailored for AI engineers, NLP professionals, and LLM practitioners who wish to deepen their understanding of Large Language Models. Ideal readers possess some familiarity with Python, AWS, and general AI concepts. If you aim to apply LLMs to real-world scenarios or enhance your expertise in AI-driven systems, this handbook is designed for you.

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

2024-10-12 O'Reilly Amazon

book

Jason Yip , Nikhil Gupta

data data-engineering Databricks databricks-data-engineer-associate AI/ML Analytics

This book is your comprehensive guide to building robust Generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform offering unified analytics and AI capabilities within a single governance framework, enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization. Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises. 
Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or even an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams.
What You Will Learn · Foundational principles of Lakehouse architecture · Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables · Databricks Intelligence Platform and key functionalities · Building and deploying GenAI Applications from data ingestion to model serving · Databricks pricing, platform security, DBRX, and many more topics
Who This Book Is For: Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy their Gen AI solutions with the Data Intelligence Platform. This is also a handbook for senior execs who need to communicate the value of Databricks to customers. People who are new to the Databricks Platform and want comprehensive insights will find the book accessible.

Data Engineering Best Practices

2024-10-11 O'Reilly Amazon

book

Richard J. Schiller , David Larochelle

data data-engineering Agile/Scrum AI/ML Analytics Big Data

Unlock the secrets to building scalable and efficient data architectures with 'Data Engineering Best Practices.' This book provides in-depth guidance on designing, implementing, and optimizing cloud-based data pipelines. You will gain valuable insights into best practices, agile workflows, and future-proof designs.
What this Book will help me do: Effectively plan and architect scalable data solutions leveraging cloud-first strategies. Master agile processes tailored to data engineering for improved project outcomes. Implement secure, efficient, and reliable data pipelines optimized for analytics and AI. Apply real-world design patterns and avoid common pitfalls in data flow and processing. Create future-ready data engineering solutions following industry-proven frameworks.
Author(s): Richard J. Schiller and David Larochelle are seasoned data engineering experts with decades of experience crafting efficient and secure cloud-based infrastructures. Their collaborative writing distills years of real-world expertise into practical advice aimed at helping engineers succeed in a rapidly evolving field.
Who is it for? This book is ideal for data engineers, ETL specialists, and big data professionals seeking to enhance their knowledge in cloud-based solutions. Some familiarity with data engineering, ETL pipelines, and big data technologies is helpful. It suits those keen on mastering advanced practices, improving agility, and developing efficient data pipelines. Perfect for anyone looking to future-proof their skills in data engineering.

Azure SQL Revealed: The Next-Generation Cloud Database with AI and Microsoft Fabric

2024-10-10 O'Reilly Amazon

book

Bob Ward

data data-engineering relational-databases azure-sql-database AI/ML Azure

Access detailed content and examples on Azure SQL, a set of cloud services that allows for SQL Server to be deployed in the cloud. This book teaches the fundamentals of deployment, configuration, security, performance, and availability of Azure SQL from the perspective of these same tasks and capabilities in SQL Server. This distinct approach makes this book an ideal learning platform for readers familiar with SQL Server on-premises who want to migrate their skills toward providing cloud solutions to an enterprise market that is increasingly cloud-focused. If you know SQL Server, you will love this book. You will be able to take your existing knowledge of SQL Server and translate that knowledge into the world of cloud services from the Microsoft Azure platform, and in particular into Azure SQL. This book provides information never seen before about the history and architecture of Azure SQL. Author Bob Ward is a leading expert with access to and support from the Microsoft engineering team that built Azure SQL and related database cloud services. He presents powerful, behind-the-scenes insights into the workings of one of the most popular database cloud services in the industry. This book also brings you the latest innovations for Azure SQL including Azure Arc, Hyperscale, generative AI applications, Microsoft Copilots, and integration with the Microsoft Fabric. 
What You Will Learn · Know the history of Azure SQL · Deploy, configure, and connect to Azure SQL · Choose the correct way to deploy SQL Server in Azure · Migrate existing SQL Server instances to Azure SQL · Monitor and tune Azure SQL’s performance to meet your needs · Ensure your data and application are highly available · Secure your data from attack and theft · Learn the latest innovations for Azure SQL including Hyperscale · Learn how to harness the power of AI for generative data-driven applications and Microsoft Copilots for assistance · Learn how to integrate Azure SQL with the unified data platform, the Microsoft Fabric
Who This Book Is For: This book is designed to teach SQL Server in the Azure cloud to the SQL Server professional. Anyone who operates, manages, or develops applications for SQL Server will benefit from this book. Readers will be able to translate their current knowledge of SQL Server, especially of SQL Server 2019 and 2022, directly to Azure. This book is ideal for database professionals looking to remain relevant as their customer base moves into the cloud.

Data Security Blueprints

2024-10-02 O'Reilly Amazon

book

Federico Castanedo

data data-engineering AI/ML Data Science Cyber Security

Once you decide to implement a data security strategy, it can be difficult to know where to start. With so many potential threats and challenges to resolve, teams often try to fix everything at once. But this boil-the-ocean approach is difficult to manage efficiently and ultimately leads to frustration, confusion, and halted progress. There's a better way to go. In this report, data science and AI leader Federico Castanedo shows you what to look for in a data security platform that will deliver the speed, scale, and agility you need to be successful in today's fast-paced, distributed data ecosystems. Unlike other resources that focus solely on data security concepts, this guide provides a road map for putting those concepts into practice.
This report reveals: · The most common data security use cases and their potential challenges · What to look for in a data security solution that's built for speed and scale · Why increasingly decentralized data architectures require centralized, dynamic data security mechanisms · How to implement the steps required to put common use cases into production · Methods for assessing risks, and the controls necessary to mitigate those risks · How to facilitate cross-functional collaboration to put data security into practice in a scalable, efficient way
You'll examine the most common data security use cases that global enterprises across every industry aim to achieve, including the specific steps needed for implementation as well as the potential obstacles these use cases present. Federico Castanedo is a data science and AI leader with extensive experience in academia, industry, and startups. Having held leadership positions at DataRobot and Vodafone, he has a successful track record of leading high-performing data science teams and developing data science and AI products with business impact.

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

2024-09-27 O'Reilly Amazon

book

Pavan Kumar Narayanan

data ai-ml machine-learning AI/ML Airflow Analytics

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code. The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows. What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool. It is a career catalyst, and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world. 
What You Will Learn · Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds · Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines and master the art of workflow orchestration to streamline your engineering projects · Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure
Who This Book Is For: Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists

Full Stack FastAPI, React, and MongoDB - Second Edition

2024-08-23 O'Reilly Amazon

book

Shubham Ranjan , Marko Aleksendrić , Rachelle Palmer , Shrey Batra

data data-engineering nosql-databases MongoDB AI/ML API

Full Stack FastAPI, React, and MongoDB guides you step-by-step through creating web applications using the FARM stack. This hands-on resource teaches you how to integrate FastAPI, a modern Python framework, React for front-end development, and MongoDB for data storage to build and deploy powerful, scalable web applications.
What this Book will help me do: Master the essentials of MongoDB, including creating and managing document-based databases. Gain proficiency in building APIs using FastAPI and Python for robust backend systems. Develop dynamic frontends using React, integrating seamlessly with a FastAPI backend. Securely authenticate and authorize users using JSON Web Tokens in your applications. Explore advanced features like integrating AI models and building with Next.js for production-ready development.
Author(s): Marko Aleksendrić, Shrey Batra, Rachelle Palmer, and Shubham Ranjan combine their expertise in web development and software engineering in this book. Together, they bring years of professional experience and a passion for teaching developers to create modern web applications effectively using cutting-edge tools.
Who is it for? Intermediate web developers who possess foundational JavaScript and Python skills are the ideal audience for this book. If you want to advance your skills by mastering modern web application development with the FARM stack, this book will guide you comprehensively. With practical, real-world examples, it is designed for developers aiming to build production-grade applications.
