streaming-messaging

Practical Data Engineering with Apache Projects: Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

2026-01-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dunith Danushka

Airflow Flink Data Engineering Iceberg Kafka Spark Trino data data-engineering

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using Open Source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more. Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely-used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios. At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering. In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering. 
You Will Learn: the foundational concepts of data engineering, with practical experience in solving real-world data engineering problems; how to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino; 10 hands-on data engineering projects; and how to troubleshoot common challenges in data engineering projects. Who is this book for: Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.

Building Integrations with MuleSoft

2025-05-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Diane Kesler , Pooja Kamath

API Cloud Computing GitHub data data-engineering enterprise-service-bus mule-esb

This concise yet comprehensive guide shows developers and architects how to tackle data integration challenges with MuleSoft. Authors Pooja Kamath and Diane Kesler take you through the process necessary to build robust and scalable integration solutions step-by-step. Supported by real-world use cases, Building Integrations with MuleSoft teaches you to identify and resolve performance bottlenecks, handle errors, and ensure the reliability and scalability of your integration solutions. You'll explore MuleSoft's robust set of connectors and their components, and use them to connect to systems and applications from legacy databases to cloud services. Ask the right questions to determine your use case, define requirements, decide on reuse versus rebuild, and create sequence and context diagrams Master tools like the Anypoint Platform, Anypoint Studio, Code Builder, GitHub, and Maven Design APIs with RAML and OAS and craft effective requests and responses Write MUnit tests, validate DataWeave expressions, and use Postman Collections Deploy Mule applications to CloudHub, use API Manager to create API proxies, and secure APIs with Mule OAuth 2.0 Learn message orchestration techniques for routers, transactions, error handling, For Each, Parallel For Each, and batch processing

Apache Kafka in Action

2025-05-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alexander Kropp , Anatoly Zelenin (DataFlow Academy)

Analytics Cloud Computing Kafka Kubernetes Microsoft Data Streaming data data-engineering

Apache Kafka, start to finish. Apache Kafka in Action: From basics to production guides you through the concepts and skills you’ll need to deploy and administer Kafka for data pipelines, event-driven applications, and other systems that process data streams from multiple sources. Authors Anatoly Zelenin and Alexander Kropp have spent years using Kafka in real-world production environments. In this guide, they reveal their hard-won expert insights to help you avoid common Kafka pitfalls and challenges. Inside Apache Kafka in Action you’ll discover: Apache Kafka from the ground up Achieving reliability and performance Troubleshooting Kafka systems Operations, governance, and monitoring Kafka use cases, patterns, and anti-patterns Clear, concise, and practical, Apache Kafka in Action is written for IT operators, software engineers, and IT architects working with Kafka every day. Chapter by chapter, it guides you through the skills you need to deliver and maintain reliable and fault-tolerant data-driven applications. About the Technology Apache Kafka is the gold standard streaming data platform for real-time analytics, event sourcing, and stream processing. Acting as a central hub for distributed data, it enables seamless flow between producers and consumers via a publish-subscribe model. Kafka easily handles millions of events per second, and its rock-solid design ensures high fault tolerance and smooth scalability. About the Book Apache Kafka in Action is a practical guide for IT professionals who are integrating Kafka into data-intensive applications and infrastructures. The book covers everything from Kafka fundamentals to advanced operations, with interesting visuals and real-world examples. Readers will learn to set up Kafka clusters, produce and consume messages, handle real-time streaming, and integrate Kafka into enterprise systems. 
This easy-to-follow book emphasizes building reliable Kafka applications and taking advantage of its distributed architecture for scalability and resilience. What's Inside Master Kafka’s distributed streaming capabilities Implement real-time data solutions Integrate Kafka into enterprise environments Build and manage Kafka applications Achieve fault tolerance and scalability About the Reader For IT operators, software architects and developers. No experience with Kafka required. About the Authors Anatoly Zelenin is a Kafka expert known for workshops across Europe, especially in banking and manufacturing. Alexander Kropp specializes in Kafka and Kubernetes, contributing to cloud platform design and monitoring. Quotes A great introduction. Even experienced users will go back to it again and again. - Jakub Scholz, Red Hat Approachable, practical, well-illustrated, and easy to follow. A must-read. - Olena Kutsenko, Confluent A zero to hero journey to understanding and using Kafka! - Anthony Nandaa, Microsoft Thoughtfully explores a wide range of topics. A wealth of valuable information seamlessly presented and easily accessible. - Olena Babenko, Aiven Oy

Streaming Databases

2024-08-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ralph Matthias Debusmann , Hubert Dulay

Analytics Data Streaming data data-engineering streaming-architecture

Real-time applications are becoming the norm today. But building a model that works properly requires real-time data from the source, in-flight stream processing, and low latency serving of its analytics. With this practical book, data engineers, data architects, and data analysts will learn how to use streaming databases to build real-time solutions. Authors Hubert Dulay and Ralph M. Debusmann take you through streaming database fundamentals, including how these databases reduce infrastructure for real-time solutions. You'll learn the difference between streaming databases, stream processing, and real-time online analytical processing (OLAP) databases. And you'll discover when to use push queries versus pull queries, and how to serve synchronous and asynchronous data emanating from streaming databases. This guide helps you: Explore stream processing and streaming databases Learn how to build a real-time solution with a streaming database Understand how to construct materialized views from any number of streams Learn how to serve synchronous and asynchronous data Get started building low-complexity streaming solutions with minimal setup

MuleSoft Platform Architect's Guide

2024-07-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jitendra Bafna , Jim Andrews

API Fabric data data-engineering enterprise-service-bus mule-esb

The "MuleSoft Platform Architect's Guide" is your essential resource for mastering API-driven solutions using MuleSoft Anypoint Platform. This book enables you to design, deploy, and operate scalable, secure, and high-performance API architectures in enterprise settings while preparing for MuleSoft Platform Architect certification. What this Book will help me do Design robust API integration solutions using MuleSoft Anypoint Platform. Successfully deploy applications to CloudHub and Runtime Fabric environments. Monitor and operate APIs with advanced management tools. Implement scalable solutions aligned with business outcomes. Prepare confidently for the MuleSoft Platform Architect certification. Author(s) Jitendra Bafna is a Senior Solution Architect with years of experience optimizing MuleSoft implementations. Jim Andrews, a MuleSoft Evangelist, has dedicated his career to guiding others in achieving enterprise-ready API solutions. Together, they share practical knowledge, step-by-step guidance, and expertise in API and integration mastery. Who is it for? This book is perfect for IT architects and senior developers experienced in API development, especially those familiar with MuleSoft. It's tailored for professionals aiming to master Anypoint Platform or pursue MuleSoft Platform Architect certification. Readers should have basic experience with integration platforms and a willingness to explore advanced API design.

Big Data on Kubernetes

2024-07-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Neylson Crepalde

Airflow BI Big Data Docker Kafka Kubernetes Python Spark SQL YAML data data-engineering

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges. What this Book will help me do Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios. Author(s) Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture. Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.

Kafka Streams in Action, Second Edition

2024-05-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bill Bejeck

API Java Kafka Data Streaming data data-engineering

Everything you need to implement stream processing on Apache Kafka® using Kafka Streams and the ksqlDB event streaming database. Kafka Streams in Action, Second Edition guides you through setting up and maintaining your stream processing with Kafka. Inside, you’ll find comprehensive coverage of not only Kafka Streams, but the entire toolbox you’ll need for effective streaming, from the components of the Kafka ecosystem, to Producer and Consumer clients, Connect, and Schema Registry. In Kafka Streams in Action, Second Edition you’ll learn how to: Design streaming applications in Kafka Streams with the KStream and the Processor API Integrate external systems with Kafka Connect Enforce data compatibility with Schema Registry Build applications that respond immediately to events in either Kafka Streams or ksqlDB Craft materialized views over streams with ksqlDB This totally revised new edition of Kafka Streams in Action has been expanded to cover more of the Kafka platform used for building event-based applications. You’ll also find full coverage of ksqlDB, an event streaming database that makes it a snap to create applications that respond immediately to events, such as real-time push and pull updates. About the Technology Enterprise applications need to handle thousands, even millions, of data events every day. With an intuitive API and flawless reliability, the lightweight Kafka Streams library has earned a spot at the center of these systems. Kafka Streams provides exactly the power and simplicity you need to manage real-time event processing or microservices messaging. About the Book Kafka Streams in Action, Second Edition teaches you how to create event streaming applications on the amazing Apache Kafka platform. This thoroughly revised new edition now covers a wider range of streaming architectures and includes data integration with Kafka Connect. 
As you go, you’ll explore real-world examples that introduce components and brokers, schema management, and the other essentials. Along the way, you’ll pick up practical techniques for blending Kafka with Spring, low-level control of processors and state stores, storing event data with ksqlDB, and testing streaming applications. What's Inside Design efficient streaming applications Integrate external systems with Kafka Connect Enforce data compatibility with Schema Registry About the Reader For Java developers. No knowledge of Kafka or streaming applications required. About the Author Bill Bejeck is a Confluent engineer and a Kafka Streams contributor with over 15 years of software development experience. Bill is also a committer on the Apache Kafka® project. Quotes Comprehensive streaming data applications are only a few years away from becoming the reality, and this book is the guide the industry has been waiting for to move beyond the hype. - Adi Polak, Director, Developer Experience Engineering, Confluent Covers all the key aspects of building applications with Kafka Streams. Whether you are getting started with stream processing or have already built Kafka Streams applications, it is an essential resource. - Mickael Maison, Principal Software Engineer, Red Hat Serves as both a learning and a resource guide, offering a perfect blend of ‘how-to’ and ‘why-to.’ Even if you have been using Kafka Streams for many years, I highly recommend this book. - Neil Buesing, CTO & Co-founder, Kinetic Edge

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

2023-11-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Elad Eldor

Cloud Computing DataOps DevOps Kafka Data Streaming data data-engineering

This book provides Kafka administrators, site reliability engineers, and DataOps and DevOps practitioners with a list of real production issues that can occur in Kafka clusters and how to solve them. The production issues covered are assembled into a comprehensive troubleshooting guide for those engineers who are responsible for the stability and performance of Kafka clusters in production, whether those clusters are deployed in the cloud or on-premises. This book teaches you how to detect and troubleshoot the issues, and eventually how to prevent them. Kafka stability is hard to achieve, especially in high throughput environments, and the purpose of this book is not only to make troubleshooting easier, but also to prevent production issues from occurring in the first place. The guidance in this book is drawn from the author's years of experience in helping clients and internal customers diagnose and resolve knotty production problems and stabilize their Kafka environments. The book is organized into recipe-style troubleshooting checklists that field engineers can easily follow when under pressure to fix an unstable cluster. This is the book you will want by your side when the stakes are high, and your job is on the line. 
What You Will Learn Monitor and resolve production issues in your Kafka clusters Provision Kafka clusters with the lowest costs and still handle the required loads Perform root cause analyses of issues affecting your Kafka clusters Know the ways in which your Kafka cluster can affect its consumers and producers Prevent or minimize data loss and delays in data streaming Forestall production issues through an understanding of common failure points Create checklists for troubleshooting your Kafka clusters when problems occur Who This Book Is For Site reliability engineers tasked with maintaining stability of Kafka clusters, Kafka administrators who troubleshoot production issues around Kafka, DevOps and DataOps experts who are involved with provisioning Kafka (whether on-premises or in the cloud), developers of Kafka consumers and producers who wish to learn more about Kafka

Kafka Connect

2023-09-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kate Stanley , Mickael Maison

Kafka Data Streaming data data-engineering kafka-connect

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you flow data between your existing systems and Kafka to process data in real time. With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline. Learn Kafka Connect's capabilities, main concepts, and terminology Design data and event streaming pipelines that use Kafka Connect Configure and operate Kafka Connect environments at scale Deploy secured and highly available Kafka Connect clusters Build sink and source connectors and single message transforms and converters

Building Real-Time Analytics Systems

2023-09-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mark Needham

Analytics AWS Kinesis Dashboard Kafka Pub/Sub data data-engineering real-time-analytics

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly. Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service. You will: Learn common architectures for real-time analytics Discover how event processing differs from real-time analytics Ingest event data from Apache Kafka into Apache Pinot Combine event streams with OLTP data using Debezium and Kafka Streams Write real-time queries against event data stored in Apache Pinot Build a real-time dashboard and order tracking app Learn how Uber, Stripe, and Just Eat use real-time analytics

Modernize Applications with Apache Kafka

2023-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Stroop , Jennifer Vargas

Kafka Kubernetes data data-engineering

Application modernization has become increasingly important as older systems struggle to keep up with today's requirements. When you migrate legacy monolithic applications to microservices, easier maintenance and optimized resource utilization generally follow. But new challenges arise around communication within services and between applications. You can overcome many of these issues with the help of modern messaging technologies such as Apache Kafka. In this report, Jennifer Vargas and Richard Stroop from Red Hat explain how IT leaders and enterprise architects can use Kafka for microservices communication and then off-load operational needs through the use of Kubernetes and managed services. You'll also explore application modernization techniques that don't require you to break down your monolithic application. This report helps you: Understand the importance of migrating your monolithic applications to microservices Examine the various challenges you may face during the modernization process Explore application modernization techniques and learn the benefits of using Apache Kafka during the development process Learn how Apache Kafka can support business outcomes Understand how Kubernetes can help you overcome any difficulties you may encounter when using Kafka for application development

Streaming Data Mesh

2023-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stephen Mooney , Hubert Dulay

Data Governance DevOps Kafka MLOps Data Streaming data data-engineering streaming-architecture

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services. Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you'll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly. With this book, you will: Design a streaming data mesh using Kafka Learn how to identify a domain Build your first data product using self-service tools Apply data governance to the data products you create Learn the differences between synchronous and asynchronous data services Implement self-services that support decentralized data

Sentient Strategy

2023-03-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alan Weiss

data data-engineering real-time-analytics

Alan Weiss equips the reader to consider using this approach independently. These are new times, a new reality, a “no normal”; hence, it’s ridiculous to use old approaches to strategy.

Building Real-Time Analytics Applications

2023-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Darin Briskman

Analytics Druid Cyber Security data data-engineering real-time-analytics

Every organization needs insight to succeed and excel, and the primary foundation for insights today is data—whether it's internal data from operational systems or external data from partners, vendors, and public sources. But how can you use this data to create and maintain analytics applications capable of gaining real insights in real time? In this report, Darin Briskman explains that leading organizations like Netflix, Walmart, and Confluent have found that while traditional analytics still have value, it's not enough. These companies and many others are now building real-time analytics that deliver insights continually, on demand, and at scale—complete with interactive drill-down data conversations, subsecond performance at scale, and always-on reliability. Ideal for data engineers, data scientists, data architects, and software developers, this report helps you: Learn the elements of real-time analytics, including subsecond performance, high concurrency, and the combination of real-time and historical data Examine case studies that show how Netflix, Walmart, and Confluent have adopted real-time analytics Explore Apache Druid, the real-time database that powers real-time analytics applications Learn how to create real-time analytics applications through data design and interfaces Understand the importance of security, resilience, and managed services Darin Briskman is director of technology at Imply Data, Inc., a software company committed to advancing open source technology and making it simple for developers to realize the power of Apache Druid.

Streaming Video Strategies

2022-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Carolyn Handler Miller

Data Streaming data data-engineering streaming-architecture

Video is an essential tool for businesses and a key driver in consumer sales. But consumers expect the seamless viewing experiences they get on specialized streaming sites like Netflix and YouTube across every company everywhere they watch. Building video that meets those expectations into your sites and apps means dealing with complex challenges. In this report, Carolyn Handler Miller and Frank Kane help you think through decisions about building video at your company—whether you're a founder considering the role of video in your app, a product manager or team lead overseeing video infrastructure, or a developer executing on user experience. You'll explore a solid framework for incorporating video into your websites and apps that considers your existing infrastructure so that you can deliver seamless, high-quality video experiences that drive real results. Four case studies then show how real companies have successfully built video experiences into their businesses' software architecture. This report helps you: Understand the changing role of video for businesses today Appreciate the unique challenges of building video Decide whether to design and build video infrastructure yourself or partner with a third-party expert

Unlocking the Value of Real-Time Analytics

2022-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christopher Gardner

Analytics Data Collection data data-engineering real-time-analytics

Storing data and making it accessible for real-time analysis is a huge challenge for organizations today. In 2020 alone, 64.2 billion GB of data was created or replicated, and it continues to grow. With this report, data engineers, architects, and software engineers will learn how to do deep analysis and automate business decisions while keeping their analytical capabilities timely. Author Christopher Gardner takes you through current practices for extracting data for analysis and uncovers the opportunities and benefits of making that data extraction and analysis continuous. By the end of this report, you’ll know how to use new and innovative tools against your data to make real-time decisions. And you’ll understand how to examine the impact of real-time analytics on your business. Learn the four requirements of real-time analytics: latency, freshness, throughput, and concurrency Determine where delays between data collection and actionable analytics occur Understand the reasons for real-time analytics and identify the tools you need to reach a faster, more dynamic level Examine changes in data storage and software while learning methodologies for overcoming delays in existing database architecture Explore case studies that show how companies use columnar data, sharding, and bitmap indexing to store and analyze data Fast and fresh data can make the difference between a successful transaction and a missed opportunity. The report shows you how.

Grokking Streaming Systems

2022-03-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ning Wang , Josh Fischer

IoT Java Kafka Cyber Security Spark Data Streaming data data-engineering streaming-architecture

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work—and how to build your own! In Grokking Streaming Systems you will learn how to: Implement and troubleshoot streaming systems Design streaming systems for complex functionalities Assess parallelization requirements Spot networking bottlenecks and resolve back pressure Group data for high-performance systems Handle delayed events in real-time systems Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that’s a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities! About the Technology Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand. About the Book Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you’ll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services. 
What's Inside Implement and troubleshoot streaming systems Design streaming systems for complex functionalities Spot networking bottlenecks and resolve backpressure Group data for high-performance systems About the Reader No prior experience with streaming systems is assumed. Examples in Java. About the Authors Josh Fischer and Ning Wang are Apache Committers, and part of the committee for the Apache Heron distributed stream processing engine. Quotes Very well-written and enjoyable. I recommend this book to all software engineers working on data processing. - Apoorv Gupta, Facebook Finally, a much-needed introduction to streaming systems—a must-read for anyone interested in this technology. - Anupam Sengupta, Red Hat Tackles complex topics in a very approachable manner. - Marc Roulleau, GIRO A superb resource for helping you grasp the fundamentals of open-source streaming systems. - Simon Verhoeven, Cronos Explains all the main streaming concepts in a friendly way. Start with this one! - Cicero Zandona, Calypso Technologies

Kafka in Action

2022-02-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dave Klein, Dylan Scott, Viktor Gamov (Confluent)

Analytics ETL/ELT Java Kafka Data Streaming data data-engineering

Master the wicked-fast Apache Kafka streaming platform through hands-on examples and real-world projects.

In Kafka in Action you will learn: understanding Apache Kafka concepts; setting up and executing basic ETL tasks using Kafka Connect; using Kafka as part of a large data project team; performing administrative tasks; producing and consuming event streams; working with Kafka from Java applications; implementing Kafka as a message queue.

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow, and start digging into even more advanced Kafka topics.

About the Technology

Think of Apache Kafka as a high performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large and small-scale applications.

About the Book

Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What's Inside

Kafka as an event streaming platform; Kafka producers and consumers from Java applications; Kafka as part of a large data project.

About the Reader

For intermediate Java developers or data engineers. No prior knowledge of Kafka required.

About the Authors

Dylan Scott is a software developer in the insurance industry. Viktor Gamov is a Kafka-focused developer advocate. At Confluent, Dave Klein helps developers, teams, and enterprises harness the power of event streaming with Apache Kafka.

Quotes

The authors have had many years of real-world experience using Kafka, and this book’s on-the-ground feel really sets it apart. - From the foreword by Jun Rao, Confluent Cofounder

A surprisingly accessible introduction to a very complex technology. Developers will want to keep a copy close by. - Conor Redmond, InComm Payments

A comprehensive and practical guide to Kafka and the ecosystem. - Sumant Tambe, LinkedIn

It quickly gave me insight into how Kafka works, and how to design and protect distributed message applications. - Gregor Rayman, Cloudfarms

Building Big Data Pipelines with Apache Beam

2022-01-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jan Lukavský

Beam Big Data Java apache-beam data data-engineering

Building Big Data Pipelines with Apache Beam is the essential guide for mastering data processing using Apache Beam. This book covers both the basics and advanced concepts, from implementing pipelines to extending functionalities with custom I/O connectors. By the end, you'll be equipped to build scalable and reusable big data solutions.

What this Book will help me do

Understand the core principles of Apache Beam and its architecture. Learn how to create efficient data processing pipelines for diverse scenarios. Master the use of stateful processing for real-time data handling. Gain skills in using Beam's portability features for various languages. Explore advanced functionalities like creating custom I/O connectors.

Author(s)

Jan Lukavský is a seasoned data engineer with extensive experience in big data technologies and Apache Beam. Having worked on innovative data solutions across industries, Lukavský brings hands-on insights and practical expertise to this book. Their approach to teaching ensures readers can directly apply concepts to real-world scenarios.

Who is it for?

This book is designed for professionals involved in big data, such as data engineers, analysts, and scientists. It is particularly suited for those with an intermediate level of understanding of Java, aiming to expand their skill set to include advanced data pipeline construction. Whether you're stepping into Apache Beam for the first time or looking to deepen your expertise, this book offers valuable, actionable insights.

Kafka: The Definitive Guide, 2nd Edition

2021-11-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Krit Petty, Rajini Sivaram, Todd Palino, Gwen Shapira

API Kafka Cyber Security Data Streaming data data-engineering

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes.

Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

You'll examine: best practices for deploying and configuring Kafka; Kafka producers and consumers for writing and reading messages; patterns and use-case requirements to ensure reliable data delivery; best practices for building data pipelines and applications with Kafka; how to perform monitoring, tuning, and maintenance tasks with Kafka in production; the most critical metrics among Kafka's operational measurements; Kafka's delivery capabilities for stream processing systems.
