talk-data.com


Speaker: Steven Hoffman (author) — 2 talks
Filtering by: O'Reilly Data Engineering Books


Talks & appearances

Showing 2 of 2 activities

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

"Apache Flume: Distributed Log Collection for Hadoop - Second Edition" is a hands-on guide to using Apache Flume to reliably collect and move logs and data streams into your Hadoop ecosystem. Through practical examples and real-world scenarios, the book covers the setup, configuration, and optimization of Flume for a range of data ingestion use cases.

What this book will help me do

- Understand the key concepts and architecture behind Apache Flume to build reliable and scalable data ingestion systems.
- Set up Flume agents to collect and transfer data into the Hadoop Distributed File System (HDFS) or other storage solutions.
- Apply stream-processing techniques such as filtering, transforming, and enriching data in transit to improve data usability.
- Integrate Flume with tools like Elasticsearch and Solr to enhance analytics and search capabilities.
- Implement monitoring and troubleshooting workflows to keep Flume data pipelines healthy and optimized.

Author(s)

Steven Hoffman, a seasoned software developer and data engineer, brings years of practical experience with big data technologies to this book. He has a strong background in distributed systems, having implemented enterprise-scale analytics projects. Through clear, approachable writing, he aims to help readers deploy reliable data pipelines using Apache Flume.

Who is it for?

This book is written for Hadoop developers, data engineers, and IT professionals who want to build robust pipelines for streaming data into Hadoop environments. It is ideal for readers with a basic understanding of Hadoop and HDFS who are new to Apache Flume. Both beginners and experienced engineers looking to dive deeper into Flume will find it insightful.
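The source-channel-sink pipeline the book teaches can be sketched as a standard Flume agent properties file. This is a minimal illustrative example, not taken from the book; the agent name `a1`, component names `r1`/`c1`/`k1`, and the HDFS path are placeholder assumptions:

```properties
# Name the components of agent a1 (names are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for newline-terminated events on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into HDFS (path is a placeholder)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.channel = c1
```

An agent like this would typically be started with `flume-ng agent --name a1 --conf-file <file>`; in production the memory channel is often swapped for a file channel for durability.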

Apache Flume: Distributed Log Collection for Hadoop

Apache Flume: Distributed Log Collection for Hadoop is a focused guide for users who need to efficiently collect and transport log data into systems like Hadoop using Apache Flume. Its step-by-step approach covers the installation, configuration, and customization of Flume to optimize data ingestion workflows.

What this book will help me do

- Install and set up Apache Flume for your data ingestion processes.
- Understand Flume's architecture and capabilities, including sources, channels, and sinks.
- Configure reliable data-flow paths using failover and load-balancing techniques.
- Implement data routing and transformation during data flow using Flume.
- Optimize and monitor Flume operations to improve reliability and performance.

Author(s)

The authors are experienced software engineers and data administrators with deep practical expertise in implementing distributed log collection systems. Their teaching approach combines clear explanation with actionable examples to give you a hands-on learning experience.

Who is it for?

This book is ideal for software engineers, data engineers, and system administrators who handle and transport datasets, especially into Hadoop. If you want to understand or optimize Apache Flume in your data-processing pipeline, this book guides you from beginner-friendly setup to advanced customization, helping to enhance your workflows.
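The failover technique mentioned above is configured in Flume through sink groups: several sinks are grouped, and a failover processor routes events to the highest-priority sink that is still healthy. A minimal sketch, assuming an agent named `a1` with two already-defined sinks `k1` and `k2` (all names are placeholders, not from the book):

```properties
# Group two sinks so a failover processor can choose between them
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# Failover processor: prefer the sink with the higher priority value
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5

# Cap (in ms) on how long a failed sink is penalized before retry
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

Changing `processor.type` to `load_balance` would instead spread events across both sinks, which is the load-balancing variant the blurb refers to.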