pig

Programming Pig, 2nd Edition

2016-11-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alan Gates, Daniel Dai

Data Modelling Hadoop HDFS Python data data-engineering

For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets. Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You’ll find comprehensive coverage on key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

- Delve into Pig’s data model, including scalar and complex data types
- Write Pig Latin scripts to sort, group, join, project, and filter your data
- Use Grunt to work with the Hadoop Distributed File System (HDFS)
- Build complex data processing pipelines with Pig’s macros and modularity features
- Embed Pig Latin in Python for iterative processing and other advanced tasks
- Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
- Create your own load and store functions to handle data formats and storage mechanisms

Big Data for Chimps

2015-09-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Philip (flip) Kromer, Russell Jurney

Big Data Hadoop Python data data-engineering

Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems. Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.

- Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
- Dive into map/reduce mechanics and build your first map/reduce job in Python
- Understand how to run chains of map/reduce jobs in the form of Pig scripts
- Use a real-world dataset (baseball performance statistics) throughout the book
- Work with examples of several analytic patterns, and learn when and where you might use them

Pig Design Patterns

2014-04-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pradeep Pasupuleti

Analytics Big Data Data Analytics Hadoop data data-engineering

Discover how to simplify Hadoop programming with Pig Design Patterns, helping you create innovative enterprise-level big data solutions. This book takes you step-by-step through practical design patterns for creating efficient data processing workflows with Apache Pig.

What this Book will help me do

- Understand and implement fundamental data processing patterns with Pig.
- Master advanced Pig techniques for Big Data analytics.
- Learn to optimize Pig scripts for performance and scalability.
- Build end-to-end data processing solutions with real-world examples.
- Integrate Pig workflows into the broader Hadoop ecosystem.

Author(s)

Pradeep Pasupuleti is an experienced data engineer and software developer specializing in Big Data technologies. With extensive expertise in Hadoop and Pig, Pradeep shares valuable insights and practical techniques beginners and experts alike will appreciate.

Who is it for?

This book is perfect for software developers and data engineers working with Hadoop who want to streamline their workflow. It is ideal for professionals already familiar with Pig and Hadoop basics looking to advance. It also suits learners aiming to implement optimized data solutions effectively.

Programming Pig

2011-10-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alan Gates

Data Modelling Hadoop HDFS Python data data-engineering

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application, making it easy for you to experiment with new datasets. Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage on key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

- Delve into Pig’s data model, including scalar and complex data types
- Write Pig Latin scripts to sort, group, join, project, and filter your data
- Use Grunt to work with the Hadoop Distributed File System (HDFS)
- Build complex data processing pipelines with Pig’s macros and modularity features
- Embed Pig Latin in Python for iterative processing and other advanced tasks
- Create your own load and store functions to handle data formats and storage mechanisms
- Get performance tips for running scripts on Hadoop clusters in less time

talk-data.com
