Apache HBase

Professional Hadoop Solutions

2013-09-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kevin T. Smith , Alexey Yakubovich , Boris Lublinsky

API AWS Big Data Hadoop HDFS Java Cyber Security XML data data-engineering

The go-to guidebook for deploying Big Data solutions with Hadoop Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and Hbase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them. The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions Includes detailed, real-world examples and code-level guidelines Explains when, why, and how to use these tools effectively Written by a team of Hadoop experts in the programmer-to-programmer Wrox style Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.

Apache Sqoop Cookbook

2013-07-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kathleen Ting , Jarek Jarcec Cecho

Big Data DWH GitHub Hadoop Hive MySQL Netezza Oracle RDBMS SQL Teradata data +3 more

Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for using Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop. Sqoop is both powerful and bewildering, but with this cookbook’s problem-solution-discussion format, you’ll quickly learn how to deploy and then apply Sqoop in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems. Transfer data from a single database table into your Hadoop ecosystem Keep table data and Hadoop in sync by importing data incrementally Import data from more than one database table Customize transferred data by calling various database functions Export generated, processed, or backed-up data from Hadoop to your database Run Sqoop within Oozie, Hadoop’s specialized workflow scheduler Load data into Hadoop’s data warehouse (Hive) or database (HBase) Handle installation, connection, and syntax issues common to specific database vendors

Data Warehousing in the Age of Big Data

2013-05-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Krish Krishnan

Big Data Data Governance DataViz DWH Hadoop Hive NoSQL data data-engineering data-warehouse storage-repositories

Data Warehousing in the Age of the Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data–ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory. Ultimately, this book will help you navigate through the complex layers of Big Data and data warehousing while providing you information on how to effectively think about using all these technologies and the architectures to design the next-generation data warehouse. Learn how to leverage Big Data by effectively integrating it into your data warehouse. Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBASE, Hive, and other Big Data technologies Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

HBase in Action

2012-11-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nick Dimiduk , Amandeep Khurana

Analytics Big Data GIS Hadoop Marketing NoSQL data data-engineering nosql-databases

HBase in Action has all the knowledge you need to design, build, and run applications using HBase. First, it introduces you to the fundamentals of distributed systems and large scale data handling. Then, you'll explore real-world applications and code samples with just enough theory to understand the practical techniques. You'll see how to build applications with HBase and take advantage of the MapReduce processing framework. And along the way you'll learn patterns and best practices. About the Technology HBase is a NoSQL storage system designed for fast, random access to large volumes of data. It runs on commodity hardware and scales smoothly from modest datasets to billions of rows and millions of columns. About the Book HBase in Action is an experience-driven guide that shows you how to design, build, and run applications using HBase. First, it introduces you to the fundamentals of handling big data. Then, you'll explore HBase with the help of real applications and code samples and with just enough theory to back up the practical techniques. You'll take advantage of the MapReduce processing framework and benefit from seeing HBase best practices in action. What's Inside When and how to use HBase Practical examples Design patterns for scalable data systems Deployment, integration, and design About the Reader Written for developers and architects familiar with data storage and processing. No prior knowledge of HBase, Hadoop, or MapReduce is required. About the Authors Nick Dimiduk is a Data Architect with experience in social media analytics, digital marketing, and GIS. Amandeep Khurana is a Solutions Architect focused on building HBase-driven solutions. Quotes Timely, practical ... explains in plain language how to use HBase. - From the Foreword by Michael Stack, Chair of the Apache HBase Project Management Committee A difficult topic lucidly explained. - John Griffin, coauthor of "Hibernate Search in Action" Amusing tongue-in-cheek style that doesn’t detract from the substance. - Charles Pyle, APS Healthcare Learn how to think the HBase way. - Gianluca Righetto, Menttis

HBase Administration Cookbook

2012-08-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Yifeng Jiang

Hadoop data data-engineering nosql-databases

The "HBase Administration Cookbook" is your hands-on guide to mastering HBase administration and configuration. Through practical recipes, this book covers the essential tasks like setting up clusters, optimizing performance, and integrating with the Hadoop ecosystem to manage vast amounts of data effectively. What this Book will help me do Set up and administer HBase clusters for scalability and high availability. Perform routine HBase management tasks confidently and efficiently. Optimize HBase and Hadoop ecosystem settings for maximum performance. Understand troubleshooting to address and resolve typical HBase issues. Leverage advanced configurations for specific read/write-heavy use cases. Author(s) Yifeng Jiang is a seasoned software engineer and database expert with deep experience in working with distributed databases like HBase. He is passionate about teaching and conveying complex concepts through approachable explanations and actionable steps. Yifeng's writing style reflects his hands-on expertise and focus on practical application. Who is it for? This book is designed for system administrators, database managers, and developers looking to master HBase administration and configuration. Whether you are relatively new to HBase with basic familiarity with Hadoop or are an experienced Hadoop administrator wanting to enhance your database management skills, this book provides valuable insights and thorough guidance.

Hadoop: The Definitive Guide, 3rd Edition

2012-05-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom White

API Avro Cloud Computing DWH Hadoop HDFS Hive RDBMS data data-engineering

Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the Hadoop Distributed File System (HDFS) Run distributed computations with MapReduce Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud Load data from relational databases into HDFS, using Sqoop Perform large-scale data processing with the Pig query language Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems

Seven Databases in Seven Weeks

2012-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eric Redmond , Jim R. Wilson

Amazon EC2 Big Data Cloud Computing Data Management DynamoDB ELK Java Linux MongoDB Neo4j NoSQL RDBMS +5 more

Data is getting bigger and more complex by the day, and so are the choices in handling that data. As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core each technology. Redis, Neo4J, CouchDB, MongoDB, HBase, Riak and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases-relational, key/value, columnar, document and graph-and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once. Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs. What You Need: To get the most of of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OSX or Linux preferred, Windows users will need Cygwin), and Java 6 (or greater) and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.

HBase: The Definitive Guide

2011-09-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lars George

API Avro Hadoop Java data data-engineering nosql-databases

If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away. Discover how tight integration with Hadoop makes scalability with HBase easier Distribute large datasets across an inexpensive cluster of commodity servers Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks

Professional NoSQL

2011-09-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shashank Tiwari

Cassandra Hadoop Hive MongoDB NoSQL Redis data data-engineering nosql-databases

A hands-on guide to leveraging NoSQL databases NoSQL databases are an efficient and powerful tool for storing and manipulating vast quantities of data. Most NoSQL databases scale well as data grows. In addition, they are often malleable and flexible enough to accommodate semi-structured and sparse data sets. This comprehensive hands-on guide presents fundamental concepts and practical solutions for getting you ready to use NoSQL databases. Expert author Shashank Tiwari begins with a helpful introduction on the subject of NoSQL, explains its characteristics and typical uses, and looks at where it fits in the application stack. Unique insights help you choose which NoSQL solutions are best for solving your specific data storage needs. Professional NoSQL: Demystifies the concepts that relate to NoSQL databases, including column-family oriented stores, key/value databases, and document databases. Delves into installing and configuring a number of NoSQL products and the Hadoop family of products. Explains ways of storing, accessing, and querying data in NoSQL databases through examples that use MongoDB, HBase, Cassandra, Redis, CouchDB, Google App Engine Datastore and more. Looks at architecture and internals. Provides guidelines for optimal usage, performance tuning, and scalable configurations. Presents a number of tools and utilities relating to NoSQL, distributed platforms, and scalable processing, including Hive, Pig, RRDtool, Nagios, and more.

Hadoop: The Definitive Guide, 2nd Edition

2010-10-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom White

Avro Cloud Computing DWH Hadoop HDFS Hive data data-engineering

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters. This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book. Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase, Hadoop’s database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems "Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera

Hadoop: The Definitive Guide

2009-06-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom White

Cloud Computing Hadoop HDFS data data-engineering

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you: Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk."-- Doug Cutting, Hadoop Founder, Yahoo!

talk-data.com

Activity Trend

Top Events

Top Speakers

Professional Hadoop Solutions

Apache Sqoop Cookbook

Data Warehousing in the Age of Big Data

HBase in Action

HBase Administration Cookbook

Hadoop: The Definitive Guide, 3rd Edition

Seven Databases in Seven Weeks

HBase: The Definitive Guide

Professional NoSQL

Hadoop: The Definitive Guide, 2nd Edition

Hadoop: The Definitive Guide