NoSQL

Resilience and Reliability on AWS

2013-01-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Flavia Paganelli , Jasper Geurtsen , Jurg van Vliet

AWS Cloud Computing ELK MongoDB RDBMS Redis data data-engineering nosql-databases postgresql

Cloud services are just as susceptible to network outages as any other platform. This concise book shows you how to prepare for potentially devastating interruptions by building your own resilient and reliable applications in the public cloud. Guided by engineers from 9apps, an independent provider of Amazon Web Services and Eucalyptus cloud solutions, you’ll learn how to combine AWS with open source tools such as PostgreSQL, MongoDB, and Redis. This isn’t a book on theory. With detailed examples, sample scripts, and solid advice, software engineers with operations experience will learn specific techniques that 9apps routinely uses in its cloud infrastructures.

- Build cloud applications with the "rip, mix, and burn" approach
- Get a crash course on Amazon Web Services
- Learn the top ten tips for surviving outages in the cloud
- Use elasticsearch to build a dependable NoSQL data store
- Combine AWS and PostgreSQL to build an RDBMS that scales well
- Create a highly available document database with MongoDB Replica Set and SimpleDB
- Augment Redis with AWS to provide backup/restore, failover, and monitoring capabilities
- Work with CloudFront and Route 53 to safeguard global content delivery

HBase in Action

2012-11-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nick Dimiduk , Amandeep Khurana

Analytics Big Data GIS Hadoop Apache HBase Marketing data data-engineering nosql-databases

HBase in Action has all the knowledge you need to design, build, and run applications using HBase. First, it introduces you to the fundamentals of distributed systems and large scale data handling. Then, you'll explore real-world applications and code samples with just enough theory to understand the practical techniques. You'll see how to build applications with HBase and take advantage of the MapReduce processing framework. And along the way you'll learn patterns and best practices.

About the Technology: HBase is a NoSQL storage system designed for fast, random access to large volumes of data. It runs on commodity hardware and scales smoothly from modest datasets to billions of rows and millions of columns.

About the Book: HBase in Action is an experience-driven guide that shows you how to design, build, and run applications using HBase. First, it introduces you to the fundamentals of handling big data. Then, you'll explore HBase with the help of real applications and code samples and with just enough theory to back up the practical techniques. You'll take advantage of the MapReduce processing framework and benefit from seeing HBase best practices in action.

What's Inside:
- When and how to use HBase
- Practical examples
- Design patterns for scalable data systems
- Deployment, integration, and design

About the Reader: Written for developers and architects familiar with data storage and processing. No prior knowledge of HBase, Hadoop, or MapReduce is required.

About the Authors: Nick Dimiduk is a Data Architect with experience in social media analytics, digital marketing, and GIS. Amandeep Khurana is a Solutions Architect focused on building HBase-driven solutions.

Quotes:
Timely, practical ... explains in plain language how to use HBase. - From the Foreword by Michael Stack, Chair of the Apache HBase Project Management Committee
A difficult topic lucidly explained. - John Griffin, coauthor of "Hibernate Search in Action"
Amusing tongue-in-cheek style that doesn’t detract from the substance. - Charles Pyle, APS Healthcare
Learn how to think the HBase way. - Gianluca Righetto, Menttis

Programming Hive

2012-09-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dean Wampler , Jason Rutherglen , Edward Capriolo

DWH ELK Hadoop Hive SQL apache-hive data data-engineering

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

- Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
- Customize data formats and storage options, from files to external databases
- Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
- Gain best practices for creating user defined functions (UDFs)
- Learn Hive patterns you should use and anti-patterns you should avoid
- Integrate Hive with other data processing programs
- Use storage handlers for NoSQL databases and other datastores
- Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence

2012-08-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pramod Sadalage , Martin Fowler (Thoughtworks)

Cassandra MongoDB Neo4j RDBMS data data-engineering nosql-databases

The need to handle increasingly large data volumes is one factor driving the adoption of a new class of nonrelational “NoSQL” databases. Advocates of NoSQL databases claim they can be used to build systems that are more performant, scale better, and are easier to program. NoSQL Distilled is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further. The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j. In addition, by drawing on Pramod Sadalage’s pioneering work, NoSQL Distilled shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.

Seven Databases in Seven Weeks

2012-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eric Redmond , Jim R. Wilson

Amazon EC2 Big Data Cloud Computing Data Management DynamoDB ELK Apache HBase Java Linux MongoDB Neo4j RDBMS +5 more

Data is getting bigger and more complex by the day, and so are the choices in handling that data. As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology: Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once. Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.

What You Need: To get the most out of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.

Planning for Big Data

2012-03-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Edd Wilder-James

Big Data Hadoop data data-engineering

In an age where everything is measurable, understanding big data is essential. From creating new data-driven products through to increasing operational efficiency, big data has the potential to make your organization both more competitive and more innovative. As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but also the organizational and cultural demands of being data-driven. Written by O'Reilly Radar's experts on big data, this anthology describes:

- The broad industry changes heralded by the big data era
- What big data is, what it means to your business, and how to start solving data problems
- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions
- The landscape of NoSQL databases and their relative merits
- How visualization plays an important part in data work

MongoDB in Action

2011-12-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kyle Banker

Analytics Big Data Data Modelling MongoDB data data-engineering nosql-databases

NEWER EDITION AVAILABLE: MongoDB in Action, Second Edition is now available. An eBook of this older edition is included at no additional cost when you buy the revised edition! A limited number of pBook copies of this edition are still available. Please contact Manning Support to inquire about purchasing previous edition copies.

MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.

About the Technology: Big data can mean big headaches. MongoDB is a document-oriented database designed to be flexible, scalable, and very fast, even with big data loads. It's built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers.

About the Book: MongoDB in Action introduces you to MongoDB and the document-oriented database model. This perfectly paced book provides both the big picture you'll need as a developer and enough low-level detail to satisfy a system engineer. Numerous examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deployment.

What's Inside:
- Indexes, queries, and standard DB operations
- Map-reduce for custom aggregations and reporting
- Schema design patterns
- Deploying for scale and high availability

About the Reader: Written for developers. No MongoDB or NoSQL experience required.

About the Author: Kyle Banker is a software engineer at 10gen where he maintains the official MongoDB drivers for Ruby and C.

Quotes:
Awesome! MongoDB in a nutshell. - Hardy Ferentschik, Red Hat
Excellent. Many practical examples. - Curtis Miller, Flatterline
Not only the how, but also the why. - Philip Hallstrom, PJKH, LLC
Has a developer-centric flavor--an excellent reference. - Rick Wagner, Red Hat
A must-read. - Daniel Bretoi, Advanced Energy

Big Data Glossary

2011-09-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pete Warden

AI/ML Big Data NLP SQL data data-engineering nosql-databases

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment. This handy glossary also includes a chapter of key terms that help define many of these tool categories:

- NoSQL Databases: Document-oriented databases using a key/value interface rather than SQL
- MapReduce: Tools that support distributed computing on large datasets
- Storage: Technologies for storing data in a distributed way
- Servers: Ways to rent computing power on remote machines
- Processing: Tools for extracting valuable information from large datasets
- Natural Language Processing: Methods for extracting information from human-created text
- Machine Learning: Tools that automatically perform data analyses, based on results of a one-off analysis
- Visualization: Applications that present meaningful data graphically
- Acquisition: Techniques for cleaning up messy public data sources
- Serialization: Methods to convert data structure or object state into a storable format

Professional NoSQL

2011-09-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shashank Tiwari

Cassandra Hadoop Apache HBase Hive MongoDB Redis data data-engineering nosql-databases

A hands-on guide to leveraging NoSQL databases

NoSQL databases are an efficient and powerful tool for storing and manipulating vast quantities of data. Most NoSQL databases scale well as data grows. In addition, they are often malleable and flexible enough to accommodate semi-structured and sparse data sets. This comprehensive hands-on guide presents fundamental concepts and practical solutions for getting you ready to use NoSQL databases. Expert author Shashank Tiwari begins with a helpful introduction on the subject of NoSQL, explains its characteristics and typical uses, and looks at where it fits in the application stack. Unique insights help you choose which NoSQL solutions are best for solving your specific data storage needs.

Professional NoSQL:
- Demystifies the concepts that relate to NoSQL databases, including column-family oriented stores, key/value databases, and document databases
- Delves into installing and configuring a number of NoSQL products and the Hadoop family of products
- Explains ways of storing, accessing, and querying data in NoSQL databases through examples that use MongoDB, HBase, Cassandra, Redis, CouchDB, Google App Engine Datastore and more
- Looks at architecture and internals
- Provides guidelines for optimal usage, performance tuning, and scalable configurations
- Presents a number of tools and utilities relating to NoSQL, distributed platforms, and scalable processing, including Hive, Pig, RRDtool, Nagios, and more

Hadoop in Action

2010-11-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Chuck Lam

API Big Data Hadoop Hive Java Data Streaming data data-engineering

Hadoop in Action introduces the subject and teaches you how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming.

About the Technology: Big data can be difficult to handle using traditional databases. Apache Hadoop is a NoSQL applications framework that runs on distributed clusters. This lets it scale to huge datasets. If you need analytic information from your data, Hadoop's the way to go.

What's Inside:
- Introduction to MapReduce
- Examples illustrating ideas in practice
- Hadoop's Streaming API
- Other related tools, like Pig and Hive

About the Reader: This book requires basic Java skills. Knowing basic statistical concepts can help with the more advanced examples.

About the Author: Chuck Lam is a Senior Engineer at RockYou! He has a PhD in pattern recognition from Stanford University.

Quotes:
A guide for beginners, a source of insight for advanced users. - Philipp K. Janert, Principal Value, LLC
A nice mix of the what, why, and how of Hadoop. - Paul Stusiak, Falcon Technologies Corp.
Demystifies Hadoop. A great resource! - Rick Wagner, Acxiom Corp.
Covers it all! Plus, gives you sweet extras no one else does. - John S. Griffin, Overstock.com
An excellent introduction to Hadoop and MapReduce. - Kenneth DeLong, BabyCenter, LLC

The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing

2010-09-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eelco Plugge , Peter Membrey , Tim Hawkins

Cloud Computing MongoDB MySQL SQL data data-engineering nosql-databases

MongoDB, a cross-platform NoSQL database, is the fastest-growing new database in the world. MongoDB provides a rich document-oriented structure with dynamic queries that you'll recognize from RDBMS offerings such as MySQL. In other words, this is a book about a NoSQL database that does not require the SQL crowd to re-learn how the database world works! MongoDB has reached 1.0 and already boasts 50,000+ users. The community is strong and vibrant and MongoDB is improving at a fast rate. With scalable and fast databases becoming critical for today's applications, this book shows you how to install, administer and program MongoDB without pretending SQL never existed.

talk-data.com
