talk-data.com

Topic: data (3406 tagged)

Activity Trend: 2020-Q1 through 2026-Q1 (peak of 3 activities/qtr)

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books
Current State of Big Data Use in Retail Supply Chains

Innovation, consisting of the invention, adoption, and deployment of new technology and associated process improvements, is a key source of competitive advantage. Big Data is an innovation that has been gaining prominence in retailing and other industries. In fact, managers working in retail supply chain member firms (that is, retailers, manufacturers, distributors, wholesalers, logistics providers, and other service providers) have increasingly been trying to understand what Big Data entails, what it may be used for, and how to make it an integral part of their businesses. This report covers Big Data use, with a focus on applications for retail supply chains. The authors' findings suggest that Big Data use in retail supply chains is still generally elusive: although most managers report initial, and in some cases significant, efforts to analyze large data sets for decision making, various challenges confine that use largely to traditional, transactional data.

Big Data

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware, along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team.

About the Book: Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. Following a realistic example, it guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.

What's Inside: Introduction to big data systems; real-time processing of web-scale data; tools like Hadoop, Cassandra, and Storm; extensions to traditional database skills.

About the Reader: This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

About the Authors: Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Quotes: "Transcends individual tools or platforms. Required reading for anyone working with big data systems." - Jonathan Esterhazy, Groupon. "A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide." - Mark Fisher, Pivotal. "Contains wisdom that can only be gathered after tackling many big data projects. A must-read." - Pere Ferrera Bertran, Datasalt. "The de facto guide to streamlining your data pipeline in batch and near-real time." - Alex Holmes, author of "Hadoop in Practice".
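
To make the Lambda Architecture concrete, here is a minimal Python sketch of its core idea: answering queries by merging a precomputed batch view with an incremental realtime view. The view names and numbers are hypothetical, not taken from the book.

```python
from collections import Counter

# Batch layer: a view precomputed over the immutable master dataset
# (in the book this would be produced by a Hadoop batch job).
batch_view = Counter({"/home": 10000, "/about": 2500})

# Speed layer: an incremental view covering only data that arrived after
# the last batch run (in the book, maintained by a Storm topology).
realtime_view = Counter({"/home": 42, "/pricing": 7})

def pageviews(url):
    """Serving layer: answer a query by merging both views."""
    return batch_view[url] + realtime_view[url]

print(pageviews("/home"))     # 10042
print(pageviews("/pricing"))  # 7
```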

DS8870 Data Migration Techniques

This IBM® Redbooks® publication describes data migrations between IBM DS8000® storage systems, where in most cases one or more older DS8000 models are being replaced by the newer DS8870 model. Most of the migration methods are based on the DS8000 Copy Services. The book includes considerations for solutions such as IBM Tivoli® Productivity Center for Replication and the IBM Geographically Dispersed Parallel Sysplex™ (GDPS®) used in IBM z/OS® environments. Both offerings are primarily designed to enable disaster recovery using DS8000 Copy Services. In most data migration cases, Tivoli Productivity Center for Replication or GDPS will not directly provide functions for the data migration itself. However, this book explains how to bring the newly migrated environment back under the control of GDPS or Tivoli Productivity Center for Replication. In addition to the Copy Services based migrations, the book also covers host-based mirroring techniques, using IBM Transparent Data Migration Facility (TDMF®) for z/OS and the z/OS Dataset Mobility Facility (zDMF).

PostgreSQL 9 Administration Cookbook - Second Edition

Master PostgreSQL 9.4 with this hands-on cookbook featuring over 150 practical and easy-to-follow recipes that will bring you up to speed with PostgreSQL's latest features. You'll learn how to create, manage, and optimize a PostgreSQL-based database, focusing on vital aspects like performance and reliability.

What this book will help me do: Efficiently configure PostgreSQL databases for optimal performance; deploy robust backup and recovery strategies to ensure data reliability; utilize PostgreSQL's replication features for improved high availability; implement advanced queries and analyze large datasets effectively; optimize database structure and functionality for application needs.

Author(s): Simon Riggs, Gianni Ciolli, and their co-authors are seasoned database professionals with extensive experience in PostgreSQL administration and development. They have a complementary blend of skills, comprising practical system knowledge, teaching, and authoritative writing. Their hands-on experience translates seamlessly into accessible yet informative content.

Who is it for? This book is ideal for database administrators and developers who are looking to enhance their skills with PostgreSQL, especially version 9. If you have some prior experience with relational databases and want practical guidance on optimizing, managing, and mastering PostgreSQL, this resource is tailored for you.
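
As a flavor of the administration tasks the recipes cover, here is a hedged Python sketch that checks streaming-replication status on a primary via psycopg2. The connection DSN is a placeholder; pg_stat_replication and its *_location columns are standard in the PostgreSQL 9.x series.

```python
import psycopg2

# Hypothetical DSN; adjust for your environment.
conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
try:
    with conn.cursor() as cur:
        # On a primary with streaming replicas (9.1+), each row in
        # pg_stat_replication describes one connected standby.
        cur.execute("""
            SELECT application_name, state, sent_location, replay_location
            FROM pg_stat_replication
        """)
        for name, state, sent, replayed in cur.fetchall():
            print(f"{name}: {state}, sent={sent}, replayed={replayed}")
finally:
    conn.close()
```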

Hadoop Essentials

In 'Hadoop Essentials,' you'll embark on an engaging journey to master the Hadoop ecosystem. This book covers fundamental to advanced topics, from HDFS and MapReduce to real-time analytics with Spark, empowering you to handle modern data challenges efficiently.

What this book will help me do: Understand the core components of Hadoop, including HDFS, YARN, and MapReduce, for foundational knowledge; learn to optimize Big Data architectures and improve application performance; utilize tools like Hive and Pig for efficient data querying and processing; master data ingestion technologies like Sqoop and Flume for seamless data management; achieve fluency in real-time data analytics using modern tools like Apache Spark and Apache Storm.

Author(s): Shiva Achari is a seasoned expert in Big Data and distributed systems with in-depth knowledge of the Hadoop ecosystem. With years of experience in both development and teaching, he crafts content that bridges practical know-how with theoretical insights in a highly accessible style.

Who is it for? This book is perfect for system and application developers aiming to learn practical applications of Hadoop. It suits professionals seeking solutions to real-world Big Data challenges, as well as those familiar with distributed systems basics who are looking to deepen their expertise in advanced data analysis.
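
To ground the MapReduce material, here is a minimal Python sketch that simulates the map, shuffle, and reduce contract locally with a word count; on a real cluster the same mapper and reducer logic would run under Hadoop (for example, via Hadoop Streaming). The sample input is made up.

```python
from itertools import groupby

def mapper(records):
    # Map phase: emit a (word, 1) pair for every word in every record.
    for line in records:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Hadoop delivers mapper output to reducers grouped and sorted by key;
    # sorted() + groupby() reproduces that contract locally.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    records = ["big data big systems", "data pipelines"]
    for word, total in reducer(mapper(records)):
        print(word, total)   # big 2, data 2, pipelines 1, systems 1
```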

IBM z13 Configuration Setup

This IBM® Redbooks® publication helps you install, configure, and maintain the IBM z13™. The z13 offers new functions that require a comprehensive understanding of the available configuration options. This book presents configuration setup scenarios, and describes implementation examples in detail. This publication is intended for systems engineers, hardware planners, and anyone who needs to understand IBM z Systems™ configuration and implementation. Readers should be generally familiar with current IBM z Systems technology and terminology. For details about the functions of the z13, see IBM z13 Technical Introduction, SG24-8250 and IBM z13 Technical Guide, SG24-8251.

Modeling Food Processing Operations

Computational modeling is an important tool for understanding and improving food processing and manufacturing. It is used for many different purposes, including process design and process optimization. However, modeling goes beyond the process and can include applications to understand and optimize food storage and the food supply chain, and to perform a life cycle analysis. Modeling Food Processing Operations provides a comprehensive overview of the various applications of modeling in conventional food processing. The needs of industry, current practices, and state-of-the-art technologies are examined, and case studies are provided. Part One provides an introduction to the topic, with a particular focus on modeling and simulation strategies in food processing operations. Part Two reviews the modeling of various food processes involving heating and cooling. These processes include: thermal inactivation; sterilization and pasteurization; drying; baking; frying; and chilled and frozen food processing, storage and display. Part Three examines the modeling of multiphase unit operations such as membrane separation, extrusion processes and food digestion, and reviews models used to optimize food distribution. The book comprehensively reviews the various applications of modeling in conventional food processing; examines the modeling of multiphase unit operations and various food processes involving heating and cooling; and analyzes the models used to optimize food distribution.
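
As an illustration of the thermal-inactivation models reviewed in Part Two, here is a short Python sketch of the classical first-order D-value/z-value model; the parameter values are illustrative assumptions, not data from the book.

```python
def d_value(temp_c, d_ref, t_ref, z):
    """Decimal reduction time (minutes) at temp_c, given a reference
    D-value d_ref at temperature t_ref and thermal resistance constant z
    (all temperatures in degrees C)."""
    return d_ref * 10 ** ((t_ref - temp_c) / z)

def log_reduction(time_min, temp_c, d_ref=0.2, t_ref=121.1, z=10.0):
    """log10 reduction in microbial population after holding time_min
    at a constant temp_c (isothermal assumption)."""
    return time_min / d_value(temp_c, d_ref, t_ref, z)

# Example: 3 minutes at 121.1 C with D(121.1) = 0.2 min gives a 15-log kill.
print(log_reduction(3.0, 121.1))  # 15.0
```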

Implementing the IBM System Storage SAN Volume Controller V7.4

This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller Version 7.4. The SAN Volume Controller (SVC) is a virtualization appliance solution, which maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the “block” level in a network, which enables applications and servers to share storage devices on a network. This book is intended for readers who want to implement the SVC at a 7.4 release level with minimal effort.
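
The following toy Python sketch illustrates the indirection the text describes (it is not the SVC's actual implementation): hosts address virtual blocks, and a mapping layer resolves them to physical extents that can be moved while the server keeps running with the same virtual addresses.

```python
class VirtualVolume:
    def __init__(self):
        # virtual block -> (backend device, physical block)
        self.mapping = {}

    def map_block(self, vblock, device, pblock):
        self.mapping[vblock] = (device, pblock)

    def read(self, vblock):
        device, pblock = self.mapping[vblock]
        return f"read {device}:{pblock}"

vol = VirtualVolume()
vol.map_block(0, "array-A", 7)
print(vol.read(0))          # read array-A:7
# Migration: the extent moves, but the host's virtual address is unchanged.
vol.map_block(0, "array-B", 3)
print(vol.read(0))          # read array-B:3
```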

Apache Solr Search Patterns

Master Apache Solr as you uncover advanced techniques in this professional guide. This book dives deeply into deploying and optimizing Solr-powered search engines and explores high-performance techniques. Learn to leverage your data with accessible, comprehensive, and practical insights.

What this book will help me do: Learn to customize Solr's query scorer to provide tailored search results; understand the internals of Solr, including indexing and query facilities, for better optimization; implement scalable and reliable search clusters using SolrCloud; explore the use of Solr for spatial, e-commerce, and advertising searches; combine Solr with front-end technologies like AJAX and advanced tagging with FSTs.

Author(s): Jayant Kumar, an experienced developer and search solutions architect, specializes in leveraging Apache Solr. With years of practical experience, he brings unique insights into scaling search platforms. His commitment to imparting clear, actionable knowledge is reflected in this focused resource.

Who is it for? This book is ideal for software developers and architects embedded in the Solr ecosystem looking to enhance their expertise. If you are seeking to develop advanced and scalable solutions, master Solr's core capabilities, or improve your analytics and graph-generating skills, this book will support your goals.
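
As a small taste of working against Solr's query facilities, here is a hedged Python sketch that hits the standard /select handler over HTTP; the "products" core and the field names are hypothetical.

```python
import requests

# Query a local Solr instance through its stock /select request handler.
resp = requests.get(
    "http://localhost:8983/solr/products/select",
    params={
        "q": "title:hadoop",     # Lucene query syntax
        "fl": "id,title,price",  # fields to return
        "rows": 5,               # page size
        "wt": "json",            # response format
    },
    timeout=10,
)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc)
```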

PostGIS in Action, Second Edition

PostGIS in Action, Second Edition teaches readers of all levels to write spatial queries that solve real-world problems. It first gives you a background in vector-, raster-, and topology-based GIS and then quickly moves into analyzing, viewing, and mapping data. This second edition covers the PostGIS 2.0 and 2.1 series and PostgreSQL 9.1, 9.2, and 9.3 features, and shows you how to integrate with other GIS tools.

About the Book: Processing data tied to location and topology requires specialized know-how. PostGIS is a free spatial database extender for PostgreSQL, every bit as good as proprietary software. With it, you can easily create location-aware queries in just a few lines of SQL code and build the back end for a mapping, raster analysis, or routing application with minimal effort. You'll learn how to optimize queries for maximum speed, simplify geometries for greater efficiency, and create custom functions for your own applications. You'll also learn how to apply your existing GIS knowledge to PostGIS and integrate with other GIS tools. Familiarity with relational database and GIS concepts is helpful but not required.

What's Inside: An introduction to spatial databases; geometry, geography, raster, and topology spatial types, functions, and queries; applying PostGIS to real-world problems; extending PostGIS to web and desktop applications; updated for PostGIS 2.x and PostgreSQL 9.x.

About the Authors: Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee.

Quotes: "A huge body of information distilled into a concise guide." - From the Foreword by Paul Ramsey, Chair, PostGIS Steering Committee. "A more-than-worthy update to 'the' definitive book on PostGIS." - Jonathan DeCarlo, Bentley Systems, Inc. "The most comprehensive guide to spatial data on PostgreSQL." - Sergio Arbeo, codecantor.com. "Provides the science and the tools needed to create innovative applications for the new digital age." - Guy Ndjeng, NTSP.
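
To show what "a few lines of SQL" looks like in practice, here is a hedged Python sketch of a proximity query using standard PostGIS functions (ST_DWithin, ST_MakePoint, ST_SetSRID); the gisdb database and the places table with its geography column geom are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=gisdb user=gis host=localhost")
with conn, conn.cursor() as cur:
    # Find places within 1 km of a point given as lon/lat (WGS 84).
    cur.execute(
        """
        SELECT name
        FROM places
        WHERE ST_DWithin(
            geom,
            ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
            1000  -- metres, because both arguments are geography
        )
        """,
        (-71.06, 42.36),
    )
    for (name,) in cur.fetchall():
        print(name)
conn.close()
```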

IBM zPDT Guide and Reference: System z Personal Development Tool

This IBM® Redpaper Redbooks® publication provides both introductory information and technical details for the IBM System z® Personal Development Tool (IBM zPDT®), which produces a small System z environment suitable for application development. zPDT is a PC Linux application. When zPDT is installed (on Linux), normal System z operating systems (such as IBM z/OS®) can be run on it. zPDT provides the basic System z architecture and emulated IBM 3390 disk drives, 3270 interfaces, OSA interfaces, and so forth. This document merges four previous Redbooks publications into a single book, primarily to make the documentation simpler to view and search onscreen. The systems that are discussed in this document are complex, with elements of Linux (for the underlying PC machine), IBM z/Architecture® (for the core zPDT elements), System z I/O functions (for emulated I/O devices), z/OS (the most common System z operating system), and various applications and subsystems under z/OS. We assume that the reader is familiar with the general concepts and terminology of System z hardware and software elements, and with basic PC Linux characteristics. This book provides the primary documentation for zPDT and includes a basic system overview, installation, operation, the z/OS distribution, and FAQs.

IBM z13 Technical Guide

Digital business has been driving the transformation of the underlying IT infrastructure to be more efficient, secure, adaptive, and integrated. Information Technology (IT) must be able to handle the explosive growth of mobile clients and employees. IT also must be able to use enormous amounts of data to provide deep and real-time insights to help achieve the greatest business impact. This IBM® Redbooks® publication addresses the new IBM mainframe, the IBM z13. The IBM z13 is the trusted enterprise platform for integrating data, transactions, and insight. A data-centric infrastructure must always be available, with 99.999% or better availability, have flawless data integrity, and be secured against misuse. It needs to be an integrated infrastructure that can support new applications. It needs to have integrated capabilities that can provide new mobile capabilities with real-time analytics delivered by a secure cloud infrastructure. IBM z13 is designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows the z13 to deliver a record level of capacity over the prior z Systems. In its maximum configuration, the z13 is powered by up to 141 client-characterizable microprocessors (cores) running at 5 GHz. This configuration can deliver more than 110,000 million instructions per second (MIPS) and supports up to 10 TB of client memory. The IBM z13 Model NE1 is estimated to provide up to 40% more total system capacity than the IBM zEnterprise® EC12 (zEC12) Model HA1. This book provides information about the IBM z13 and its functions, features, and associated software support. Greater detail is offered in areas relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM z Systems functions and plan for their usage. It is not intended as an introduction to mainframes. Readers are expected to be generally familiar with existing IBM z Systems technology and terminology.

NoSQL for Mere Mortals®

NoSQL was developed to overcome the limitations of relational databases in the largest Web applications at companies such as Google, Yahoo and Facebook. As it is applied more widely, developers are finding that it can simplify scalability while requiring far less coding and management overhead. However, NoSQL requires fundamentally different approaches to database design and modeling, and many conventional relational techniques lead to suboptimal results. NoSQL for Mere Mortals is an easy, practical guide to succeeding with NoSQL in your environment. Following the classic, best-selling format pioneered in SQL Queries for Mere Mortals, enterprise database expert Dan Sullivan guides you step-by-step through choosing technologies, designing high-performance databases, and planning for long-term maintenance. Sullivan introduces each type of NoSQL database, shows how to install and manage them, and demonstrates how to leverage their features while avoiding common mistakes that lead to poor performance and unmet requirements. He uses four popular NoSQL databases as reference models: MongoDB, a document database; Cassandra, a column family data store; Redis, a key-value database; and Neo4j, a graph database. You'll find explanations of each database's structure and capabilities, practical guidelines for choosing amongst them, and expert guidance on designing databases with them. Packed with examples, NoSQL for Mere Mortals is today's best way to master NoSQL—whether you're a DBA, developer, user, or student.
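
As a quick illustration of two of the four reference models, here is a hedged Python sketch storing the same record in Redis (key-value) and MongoDB (document); it assumes the redis and pymongo client libraries and local servers on default ports, and the data is made up.

```python
import redis
from pymongo import MongoClient

# Key-value: one hash per user, addressed by a key you design yourself.
r = redis.Redis(host="localhost", port=6379)
r.hset("user:1001", mapping={"name": "Ada", "city": "London"})
print(r.hgetall("user:1001"))

# Document: a self-describing, JSON-like record with no fixed schema;
# nested data such as the tags list is natural here.
users = MongoClient("localhost", 27017).demo.users
users.insert_one({"_id": 1001, "name": "Ada", "city": "London",
                  "tags": ["dba", "developer"]})
print(users.find_one({"_id": 1001}))
```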

The Security Data Lake

Companies of all sizes are considering data lakes as a way to deal with terabytes of security data that can help them conduct forensic investigations and serve as an early indicator to identify bad or relevant behavior. Many think about replacing their existing SIEM (security information and event management) systems with Hadoop running on commodity hardware. Before your company jumps into the deep end, you first need to weigh several critical factors. This O'Reilly report takes you through technological and design options for implementing a data lake. Each option not only supports your data analytics use cases, but is also accessible by processes, workflows, third-party tools, and teams across your organization. Within this report, you'll explore: five questions to ask before choosing an architecture for your backend data store; how data lakes can overcome scalability and data duplication issues; different options for storing context and unstructured log data; data access use cases covering both search and analytical queries via SQL; processes necessary for ingesting data into a data lake, including parsing, enrichment, and aggregation; and four methods for embedding your SIEM into a data lake.
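
To make the ingestion pipeline concrete, here is a minimal Python sketch of the parsing, enrichment, and aggregation steps the report names, run over a fabricated auth log; the log format, lookup table, and field names are illustrative assumptions, not from the report.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical auth-log format; real sources each need their own parser.
LINE = re.compile(r"(?P<ts>\S+) (?P<host>\S+) sshd: (?P<result>\w+) "
                  r"login for (?P<user>\w+) from (?P<ip>[\d.]+)")

ASSET_OWNER = {"web-01": "payments-team"}  # stand-in for a CMDB lookup

def parse(line: str) -> Optional[dict]:
    m = LINE.match(line)
    return m.groupdict() if m else None

logs = [
    "2015-06-01T10:00:00Z web-01 sshd: failed login for root from 203.0.113.9",
    "2015-06-01T10:00:05Z web-01 sshd: failed login for root from 203.0.113.9",
]

failures = Counter()
for raw in logs:
    event = parse(raw)                                           # parsing
    if event is None:
        continue
    event["owner"] = ASSET_OWNER.get(event["host"], "unknown")   # enrichment
    if event["result"] == "failed":
        failures[(event["ip"], event["user"])] += 1              # aggregation

print(failures)  # Counter({('203.0.113.9', 'root'): 2})
```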

IBM z Systems Connectivity Handbook

This IBM® Redbooks® publication describes the connectivity options available for use within and beyond the data center for the IBM z Systems® family of mainframes, which includes these systems: IBM z Systems z13 (z13); IBM zEnterprise® EC12 (zEC12); IBM zEnterprise BC12 (zBC12); IBM zEnterprise 196 (z196); and IBM zEnterprise 114 (z114). This book highlights the hardware and software components, functions, typical uses, coexistence, and relative merits of these connectivity features. It helps readers understand the connectivity alternatives that are available when planning and designing their data center infrastructures. The changes to this edition are based on the z Systems hardware announcement dated January 14, 2015. This book is intended for data center planners, IT professionals, systems engineers, technical sales staff, and network planners who are involved in the planning of connectivity solutions for IBM z Systems.

Jump Start MySQL

Get a Jump Start on working with MySQL today! MySQL is an extremely popular open source relational database management system that powers many of the applications on the Web. Discover why MySQL's speed, ease of use, and flexibility make it the database of choice for so many developers. In just one weekend with this hands-on tutorial, you'll learn how to: get started with MySQL; store, modify, and retrieve data; work with multiple tables; connect to your database through code; program the database; and back up your data.
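
As a taste of "connect to your database through code", here is a hedged Python sketch using the mysql-connector-python driver; the credentials and the library schema with its books table are placeholders.

```python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="library"
)
cur = conn.cursor()

# Parameterized statements keep user input out of the SQL string.
cur.execute("INSERT INTO books (title, year) VALUES (%s, %s)",
            ("Jump Start MySQL", 2015))
conn.commit()

cur.execute("SELECT title, year FROM books WHERE year >= %s", (2010,))
for title, year in cur.fetchall():
    print(title, year)

cur.close()
conn.close()
```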

Advanced Analytics with Spark

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
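
In the spirit of the book's classification patterns, here is a minimal PySpark sketch that assembles feature columns and fits a logistic regression with MLlib; the column names and rows are made up.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("aas-sketch").getOrCreate()

# Tiny labeled dataset: label plus two numeric features.
df = spark.createDataFrame(
    [(0.0, 1.2, 0.7), (1.0, 3.1, 2.2), (0.0, 0.9, 0.4), (1.0, 2.8, 2.9)],
    ["label", "f1", "f2"],
)

# MLlib estimators expect features packed into a single vector column.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(maxIter=10).fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```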

IBM GDPS Active/Active Overview and Planning

IBM® Geographically Dispersed Parallel Sysplex™ (GDPS®) is a collection of several offerings, each addressing a different set of IT resiliency goals. It can be tailored to meet the recovery point objective (RPO), which defines how much data you are willing to lose or recreate, and the recovery time objective (RTO), which defines how long your business can afford to be without its systems, from the initial outage until your critical business processes are available to users again. Each offering uses a combination of server and storage hardware or software-based replication, and automation and clustering software technologies. This IBM Redbooks® publication presents an overview of the IBM GDPS active/active (GDPS/AA) offering and the role it plays in delivering a business IT resilience solution.
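
A toy Python sketch of the two objectives just defined (not of GDPS tooling): RPO is measured backward from the outage to the last safe copy of the data, and RTO forward from the outage to restored service. The timestamps are invented.

```python
from datetime import datetime

last_replicated = datetime(2015, 6, 1, 11, 58)  # last safe copy of the data
outage          = datetime(2015, 6, 1, 12, 0)   # initial failure
service_back    = datetime(2015, 6, 1, 12, 45)  # critical processes restored

rpo_exposure = outage - last_replicated   # data at risk: 2 minutes
rto_actual   = service_back - outage      # downtime: 45 minutes

print(f"RPO exposure: {rpo_exposure}, RTO: {rto_actual}")
```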

Real-World Hadoop

If you're a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues. You'll learn about early decisions and pre-planning that can make the process easier and more productive. If you're already using these technologies, you'll discover ways to gain the full range of benefits possible with Hadoop. While you don't need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects. Examine a day in the life of big data: India's ambitious Aadhaar project; review tools in the Hadoop ecosystem, such as Apache Spark, Storm, and Drill, to learn how they can help you; pick up a collection of technical and strategic tips that have helped others succeed with Hadoop; learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology; and explore real-world stories that reveal how MapR customers combine use cases when putting Hadoop and NoSQL to work, including in production. Ted Dunning is Chief Applications Architect at MapR Technologies, and a committer and PMC member of the Apache Drill, Storm, Mahout, and ZooKeeper projects. He is also a mentor for the Apache DataFu, Kylin, Zeppelin, Calcite, and Samoa projects. Ellen Friedman is a solutions consultant, speaker, and author, writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project.

Hadoop: The Definitive Guide, 4th Edition

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing. Learn fundamental components such as MapReduce, HDFS, and YARN; explore MapReduce in depth, including steps for developing applications with it; set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN; learn two data formats, Avro for data serialization and Parquet for nested data; use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer); understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop; and learn the HBase distributed database and the ZooKeeper distributed configuration service.
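
To illustrate one of the data formats the guide covers, here is a hedged Python sketch writing and reading nested data as Parquet with pyarrow (the book itself works through the Java APIs); the file name and records are made up.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A nested column (a list of strings per row) is where Parquet shines.
table = pa.table({
    "id": [1, 2],
    "name": ["flume-agent", "sqoop-job"],
    "tags": [["ingest", "streaming"], ["ingest", "bulk"]],
})

pq.write_table(table, "records.parquet")          # columnar, compressed
roundtrip = pq.read_table("records.parquet", columns=["name", "tags"])
print(roundtrip.to_pydict())
```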