O'Reilly Data Engineering Books

Elasticsearch: The Definitive Guide

2015-01-28 O'Reilly Amazon

book

Zachary Tong, Clinton Gormley

data data-engineering search elasticsearch Analytics ELK

Whether you need full-text search or real-time analytics of structured data—or both—the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

2014-12-30 O'Reilly Amazon

book

Michael Frampton

data data-engineering Hadoop Analytics Avro Big Data

Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system. As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (YARN and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and MapReduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Bigtop), and analysis (Hive). The problem is that the Internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions to explain where to get Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade, someone just like author and big data expert Mike Frampton. Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective, and it explains the roles for each project (architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage. It explains, in an easily understood manner and through numerous examples, how to use each tool. The book also explains the sliding scale of tools available depending upon data size and when and how to use them.
Big Data Made Easy shows developers and architects, as well as testers and project managers, how to: store big data; configure big data; process big data; schedule processes; move data among SQL and NoSQL systems; monitor data; perform big data analytics; report on big data processes and projects; and test big data systems. Big Data Made Easy also explains the best part, which is that this toolset is free. Anyone can download it and, with the help of this book, start to use it within a day. With the skills this book will teach you under your belt, you will add value to your company or client immediately, not to mention your career.

Mastering Hadoop

2014-12-29 O'Reilly Amazon

book

Sandeep Karanth

data data-engineering Hadoop Analytics Cloud Computing HDFS

Embark on a journey to master Hadoop and its advanced features with this comprehensive book. "Mastering Hadoop" equips you with the knowledge needed to tackle complex data processing challenges and optimize your Hadoop workflows. With clear explanations and practical examples, this book is your guide to becoming proficient in leveraging Hadoop technologies.
What this book will help me do: Optimize Hadoop MapReduce jobs, Pig scripts, and Hive queries for better performance. Understand and employ advanced data formats and Hadoop I/O techniques. Learn to integrate low-latency processing with Storm on YARN. Explore the cloud deployment of Hadoop and advanced HDFS alternatives. Enhance Hadoop security and master techniques for analytics using Hadoop.
Author(s): Sandeep Karanth is an experienced Hadoop professional with years of expertise in data processing and distributed computing. With a practical and methodical approach, Karanth has crafted this book to empower learners with the essentials and advanced features of Hadoop. Karanth's focus on performance optimization and real-world applications helps bridge the gap between theory and practice.
Who is it for? This book is ideal for data engineers and software developers familiar with the basics of Hadoop who seek to advance their understanding. If you aim to enhance Hadoop performance or adopt new features like YARN and Storm, this book is for you. Readers interested in Hadoop deployment, optimization, and newer capabilities will also greatly benefit. It's perfect for anyone aiming to become a Hadoop expert, from intermediate learners to advanced practitioners.

Big Data Now: 2014 Edition

2014-12-12 O'Reilly Amazon

book

O'Reilly Media, Inc.

data data-engineering AI/ML Analytics API Big Data

In the four years that O'Reilly Media, Inc. has produced its annual Big Data Now report, the data field has grown from infancy into young adulthood. Data is now a leader in some fields and a driver of innovation in others, and companies that use data and analytics to drive decision-making are outperforming their peers. And while access to big data tools and techniques once required significant expertise, today many tools have improved and communities have formed to share best practices. Companies have also started to emphasize the importance of processes, culture, and people. The topics in Big Data Now: 2014 Edition represent the major forces currently shaping the data world:
Cognitive augmentation: predictive APIs, graph analytics, and Network Science dashboards
Intelligence matters: defining AI, modeling intelligence, deep learning, and "summoning the demon"
Cheap sensors, fast networks, and distributed computing: stream processing, hardware data flows, and computing at the edge
Data (science) pipelines: broadening the coverage of analytic pipelines with specialized tools
Evolving marketplace of big data components: SSDs, Hadoop 2, Spark; and why datacenters need operating systems
Design and social science: human-centered design, wearables and real-time communications, and wearable etiquette
Building a data culture: moving from prediction to real-time adaptation; and why you need to become a data skeptic
Perils of big data: data redlining, intrusive data analysis, and the state of big data ethics

Data Architecture: A Primer for the Data Scientist

2014-11-26 O'Reilly Amazon

book

Daniel Linstedt, W. H. Inmon

data data-engineering Analytics Big Data DWH

Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data. And everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until data gathered can be put into an existing framework or architecture it can’t be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, W.H. Inmon and Daniel Linstedt define the importance of data architecture and how it can be used effectively to harness big data within existing systems.
You’ll be able to: turn textual information into a form that can be analyzed by standard tools; make the connection between analytics and Big Data; understand how Big Data fits within an existing systems environment; and conduct analytics on repetitive and non-repetitive data.
Discusses the value in Big Data that is often overlooked, non-repetitive data, and why there is significant business value in using it
Shows how to turn textual information into a form that can be analyzed by standard tools
Explains how Big Data fits within an existing systems environment
Presents new opportunities that are afforded by the advent of Big Data
Demystifies the murky waters of repetitive and non-repetitive data in Big Data

Learning HBase

2014-11-25 O'Reilly Amazon

book

Shashwat Shriparv

data data-engineering nosql-databases Apache HBase Analytics Big Data

In "Learning HBase", you'll dive deep into the core functionalities of Apache HBase and understand its applications in handling Big Data environments. By exploring both theoretical concepts and practical scenarios, you'll acquire the skills to set up, manage, and optimize HBase clusters.
What this book will help me do: Understand and explain the components of the HBase ecosystem. Install and configure HBase clusters for optimized performance. Develop and maintain applications using HBase's structured storage model. Troubleshoot and resolve common issues in HBase deployments. Leverage Hadoop tools and advanced techniques to enhance HBase capabilities.
Author(s): Shashwat Shriparv is a skilled technologist with a robust background in Big Data tools and application development. With hands-on expertise in distributed storage systems and data analytics, they lend exceptional insights into managing HBase environments. Their approach combines clarity, practicality, and a focus on real-world applicability.
Who is it for? This book is ideal for system administrators and developers who are starting their journey in Big Data technology. With clear explanations and hands-on scenarios, it suits those seeking foundational and intermediate knowledge of the HBase ecosystem. It helps students, early-career professionals, and mid-level technologists enhance their expertise. If you work in Big Data and want to grow your skill set in distributed storage systems, this book is for you.

IBM Software for SAP Solutions

2014-11-21 O'Reilly Amazon

book

Khirallah Birkler, Navneet Goyal, Peter Bahrs, Jorg Stolzenberg, Nick Norris, Michel Laaroussi, Michael Love, Bernd Eberhardt, Andrew Stalnecker, Derek Jennings, Stefan Momma, Manfred Oevers, Yaro Dunchych, Joe Kaczmarek, Martin Oberhofer, James Hunter, Paul Pacholski, Pierre Valiquette

data data-engineering SAP Analytics BI Data Management

SAP is a market leader in enterprise business application software. SAP solutions provide a rich set of composable application modules, and configurable functional capabilities that are expected from a comprehensive enterprise business application software suite. In most cases, companies that adopt SAP software remain heterogeneous enterprises running both SAP and non-SAP systems to support their business processes. Regardless of the specific scenario, in heterogeneous enterprises most SAP implementations must be integrated with a variety of non-SAP enterprise systems: portals; messaging infrastructure; business process management (BPM) tools; Enterprise Content Management (ECM) methods and tools; business analytics (BA) and business intelligence (BI) technologies; security; systems of record; and systems of engagement. When SAP software is used in a large, heterogeneous enterprise environment, SAP clients face the dilemma of selecting the correct set of tools and platforms to implement SAP functionality, and to integrate the SAP solutions with non-SAP systems. This IBM® Redbooks® publication explains the value of integrating IBM software with SAP solutions. It describes how to enhance and extend pre-built capabilities in SAP software with best-in-class IBM enterprise software, enabling clients to maximize return on investment (ROI) in their SAP investment and achieve a balanced enterprise architecture approach. This book describes IBM Reference Architecture for SAP, a prescriptive blueprint for using IBM software in SAP solutions. The reference architecture is focused on defining the use of IBM software with SAP, and is not intended to address the internal aspects of SAP components. The chapters of this book provide a specific reference architecture for many of the architectural domains that are each important for a large enterprise to establish common strategy, efficiency, and balance.
The majority of the most important architectural domain topics, such as integration, process optimization, master data management, mobile access, Enterprise Content Management, business intelligence, DevOps, security, systems monitoring, and so on, are covered in the book. However, there are several other architectural domains which are not included in the book. This is not to imply that these other architectural domains are not important or are less important, or that IBM does not offer a solution to address them. It is only reflective of time constraints, available resources, and the complexity of assembling a book on an extremely broad topic. Although more content could have been added, the authors feel confident that the scope of architectural material that has been included should provide organizations with a fantastic head start in defining their own enterprise reference architecture for many of the important architectural domains, and it is hoped that this book provides great value to those reading it. This IBM Redbooks publication is targeted to the following audiences: Client decision makers and solution architects leading enterprise transformation projects and wanting to gain further insight so that they can benefit from the integration of IBM software in large-scale SAP projects. IT architects and consultants integrating IBM technology with SAP solutions.

Data Fluency: Empowering Your Organization with Effective Data Communication

2014-11-03 O'Reilly Amazon

book

Zach Gemignani, Patrick Schuermann, Chris Gemignani, Richard Galentino

data data-engineering Analytics BI Dashboard DataViz

A dream come true for those looking to improve their data fluency
Analytical data is a powerful tool for growing companies, but what good is it if it hides in the shadows? Bring your data to the forefront with effective visualization and communication approaches, and let Data Fluency: Empowering Your Organization with Effective Communication show you the best tools and strategies for getting the job done right. Learn the best practices of data presentation and the ways that reporting and dashboards can help organizations effectively gauge performance, identify areas for improvement, and communicate results. Topics covered in the book include data reporting and communication, audience and user needs, data presentation tools, layout and styling, and common design failures. Those responsible for analytics, reporting, or BI implementation will find a refreshing take on data and visualization in this resource, as will report, data visualization, and dashboard designers.
Conquer the challenge of making valuable data approachable and easy to understand
Develop unique skills required to shape data to the needs of different audiences
Full color book links to bonus content at juiceanalytics.com
Written by well-known and highly esteemed authors in the data presentation community
Data Fluency: Empowering Your Organization with Effective Communication focuses on user experience, making reports approachable, and presenting data in a compelling, inspiring way. The book helps to dissolve the disconnect between your data and those who might use it and can help make an impact on the people who are most affected by data. Use Data Fluency today to develop the skills necessary to turn data into effective displays for decision-making.

Building IBM Enterprise Content Management Solutions From End to End

2014-10-22 O'Reilly Amazon

book

Josemina Magdalen, Wei-Dong Zhu, Johnson Liu, Yuki Makino, Isuru Fernando, Abraruddin Khan, Ben Davies, Sven Hapke, Blair Groff, Mike Prentice

data data-engineering IBM Analytics

IBM® Enterprise Content Management (ECM) solutions provide efficient and effective ways to capture content, manage the content and business processes, discover insights from the content, and derive actions to improve business processes, products, and services. This IBM Redbooks® publication introduces and highlights some of the IBM ECM products that can be implemented and integrated together to create end-to-end ECM solutions: IBM Case Manager, IBM Datacap, IBM Content Manager OnDemand, IBM Enterprise Records, IBM Watson™ Content Analytics, and IBM Content Classification. Not all of the products are required to be integrated into an ECM solution. Depending on your business requirements, you can choose a subset of these products to be built into your ECM solutions. This book serves as a hands-on learning guide for information technology (IT) specialists who plan to build ECM solutions from end-to-end, for a proof of concept (PoC) environment, or for a proof of technology environment. For implementing a production-strength ECM solution, also refer to IBM Knowledge Center, IBM Redbooks publications, and IBM Software Services.

IBM Tivoli Storage Productivity Center V5.2 Release Guide

2014-10-19 O'Reilly Amazon

book

Karen Orlando, Mary Lovelace, Paolo D'Angelo, Brian De Guia, Curtis Neal, Christian Sonder, Markus Standau

data data-engineering IBM ibm-tivoli Analytics Cloud Computing

IBM® Tivoli® Storage Productivity Center V5.2 is a feature-rich storage management software suite. The integrated suite provides detailed monitoring, reporting, and management within a single console. In addition, implementing the IBM SmartCloud® Virtual Storage Center (VSC) license with Tivoli Storage Productivity Center addresses new workloads that require massive scale and rapid pace, and accelerates business insight, by adding advanced analytics functions such as storage optimization, provisioning, and transformation. This IBM Redbooks® publication is intended for storage administrators and users who are installing and using the features and functions in IBM Tivoli Storage Productivity Center V5.2. The information in this Redbooks publication can be used to plan for, install, and customize the components of Tivoli Storage Productivity Center in your storage infrastructure. Note: This IBM Redbooks publication is written and based on Tivoli Storage Productivity Center V5.2.2. Sections in this book that pertain to advanced analytics, including cloud configuration, provisioning, transforming volumes, and storage optimization all require the IBM SmartCloud Virtual Storage Center license to be installed.

Architecting and Deploying DB2 with BLU Acceleration

2014-10-13 O'Reilly Amazon

book

Martin Schlegel, Ayesha Zaka, Brigitte Blaser, Whei-Jen Chen, Polly Lau, Marco Bonezzi, Alexander Zietlow, Jean Cristie Pacanaro

data data-engineering relational-databases ibm-db2 Analytics Cognos

IBM® DB2® with BLU Acceleration is a revolutionary technology that is delivered in DB2 for Linux, UNIX, and Windows Release 10.5. BLU Acceleration delivers breakthrough performance improvements for analytic queries by using dynamic in-memory columnar technologies. Different from other vendor solutions, BLU Acceleration allows the unified computing of OLTP and analytics data inside a single database, therefore, removing barriers and accelerating results for users. With observed hundredfold improvement in query response time, BLU Acceleration provides a simple, fast, and easy-to-use solution for the needs of today's organizations; quick access to business answers can be used to gain a competitive edge, lower costs, and more. This IBM Redbooks® publication introduces the concepts of DB2 with BLU Acceleration. It discusses the steps to move from a relational database to using BLU Acceleration, optimizing BLU usage, and deploying BLU into existing analytic solutions today, with an example of IBM Cognos®. This book also describes integration of DB2 with BLU Acceleration into SAP Business Warehouse (SAP BW) and SAP's near-line storage solution on DB2. This publication is intended to be helpful to a wide-ranging audience, including those readers who want to understand the technologies and those who have planning, deployment, and support responsibilities.

Hadoop in Practice, Second Edition

2014-09-29 O'Reilly Amazon

book

Alex Holmes

data data-engineering Hadoop AI/ML Analytics Big Data

Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data, using Hadoop. This revised new edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere.
About the Book: It's always a good time to upgrade your Hadoop skills! Hadoop in Practice, Second Edition provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You'll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.
What's Inside: Thoroughly updated for Hadoop 2; how to write YARN applications; integrate real-time technologies like Storm, Impala, and Spark; predictive analytics using Mahout and RR.
About the Reader: Readers need to know a programming language like Java and have basic familiarity with Hadoop.
About the Author: Alex Holmes works on tough big-data problems. He is a software engineer, author, speaker, and blogger specializing in large-scale Hadoop projects.
Quotes:
"Very insightful. A deep dive into the Hadoop world." - Andrea Tarocchi, Red Hat, Inc.
"The most complete material on Hadoop and its ecosystem known to mankind!" - Arthur Zubarev, Vital Insights
"Clear and concise, full of insights and highly applicable information." - Edward de Oliveira Ribeiro, DataStax, Inc.
"Comprehensive up-to-date coverage of Hadoop 2." - Muthusamy Manigandan, OzoneMedia

Getting Started with Impala

2014-09-25 O'Reilly Amazon

book

John Russell

data data-engineering Hadoop impala Analytics Big Data

Learn how to write, tune, and port SQL queries and other statements for a Big Data environment, using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that not only interoperate with other Hadoop components, and are convenient for administrators to manage and monitor, but also accommodate future expansion in data size and evolution of software capabilities. Written by John Russell, documentation lead for the Cloudera Impala project, this book gets you working with the most recent Impala releases quickly. Ideal for database developers and business analysts, the latest revision covers analytics functions, complex types, incremental statistics, subqueries, and submission to the Apache incubator. Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.
Learn how Impala integrates with a wide range of Hadoop components
Attain high performance and scalability for huge data sets on production clusters
Explore common developer tasks, such as porting code to Impala and optimizing performance
Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
Learn how to transition from rigid schemas to a flexible model that evolves as needs change
Take a deep dive into joins and the roles of statistics

Master Competitive Analytics with Oracle Endeca Information Discovery

2014-09-24 O'Reilly Amazon

book

William Smith, Helen Sun

data data-engineering oracle-database-solutions Analytics Big Data Oracle

Oracle Endeca Information Discovery Best Practices
Maximize the powerful capabilities of this self-service enterprise data discovery platform. Master Competitive Analytics with Oracle Endeca Information Discovery reveals how to unlock insights from any type of data, regardless of structure. The first part of the book is a complete technical guide to the product's architecture, components, and implementation. The second part presents a comprehensive collection of business analytics use cases in various industries, including financial services, healthcare, research, manufacturing, retail, consumer packaged goods, and public sector. Step-by-step instructions on implementing some of these use cases are included in this Oracle Press book.
Install and manage Oracle Endeca Server
Design Oracle Endeca Information Discovery Studio visualizations to facilitate user-driven data exploration and discovery
Enable enterprise-driven data exploration with Oracle Endeca Information Discovery Integrator
Develop and implement a fraud detection and analysis application
Build a healthcare correlation application that integrates claims, patient, and operations analysis; partners; clinical research; and remote monitoring
Use an enterprise architecture approach to incrementally establish big data and analytical capabilities

Reliability and Performance with IBM DB2 Analytics Accelerator V4.1

2014-09-24 O'Reilly Amazon

book

Steve Speller, Ravi Kumar, Anna Griner, Paolo Bruni, Ruiping Li, James Guo, Andy Perkins, Jason Arnold, Jeff Feinsmith, Leticia Cruz, Dino Tonelli, Jonathan Sloan, Chris Harlander, Willie Favero, Johannes Kern

data data-engineering relational-databases ibm-db2 Analytics BI

The IBM® DB2® Analytics Accelerator for IBM z/OS® is a high-performance appliance that integrates the IBM zEnterprise® infrastructure with IBM PureData™ for Analytics, powered by IBM Netezza® technology. With this integration, you can accelerate data-intensive and complex queries in a DB2 for z/OS highly secure and available environment. DB2 and the Analytics Accelerator appliance form a self-managing hybrid environment running online transaction processing and online transactional analytical processing concurrently and efficiently. These online transactions run together with business intelligence and online analytic processing workloads. DB2 Analytics Accelerator V4.1 expands the value of high-performance analytics. DB2 Analytics Accelerator V4.1 opens to static Structured Query Language (SQL) applications and row set processing, minimizes data movement, reduces latency, and improves availability. This IBM Redbooks® publication provides technical decision-makers with an understanding of the benefits of version 4.1 of the Analytics Accelerator with DB2 11 for z/OS. It describes the installation of the new functions, and the advantages to existing analytical processes as measured in our test environment. This book also introduces the DB2 Analytics Accelerator Loader V1.1, a tool that facilitates the data population of the DB2 Analytics Accelerator.

Predictive Analytics Using Oracle Data Miner

2014-08-08 O'Reilly Amazon

book

Brendan Tierney

data data-engineering oracle-database-solutions Analytics BI Oracle

Build Next-Generation In-Database Predictive Analytics Applications with Oracle Data Miner
“If you have an Oracle Database and want to leverage that data to discover new insights, make predictions, and generate actionable insights, this book is a must read for you! In Predictive Analytics Using Oracle Data Miner: Develop & Use Oracle Data Mining Models in Oracle Data Miner, SQL & PL/SQL, Brendan Tierney, Oracle ACE Director and data mining expert, guides you through the basic concepts of data mining and offers step-by-step instructions for solving data-driven problems using SQL Developer’s Oracle Data Mining extension. Brendan takes it full circle by showing you how to deploy advanced analytical methodologies and predictive models immediately into enterprise-wide production environments using the in-database SQL and PL/SQL functionality. Definitely a must read for any Oracle data professional!” --Charlie Berger, Senior Director Product Management, Oracle Data Mining and Advanced Analytics
Perform in-database data mining to unlock hidden insights in data. Written by an Oracle ACE Director, Predictive Analytics Using Oracle Data Miner shows you how to use this powerful tool to create and deploy advanced data mining models. Covering topics for the data scientist, Oracle developer, and Oracle database administrator, this Oracle Press guide shows you how to get started with Oracle Data Miner and build Oracle Data Miner models using SQL and PL/SQL packages. You'll get best practices for integrating your Oracle Data Miner models into applications to automate the discovery and distribution of business intelligence predictions throughout the enterprise.
Install and configure Oracle Data Miner for Oracle Database 11g Release 11.2 and Oracle Database 12c
Create Oracle Data Miner projects and workflows
Prepare data for data mining
Develop data mining models using association rule analysis, classification, clustering, regression, and anomaly detection
Use data dictionary views and prepare your data using in-database transformations
Build and use data mining models using SQL and PL/SQL packages
Migrate your Oracle Data Miner models, integrate them into dashboards and applications, and run them in parallel
Build transient data mining models with the Predictive Queries feature in Oracle Database 12c

Large Scale and Big Data

2014-06-25 O'Reilly Amazon

book

Sherif Sakr, Mohamed Gaber

data data-engineering AI/ML Analytics Big Data Cloud Computing

Large Scale and Big Data: Processing and Management provides readers with a central source of reference on the data management techniques currently available for large-scale data processing. Presenting chapters written by leading researchers, academics, and practitioners, it addresses the fundamental challenges associated with Big Data processing tools and techniques across a range of computing environments. The book begins by discussing the basic concepts and tools of large-scale Big Data processing and cloud computing. It also provides an overview of different programming models and cloud-based deployment models. The book’s second section examines the usage of advanced Big Data processing techniques in different domains, including semantic web, graph processing, and stream processing. The third section discusses advanced topics of Big Data processing such as consistency management, privacy, and security. Supplying a comprehensive summary from both the research and applied perspectives, the book covers recent research discoveries and applications, making it an ideal reference for a wide range of audiences, including researchers and academics working on databases, data mining, and web scale data processing. After reading this book, you will gain a fundamental understanding of how to use Big Data-processing tools and techniques effectively across application domains. Coverage includes cloud data management architectures, big data analytics visualization, data management, analytics for vast amounts of unstructured data, clustering, classification, link analysis of big data, scalable data mining, and machine learning techniques.

Architecting and Deploying IBM DB2 with BLU Acceleration in Your Analytical Environment

2014-06-19 O'Reilly Amazon

book

Kushal Munir, Martin Schlegel, Brigitte Blaser, Whei-Jen Chen, Polly Lau, Alexander Zietlow, Aidan Craddock, Cong Lin

data data-engineering relational-databases ibm-db2 Analytics Cognos

IBM® DB2® with BLU Acceleration is a revolutionary technology that is delivered in DB2 for Linux, UNIX, and Windows Release 10.5. BLU Acceleration delivers breakthrough performance improvements for analytic queries by using dynamic in-memory columnar technologies. Unlike other vendor solutions, BLU Acceleration allows the unified computing of online transaction processing (OLTP) and analytics data inside a single database, thereby removing barriers and accelerating results for users. With an observed hundredfold improvement in query response time, BLU Acceleration provides a simple, fast, and easy-to-use solution for the needs of today's organizations: quick access to business answers can be used to gain a competitive edge, lower costs, and more. This IBM Redbooks® publication introduces the concepts of DB2 with BLU Acceleration. It discusses the steps to move from a relational database to using BLU Acceleration, optimizing BLU usage, and deploying BLU into existing analytic solutions today, with an example of IBM Cognos®. This book also describes integration of DB2 with BLU Acceleration into SAP Business Warehouse (SAP BW) and SAP's near-line storage solution on DB2. This publication is intended to be helpful to a wide-ranging audience, including those readers who want to understand the technologies and readers who have planning, deployment, and support responsibilities.

Google BigQuery Analytics

2014-06-09 O'Reilly Amazon

book

Siddartha Naidu , Jordan Tigani

data data-engineering google-bigquery Analytics API BigQuery

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets. Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results. The book features a companion website that includes all code and data sets from the book, uses real-world examples to explain everything analysts need to know to effectively use BigQuery, and includes web application examples coded in Python.

Advanced Case Management with IBM Case Manager

2014-06-02 O'Reilly Amazon

book

Brian Benoit , Wei-Dong Zhu , Johnson Liu , Juan Felipe Ospina , Mike Marin , Seema Meena , Guillermo Rios , Bob Jackson

data data-engineering IBM Analytics

Organizations face case management challenges that require insight, responsiveness, and collaboration. IBM® Case Manager, Version 5.2, is an advanced case management product that unites information, process, and people to provide the 360-degree view of case information and achieve optimized outcomes. With IBM Case Manager, knowledge workers can extract critical case information through integrated business rules, collaboration, and analytics. This easy access to information enhances decision-making ability and leads to more successful case outcomes. IBM Case Manager also helps capture industry preferred practices in frameworks and templates to empower business users and accelerate return on investment. This IBM Redbooks® publication introduces the case management concept. It includes the reason for and benefits of case management, and why it is different from the traditional business process management or content management. In addition, this book addresses how you can design and build a case management solution with IBM Case Manager and integrate that solution with external products and components. This book is intended to provide IT architects and IT specialists with the high-level concepts of case management and the capabilities of IBM Case Manager. It also serves as a practical guide for IT professionals who are responsible for designing, building, customizing, and deploying IBM Case Manager solutions.

IBM MobileFirst Strategy Software Approach

2014-05-08 O'Reilly Amazon

book

Tony Liew , Sundaragopal Venkatraman , Tony Duong , Benjamin Koehler , Colin Mower

data data-engineering IBM Analytics Cyber Security

IBM® MobileFirst enables an enterprise to support a mobile strategy. With this end-to-end solution, IBM makes it possible for an enterprise to benefit from mobile interactions with customers, with business partners, and in organizations. There are products available from the IBM MobileFirst solution to support management, security, analytics, and development of the application and data platforms in a mobile environment. This IBM Redbooks® publication explores four areas crucial to developing a mobile strategy: application development, mobile quality management, mobile device management, and mobile analytics. This IBM Redbooks publication provides an in-depth look at IBM Worklight®, IBM Rational® Test Workbench, IBM Endpoint Manager for Mobile Devices, and IBM Tealeaf® CX Mobile. This book is of interest to architects looking to design mobile enterprise solutions, and to practitioners looking to build these solutions.

Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

2014-05-07 O'Reilly Amazon

book

Vijay Srinivas Agneeswaran Ph.D

data data-engineering Hadoop AI/ML Analytics Big Data

Master alternative Big Data technologies that can do what Hadoop can't: real-time analytics and iterative machine learning. When most technical professionals think of Big Data analytics today, they think of Hadoop. But there are many cutting-edge applications that Hadoop isn't well suited for, especially real-time analytics and contexts requiring the use of iterative machine learning algorithms. Fortunately, several powerful new technologies have been developed specifically for use cases such as these. Big Data Analytics Beyond Hadoop is the first guide specifically designed to help you take the next steps beyond Hadoop. Dr. Vijay Srinivas Agneeswaran introduces the breakthrough Berkeley Data Analysis Stack (BDAS) in detail, including its motivation, design, architecture, Mesos cluster management, performance, and more. He presents realistic use cases and up-to-date example code for Spark, the next-generation in-memory computing technology from UC Berkeley; Storm, the parallel real-time Big Data analytics technology from Twitter; and GraphLab, the next-generation graph processing paradigm from CMU and the University of Washington (with comparisons to alternatives such as Pregel and Piccolo). He also offers architectural and design guidance and code sketches for scaling machine learning algorithms to Big Data, and then realizing them in real time. He concludes by previewing emerging trends, including real-time video analytics, SDNs, and even Big Data governance, security, and privacy issues. He identifies intriguing startups and new research possibilities, including BDAS extensions and cutting-edge model-driven analytics. Big Data Analytics Beyond Hadoop is an indispensable resource for everyone who wants to reach the cutting edge of Big Data analytics, and stay there: practitioners, architects, programmers, data scientists, researchers, startup entrepreneurs, and advanced students.

Pig Design Patterns

2014-04-17 O'Reilly Amazon

book

Pradeep Pasupuleti

data data-engineering Hadoop pig Analytics Big Data

Discover how to simplify Hadoop programming with Pig Design Patterns, helping you create innovative enterprise-level big data solutions. This book takes you step-by-step through practical design patterns for creating efficient data processing workflows with Apache Pig. What this book will help me do: Understand and implement fundamental data processing patterns with Pig. Master advanced Pig techniques for Big Data analytics. Learn to optimize Pig scripts for performance and scalability. Build end-to-end data processing solutions with real-world examples. Integrate Pig workflows into the broader Hadoop ecosystem. Author(s): Pradeep Pasupuleti is an experienced data engineer and software developer specializing in Big Data technologies. With extensive expertise in Hadoop and Pig, Pradeep shares valuable insights and practical techniques that beginners and experts alike will appreciate. Who is it for? This book is perfect for software developers and data engineers working with Hadoop who want to streamline their workflow. It is ideal for professionals already familiar with Pig and Hadoop basics looking to advance. It also suits learners aiming to implement optimized data solutions effectively.

Hadoop For Dummies

2014-04-14 O'Reilly Amazon

book

Dirk deRoos

data data-engineering Hadoop Analytics Big Data Data Science

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. The book explains the origins of Hadoop, its economic benefits, and its functionality and practical applications; helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily; details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving; and shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster. From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.

Solr in Action

2014-03-25 O'Reilly Amazon

book

Trey Grainger , Timothy Potter

data data-engineering search solr Analytics Big Data

Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities. About the Technology: Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents. About the Book: Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning. What's Inside: how to scale Solr for big data; rich real-world examples; Solr as a NoSQL data store; advanced multilingual, data, and relevancy tricks; coverage of versions through Solr 4.7. About the Reader: This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required. About the Authors: Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies. Quotes: "The knowledge and techniques you need." - From the Foreword by Yonik Seeley, Creator of Solr. "Readable and immediately applicable ... an excellent book." - John Viviano, InterCorp, Inc. "The go-to guide for Solr ... a definitive resource for both beginners and experts." - Scott Anthony, Business Instruments. "A well-dosed combination of deep technical knowledge and real-world experience." - Alexandre Madurell, Piksel, Inc.
