Analytics

NoSQL For Dummies

2015-02-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Adam Fowler

Big Data Cassandra Data Analytics Hadoop MongoDB Neo4j NoSQL RDBMS data data-engineering nosql-databases

Get up to speed on the nuances of NoSQL databases and what they mean for your organization. This easy-to-read guide to NoSQL databases provides the no-nonsense overview and analysis you need, including what NoSQL is and which database is right for you. Featuring specific evaluation criteria for NoSQL databases, along with a look at the pros and cons of the most popular options, NoSQL For Dummies provides the fastest and easiest way to dive into the details of this incredible technology. You'll gain an understanding of how to use NoSQL databases for mission-critical enterprise architectures and projects, and real-world examples reinforce the primary points to create an action-oriented resource for IT pros. If you're planning a big data project or platform, you probably already know you need to select a NoSQL database to complete your architecture. But with options flooding the market and updates and add-ons coming at a rapid pace, determining what you require now, and in the future, can be a tall order. This is where NoSQL For Dummies comes in! Learn the basic tenets of NoSQL databases and why they have come to the forefront as data has outpaced the capabilities of relational databases. Discover major players among NoSQL databases, including Cassandra, MongoDB, MarkLogic, Neo4j, and others. Get an in-depth look at the benefits and disadvantages of the wide variety of NoSQL database options. Explore the needs of your organization as they relate to the capabilities of specific NoSQL databases. Big data and Hadoop get all the attention, but when it comes down to it, NoSQL databases are the engines that power many big data analytics initiatives. With NoSQL For Dummies, you'll go beyond relational databases to ramp up your enterprise's data architecture in no time.

Learning Spark

2015-02-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andy Konwinski (Databricks) , Holden Karau (Fight Health Insurance) , Matei Zaharia (Databricks) , Patrick Wendell (Databricks)

API Data Analytics Java Python Scala Spark SQL Data Streaming apache-spark data data-engineering

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Data: Emerging Trends and Technologies

2015-02-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alistair Croll

AI/ML Big Data Cloud Computing Hadoop data data-engineering

What are the emerging trends and technologies that will transform the data landscape in the coming months? In this report from Strata + Hadoop World co-chair Alistair Croll, you'll learn how the ubiquity of cheap sensors, fast networks, and distributed computing has given rise to several developments that will soon have a profound effect on individuals and on society as a whole. Machine learning, for example, has quickly moved from lab tool to hosted, pay-as-you-go services in the cloud. Those services, in turn, are leading to predictive apps that will provide individuals with the right functionality and content at the right time by continuously learning about them and predicting what they'll need. Computational power can produce cognitive augmentation. Report topics include: the swing between centralized and distributed computing; machine learning as a service; personal digital assistants and cognitive augmentation; graph databases and analytics; regulating complex algorithms; the pace of real-time data and automation; solving dire problems with big data; and the implications of having sensors everywhere. This report contains many more examples of how big data is starting to reshape business and change behavior, and it's just a small sample of the in-depth information Strata + Hadoop World provides. Pick up this report and make plans to attend one of several Strata + Hadoop World conferences in the San Francisco Bay Area, London, and New York.

Big Data Analytics

2015-02-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kim H. Pries , Robert Dunnigan

AI/ML Big Data Data Analytics GIS Hadoop Oracle data data-engineering

With this book, managers and decision makers are given the tools to make more informed decisions about big data purchasing initiatives. Big Data Analytics: A Practical Guide for Managers not only supplies descriptions of common tools, but also surveys the various products and vendors that supply the big data market. Comparing and contrasting the different types of analysis commonly conducted with big data, this accessible reference presents clear-cut explanations of the general workings of big data tools. Instead of spending time on how to install specific packages, it focuses on why readers would install a given package. The book provides authoritative guidance on a range of tools, including open source and proprietary systems. It details the strengths and weaknesses of incorporating big data analysis into decision-making and explains how to leverage the strengths while mitigating the weaknesses. It describes the benefits of distributed computing in simple terms, includes substantial vendor and tool material (especially for open source decisions), covers prominent software packages including Hadoop and Oracle Endeca, examines GIS and machine learning applications, and considers privacy and surveillance issues.
The book further explores basic statistical concepts that, when misapplied, can be the source of errors. Time and again, big data is treated as an oracle that discovers results nobody would have imagined. While big data can serve this valuable function, all too often these results are incorrect, yet are still reported unquestioningly. Unless preventive measures are taken, the probability of obtaining erroneous results increases as more variables are compared. The authors explain these concepts so that managers can ask their analysts and vendors better questions about the appropriateness of the methods used to reach a conclusion. Because the world of science and medicine has been grappling with similar issues in the publication of studies, the authors draw on those efforts and apply them to big data.

ElasticSearch Cookbook - Second Edition

2015-01-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Paro

Big Data Cloud Computing ELK Java JSON Python data data-engineering elasticsearch search

The "ElasticSearch Cookbook - Second Edition" is a hands-on guide featuring over 130 advanced recipes to help you harness the power of Elasticsearch, a leading search and analytics engine. Through insightful examples and practical guidance, you'll learn to implement efficient search solutions, optimize queries, and manage Elasticsearch clusters effectively.
This book will help you design and configure Elasticsearch topologies optimized for your specific deployment needs, develop custom mappings to optimize your data indexes, execute advanced queries and filters to refine and retrieve search results, set up and monitor Elasticsearch clusters for optimal performance, and extend Elasticsearch through plugin development and integrations using Java and Python.
About the author: Alberto Paro is a technology expert with years of experience working with Elasticsearch, big data solutions, and scalable cloud architecture. He has authored multiple books and technical articles on Elasticsearch, drawing on his extensive knowledge to provide practical insights. His approachable, detail-oriented style makes complex concepts accessible to technical professionals.
Who is it for? This book is best suited for software developers and IT professionals looking to use Elasticsearch in their projects. Readers should be familiar with JSON and have basic Java programming skills. It is ideal for those who understand search applications and want to deepen their expertise. Whether you're integrating Elasticsearch into a web application or optimizing your system's search capabilities, this book will provide the skills and knowledge you need.

Elasticsearch: The Definitive Guide

2015-01-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Zachary Tong , Clinton Gormley

ELK data data-engineering elasticsearch search

Whether you need full-text search or real-time analytics of structured data—or both—the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

2014-12-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Frampton

Avro Big Data Data Analytics Hadoop Hive NoSQL SQL data data-engineering

Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system. As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (YARN and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and MapReduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Bigtop), and analysis (Hive). The problem is that the Internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions that explains where to get Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade, someone just like author and big data expert Mike Frampton. Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective. It explains the roles for each project (such as architect and tester) and shows how the Hadoop toolset can be used at each system stage. It explains, in an easily understood manner and through numerous examples, how to use each tool. The book also explains the sliding scale of tools available depending on data size, and when and how to use them.
Big Data Made Easy shows developers and architects, as well as testers and project managers, how to store, configure, and process big data; schedule processes; move data among SQL and NoSQL systems; monitor data; perform big data analytics; report on big data processes and projects; and test big data systems. Big Data Made Easy also explains the best part, which is that this toolset is free. Anyone can download it and, with the help of this book, start to use it within a day. With the skills this book will teach you under your belt, you will add value to your company or client immediately, not to mention your career.

Mastering Hadoop

2014-12-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sandeep Karanth

Cloud Computing Hadoop HDFS Hive Cyber Security data data-engineering

Embark on a journey to master Hadoop and its advanced features with this comprehensive book. "Mastering Hadoop" equips you with the knowledge needed to tackle complex data processing challenges and optimize your Hadoop workflows. With clear explanations and practical examples, this book is your guide to becoming proficient in leveraging Hadoop technologies.
This book will help you optimize Hadoop MapReduce jobs, Pig scripts, and Hive queries for better performance; understand and employ advanced data formats and Hadoop I/O techniques; integrate low-latency processing with Storm on YARN; explore cloud deployment of Hadoop and advanced alternatives to HDFS; and enhance Hadoop security and master techniques for analytics using Hadoop.
About the author: Sandeep Karanth is an experienced Hadoop professional with years of expertise in data processing and distributed computing. With a practical, methodical approach, Karanth has crafted this book to give learners both the essentials and the advanced features of Hadoop. Its focus on performance optimization and real-world applications helps bridge the gap between theory and practice.
Who is it for? This book is ideal for data engineers and software developers who are familiar with the basics of Hadoop and want to advance their understanding. If you aim to improve Hadoop performance or adopt new features such as YARN and Storm, this book is for you. Readers interested in Hadoop deployment, optimization, and newer capabilities will also benefit greatly. It suits anyone aiming to become a Hadoop expert, from intermediate learners to advanced practitioners.

Big Data Now: 2014 Edition

2014-12-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by O'Reilly Media, Inc.

AI/ML API Big Data Hadoop Spark data data-engineering

In the four years that O'Reilly Media, Inc. has produced its annual Big Data Now report, the data field has grown from infancy into young adulthood. Data is now a leader in some fields and a driver of innovation in others, and companies that use data and analytics to drive decision-making are outperforming their peers. And while access to big data tools and techniques once required significant expertise, today many tools have improved and communities have formed to share best practices. Companies have also started to emphasize the importance of processes, culture, and people. The topics in Big Data Now: 2014 Edition represent the major forces currently shaping the data world:
Cognitive augmentation: predictive APIs, graph analytics, and Network Science dashboards
Intelligence matters: defining AI, modeling intelligence, deep learning, and "summoning the demon"
Cheap sensors, fast networks, and distributed computing: stream processing, hardware data flows, and computing at the edge
Data (science) pipelines: broadening the coverage of analytic pipelines with specialized tools
Evolving marketplace of big data components: SSDs, Hadoop 2, Spark; and why datacenters need operating systems
Design and social science: human-centered design, wearables and real-time communications, and wearable etiquette
Building a data culture: moving from prediction to real-time adaptation; and why you need to become a data skeptic
Perils of big data: data redlining, intrusive data analysis, and the state of big data ethics

Data Architecture: A Primer for the Data Scientist

2014-11-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Daniel Linstedt , W. H. Inmon

Big Data DWH data data-engineering

Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data, and everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how the pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until the data gathered can be put into an existing framework or architecture, it can’t be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist. Drawing upon years of practical experience, and using numerous examples and an easy-to-understand framework, W. H. Inmon and Daniel Linstedt define the importance of data architecture and how it can be used effectively to harness Big Data within existing systems. You’ll be able to turn textual information into a form that can be analyzed by standard tools, make the connection between analytics and Big Data, understand how Big Data fits within an existing systems environment, and conduct analytics on repetitive and non-repetitive data. The book also discusses an often-overlooked source of value in Big Data, non-repetitive data, and why there is significant business value in using it. It also presents new opportunities afforded by the advent of Big Data and demystifies the murky waters of repetitive and non-repetitive data in Big Data.

Learning HBase

2014-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shashwat Shriparv

Big Data Data Analytics Hadoop Apache HBase data data-engineering nosql-databases

In "Learning HBase", you'll dive deep into the core functionality of Apache HBase and understand its applications in Big Data environments. By exploring both theoretical concepts and practical scenarios, you'll acquire the skills to set up, manage, and optimize HBase clusters.
This book will help you understand and explain the components of the HBase ecosystem; install and configure HBase clusters for optimized performance; develop and maintain applications using HBase's structured storage model; troubleshoot and resolve common issues in HBase deployments; and leverage Hadoop tools and advanced techniques to enhance HBase capabilities.
About the author: Shashwat Shriparv is a skilled technologist with a robust background in Big Data tools and application development. With hands-on expertise in distributed storage systems and data analytics, they bring exceptional insights into managing HBase environments. Their approach combines clarity, practicality, and a focus on real-world applicability.
Who is it for? This book is ideal for system administrators and developers who are starting their journey in Big Data technology. With clear explanations and hands-on scenarios, it suits those seeking foundational and intermediate knowledge of the HBase ecosystem, including students, early-career professionals, and mid-level technologists. If you work in Big Data and want to grow your skill set in distributed storage systems, this book is for you.

IBM Software for SAP Solutions

2014-11-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Khirallah Birkler , Navneet Goyal , Peter Bahrs , Jorg Stolzenberg , Nick Norris , Michel Laaroussi , Michael Love , Bernd Eberhardt , Andrew Stalnecker , Derek Jennings , Stefan Momma , Manfred Oevers , Yaro Dunchych , Joe Kaczmarek , Martin Oberhofer , James Hunter , Paul Pacholski , Pierre Valiquette

BI Data Management DevOps IBM Master Data Management SAP Cyber Security data data-engineering

SAP is a market leader in enterprise business application software. SAP solutions provide a rich set of composable application modules and the configurable functional capabilities expected from a comprehensive enterprise business application software suite. In most cases, companies that adopt SAP software remain heterogeneous enterprises, running both SAP and non-SAP systems to support their business processes. Regardless of the specific scenario, in heterogeneous enterprises most SAP implementations must be integrated with a variety of non-SAP enterprise systems: portals, messaging infrastructure, business process management (BPM) tools, Enterprise Content Management (ECM) methods and tools, business analytics (BA) and business intelligence (BI) technologies, security, systems of record, and systems of engagement. When SAP software is used in a large, heterogeneous enterprise environment, SAP clients face the dilemma of selecting the correct set of tools and platforms to implement SAP functionality and to integrate the SAP solutions with non-SAP systems. This IBM® Redbooks® publication explains the value of integrating IBM software with SAP solutions. It describes how to enhance and extend pre-built capabilities in SAP software with best-in-class IBM enterprise software, enabling clients to maximize the return on their SAP investment and achieve a balanced enterprise architecture approach. This book describes IBM Reference Architecture for SAP, a prescriptive blueprint for using IBM software in SAP solutions. The reference architecture is focused on defining the use of IBM software with SAP, and is not intended to address the internal aspects of SAP components. The chapters of this book provide a specific reference architecture for many of the architectural domains that are each important for a large enterprise to establish common strategy, efficiency, and balance.
The majority of the most important architectural domain topics, such as integration, process optimization, master data management, mobile access, Enterprise Content Management, business intelligence, DevOps, security, and systems monitoring, are covered in the book. However, several other architectural domains are not included. This does not imply that these other domains are unimportant or less important, or that IBM does not offer solutions to address them. It only reflects time constraints, available resources, and the complexity of assembling a book on an extremely broad topic. Although more content could have been added, the authors are confident that the architectural material included should give organizations a fantastic head start in defining their own enterprise reference architecture for many of the important architectural domains, and they hope this book provides great value to its readers. This IBM Redbooks publication is targeted to two audiences. The first is client decision makers and solution architects who lead enterprise transformation projects and want further insight into how they can benefit from integrating IBM software in large-scale SAP projects. The second is IT architects and consultants who integrate IBM technology with SAP solutions.

Data Fluency: Empowering Your Organization with Effective Data Communication

2014-11-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Zach Gemignani , Patrick Schuermann , Chris Gemignani , Richard Galentino

BI Dashboard DataViz data data-engineering

A dream come true for those looking to improve their data fluency. Analytical data is a powerful tool for growing companies, but what good is it if it hides in the shadows? Bring your data to the forefront with effective visualization and communication approaches, and let Data Fluency: Empowering Your Organization with Effective Data Communication show you the best tools and strategies for getting the job done right. Learn the best practices of data presentation and the ways that reporting and dashboards can help organizations effectively gauge performance, identify areas for improvement, and communicate results. Topics covered in the book include data reporting and communication, audience and user needs, data presentation tools, layout and styling, and common design failures. Those responsible for analytics, reporting, or BI implementation will find a refreshing take on data and visualization in this resource, as will report, data visualization, and dashboard designers. With this book, you can conquer the challenge of making valuable data approachable and easy to understand, and develop the unique skills required to shape data to the needs of different audiences. The full-color book links to bonus content at juiceanalytics.com and is written by well-known and highly esteemed authors in the data presentation community. Data Fluency focuses on user experience, making reports approachable, and presenting data in a compelling, inspiring way. The book helps dissolve the disconnect between your data and those who might use it, and can help make an impact on the people who are most affected by data. Use Data Fluency today to develop the skills necessary to turn data into effective displays for decision-making.

Building IBM Enterprise Content Management Solutions From End to End

2014-10-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Josemina Magdalen , Wei-Dong Zhu , Johnson Liu , Yuki Makino , Isuru Fernando , Abraruddin Khan , Ben Davies , Sven Hapke , Blair Groff , Mike Prentice

IBM data data-engineering

IBM® Enterprise Content Management (ECM) solutions provide efficient and effective ways to capture content, manage the content and business processes, discover insights from the content, and derive actions to improve business processes, products, and services. This IBM Redbooks® publication introduces and highlights some of the IBM ECM products that can be implemented and integrated together to create end-to-end ECM solutions: IBM Case Manager, IBM Datacap, IBM Content Manager OnDemand, IBM Enterprise Records, IBM Watson™ Content Analytics, and IBM Content Classification. Not all of these products need to be integrated into an ECM solution. Depending on your business requirements, you can choose a subset of them to build into your ECM solutions. This book serves as a hands-on learning guide for information technology (IT) specialists who plan to build end-to-end ECM solutions for a proof of concept (PoC) environment or a proof of technology environment. For implementing a production-strength ECM solution, also refer to IBM Knowledge Center, IBM Redbooks publications, and IBM Software Services.

IBM Tivoli Storage Productivity Center V5.2 Release Guide

2014-10-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Karen Orlando , Mary Lovelace , Paolo D'Angelo , Brian De Guia , Curtis Neal , Christian Sonder , Markus Standau

Cloud Computing IBM data data-engineering ibm-tivoli

IBM® Tivoli® Storage Productivity Center V5.2 is a feature-rich storage management software suite. The integrated suite provides detailed monitoring, reporting, and management within a single console. In addition, implementing the IBM SmartCloud® Virtual Storage Center (VSC) license with Tivoli Storage Productivity Center addresses new workloads that require massive scale and rapid pace, and accelerates business insight by adding advanced analytics functions such as storage optimization, provisioning, and transformation. This IBM Redbooks® publication is intended for storage administrators and users who are installing and using the features and functions in IBM Tivoli Storage Productivity Center V5.2. The information in this Redbooks publication can be used to plan for, install, and customize the components of Tivoli Storage Productivity Center in your storage infrastructure. Note: This IBM Redbooks publication is based on Tivoli Storage Productivity Center V5.2.2. Sections in this book that pertain to advanced analytics, including cloud configuration, provisioning, transforming volumes, and storage optimization, all require the IBM SmartCloud Virtual Storage Center license to be installed.

Architecting and Deploying DB2 with BLU Acceleration

2014-10-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Schlegel , Ayesha Zaka , Brigitte Blaser , Whei-Jen Chen , Polly Lau , Marco Bonezzi , Alexander Zietlow , Jean Cristie Pacanaro

Cognos IBM Linux SAP Unix data data-engineering ibm-db2 relational-databases

IBM® DB2® with BLU Acceleration is a revolutionary technology that is delivered in DB2 for Linux, UNIX, and Windows Release 10.5. BLU Acceleration delivers breakthrough performance improvements for analytic queries by using dynamic in-memory columnar technologies. Unlike other vendor solutions, BLU Acceleration allows unified computing of OLTP and analytics data inside a single database, thereby removing barriers and accelerating results for users. With observed hundredfold improvements in query response time, BLU Acceleration provides a simple, fast, and easy-to-use solution for the needs of today's organizations: quick access to business answers can be used to gain a competitive edge, lower costs, and more. This IBM Redbooks® publication introduces the concepts of DB2 with BLU Acceleration. It discusses the steps to move from a relational database to using BLU Acceleration, optimizing BLU usage, and deploying BLU into existing analytic solutions, with IBM Cognos® as an example. This book also describes integration of DB2 with BLU Acceleration into SAP Business Warehouse (SAP BW) and SAP's near-line storage solution on DB2. This publication is intended to be helpful to a wide-ranging audience, including readers who want to understand the technologies and those who have planning, deployment, and support responsibilities.

Hadoop in Practice, Second Edition

2014-09-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Holmes

AI/ML Big Data Hadoop Java Kafka Spark SQL data data-engineering

It's always a good time to upgrade your Hadoop skills! Hadoop in Practice, Second Edition provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. Brand-new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop, and you'll get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere. Readers need to know a programming language like Java and have basic familiarity with Hadoop.
What's inside: thoroughly updated for Hadoop 2; how to write YARN applications; integrating real-time technologies like Storm, Impala, and Spark; predictive analytics using Mahout and R.
About the author: Alex Holmes works on tough big data problems. He is a software engineer, author, speaker, and blogger specializing in large-scale Hadoop projects.
Quotes: "Very insightful. A deep dive into the Hadoop world." (Andrea Tarocchi, Red Hat, Inc.) "The most complete material on Hadoop and its ecosystem known to mankind!" (Arthur Zubarev, Vital Insights) "Clear and concise, full of insights and highly applicable information." (Edward de Oliveira Ribeiro, DataStax, Inc.) "Comprehensive up-to-date coverage of Hadoop 2." (Muthusamy Manigandan, OzoneMedia)

Getting Started with Impala

2014-09-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by John Russell

Big Data Hadoop SQL data data-engineering impala

Learn how to write, tune, and port SQL queries and other statements for a Big Data environment using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that not only interoperate with other Hadoop components and are convenient for administrators to manage and monitor, but also accommodate future expansion in data size and evolution of software capabilities. Written by John Russell, documentation lead for the Cloudera Impala project, this book gets you working with the most recent Impala releases quickly. Ideal for database developers and business analysts, the latest revision covers analytic functions, complex types, incremental statistics, subqueries, and submission to the Apache Incubator. Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers. With this book, you can learn how Impala integrates with a wide range of Hadoop components; attain high performance and scalability for huge data sets on production clusters; explore common developer tasks, such as porting code to Impala and optimizing performance; use tutorials for working with billion-row tables, date- and time-based values, and other techniques; learn how to transition from rigid schemas to a flexible model that evolves as needs change; and take a deep dive into joins and the roles of statistics.

Master Competitive Analytics with Oracle Endeca Information Discovery

2014-09-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by William Smith (Databricks), Helen Sun

Big Data Oracle data data-engineering oracle-database-solutions

Oracle Endeca Information Discovery Best Practices
Maximize the powerful capabilities of this self-service enterprise data discovery platform. Master Competitive Analytics with Oracle Endeca Information Discovery reveals how to unlock insights from any type of data, regardless of structure. The first part of the book is a complete technical guide to the product's architecture, components, and implementation. The second part presents a comprehensive collection of business analytics use cases in various industries, including financial services, healthcare, research, manufacturing, retail, consumer packaged goods, and the public sector. Step-by-step instructions for implementing some of these use cases are included in this Oracle Press book.
Install and manage Oracle Endeca Server
Design Oracle Endeca Information Discovery Studio visualizations to facilitate user-driven data exploration and discovery
Enable enterprise-driven data exploration with Oracle Endeca Information Discovery Integrator
Develop and implement a fraud detection and analysis application
Build a healthcare correlation application that integrates claims, patient, and operations analysis; partners; clinical research; and remote monitoring
Use an enterprise architecture approach to incrementally establish big data and analytical capabilities

Reliability and Performance with IBM DB2 Analytics Accelerator V4.1

2014-09-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Steve Speller, Ravi Kumar, Anna Griner, Paolo Bruni, Ruiping Li, James Guo, Andy Perkins, Jason Arnold, Jeff Feinsmith, Leticia Cruz, Dino Tonelli, Jonathan Sloan, Chris Harlander, Willie Favero, Johannes Kern

BI IBM Netezza SQL data data-engineering ibm-db2 relational-databases

The IBM® DB2® Analytics Accelerator for IBM z/OS® is a high-performance appliance that integrates the IBM zEnterprise® infrastructure with IBM PureData™ for Analytics, powered by IBM Netezza® technology. With this integration, you can accelerate data-intensive and complex queries in a highly secure and available DB2 for z/OS environment. DB2 and the Analytics Accelerator appliance form a self-managing hybrid environment that runs online transaction processing and online transactional analytical processing concurrently and efficiently. These online transactions run together with business intelligence and online analytic processing workloads. DB2 Analytics Accelerator V4.1 expands the value of high-performance analytics: it extends support to static Structured Query Language (SQL) applications and row set processing, minimizes data movement, reduces latency, and improves availability. This IBM Redbooks® publication provides technical decision-makers with an understanding of the benefits of version 4.1 of the Analytics Accelerator with DB2 11 for z/OS. It describes the installation of the new functions and the advantages to existing analytical processes as measured in our test environment. This book also introduces the DB2 Analytics Accelerator Loader V1.1, a tool that facilitates populating the DB2 Analytics Accelerator with data.

talk-data.com


NoSQL For Dummies

Learning Spark

Data: Emerging Trends and Technologies

Big Data Analytics

ElasticSearch Cookbook - Second Edition

Elasticsearch: The Definitive Guide

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Mastering Hadoop

Big Data Now: 2014 Edition

Data Architecture: A Primer for the Data Scientist

Learning Hbase

IBM Software for SAP Solutions

Data Fluency: Empowering Your Organization with Effective Data Communication

Building IBM Enterprise Content Management Solutions From End to End

IBM Tivoli Storage Productivity Center V5.2 Release Guide

Architecting and Deploying DB2 with BLU Acceleration

Hadoop in Practice, Second Edition

Getting Started with Impala

Master Competitive Analytics with Oracle Endeca Information Discovery

Reliability and Performance with IBM DB2 Analytics Accelerator V4.1