API

Essentials of Cloud Application Development on IBM Bluemix

2017-08-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hala Aziz , Ahmed Azraq , Sally Fikry , Ben Smith , Mohamed El-Khouly , Ahmed S. Hassan

Agile/Scrum Cloud Computing Computer Science Dashboard DevOps Git IBM JavaScript JSON data data-engineering

Abstract This IBM® Redbooks® publication is based on the Presentations Guide of the course Essentials of Cloud Application Development on IBM Bluemix that was developed by the IBM Redbooks team in partnership with IBM Skills Academy Program. This course is designed to teach university students the basic skills that are required to develop, deploy, and test cloud-based applications that use the IBM Bluemix® cloud services. The primary target audience for this course is university students in undergraduate computer science and computer engineering programs with no previous experience working in cloud environments. However, anyone new to cloud computing can also benefit from this course. After completing this course, you should be able to accomplish the following tasks: • Define cloud computing • Describe the factors that lead to the adoption of cloud computing • Describe the choices that developers have when creating cloud applications • Describe infrastructure as a service, platform as a service, and software as a service • Describe IBM Bluemix and its architecture • Identify the runtimes and services that IBM Bluemix offers • Describe IBM Bluemix infrastructure types • Create an application in IBM Bluemix • Describe the IBM Bluemix dashboard, catalog, and documentation features • Explain how the application route is used to test an application from the browser • Create services in IBM Bluemix • Describe how to bind services to an application in IBM Bluemix • Describe the environment variables that are used with IBM Bluemix services • Explain what IBM Bluemix organizations, domains, spaces, and users are • Describe how to create an IBM SDK for Node.js application that runs on IBM Bluemix • Explain how to manage your IBM Bluemix account with the Cloud Foundry CLI • Describe how to set up and use the IBM Bluemix plug-in for Eclipse • Describe the role of Node.js for server-side scripting • Describe IBM Bluemix DevOps Services and the capabilities of IBM DevOps Services • Identify the Web IDE features in IBM Bluemix DevOps • Describe how to connect a Git repository client to a Bluemix DevOps Services project • Explain the pipeline build and deploy processes that IBM Bluemix DevOps Services use • Describe how IBM Bluemix DevOps Services integrate with the IBM Bluemix cloud • Describe the agile planning tools in IBM Bluemix • Describe the characteristics of REST APIs • Explain the advantages of the JSON data format • Describe an example of REST APIs using Watson • Describe the main types of data services in IBM Bluemix • Describe the benefits of IBM Cloudant® • Explain how Cloudant databases and documents are accessed from IBM Bluemix • Describe how to use REST APIs to interact with a Cloudant database • Describe Bluemix mobile backend as a service (MBaaS) and the MBaaS architecture • Describe the Push Notifications service • Describe the App ID service • Describe the Kinetise service • Describe how to create Bluemix Mobile applications by using MobileFirst Services Starter Boilerplate
The workshop materials were created in June 2017. Therefore, all IBM Bluemix features that are described in this Presentations Guide and IBM Bluemix user interfaces that are used in the examples are current as of June 2017.

Apache Spark 2.x for Java Developers

2017-07-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sourav Gulati (Databricks) , Sumit Kumar

AI/ML Analytics Big Data CSV Java JSON Kafka Scala Spark SQL Data Streaming XML +3 more

Delve into mastering big data processing with 'Apache Spark 2.x for Java Developers.' This book provides a practical guide to implementing Apache Spark using the Java APIs, offering a unique opportunity for Java developers to leverage Spark's powerful framework without transitioning to Scala. What this Book will help me do Learn how to process data from formats like XML, JSON, and CSV using Spark Core. Implement real-time analytics using Spark Streaming and third-party tools like Kafka. Understand data querying with Spark SQL and master SQL schema processing. Apply machine learning techniques with Spark MLlib to real-world scenarios. Explore graph processing and analytics using Spark GraphX. Author(s) Sumit Kumar and Sourav Gulati, experienced professionals in Java development and big data, bring their wealth of practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers, making big data approachable. Who is it for? This book is perfect for Java developers who are eager to expand their skillset into big data processing with Apache Spark. Whether you are a seasoned Spark user or first diving into big data concepts, this book meets you at your level. With practical examples and straightforward explanations, you can unlock the potential of Spark in real-world scenarios.

Building on Multi-Model Databases

2017-07-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pete Aven

Data Management Cyber Security data data-engineering data-models

In many organizations today, businesspeople are busy requesting unified views of data stored across multiple sources within their organizations. But integrating multiple data types from multiple data stores is a complex, error-prone, and time-consuming process of cobbling everything together manually. This concise book examines how multi-model databases can help you integrate data storage and access across your organization in a seamless and elegant way. Authors Pete Aven and Diane Burley from MarkLogic explain how this latest evolution in data management naturally accepts heterogeneous data, enabling you to eventually phase out technical data silos. Through several case studies, you’ll discover how organizations use multi-model databases to reduce complexity, save money, take advantage of opportunities, lessen risk, and shorten time to value. • Get unified views across disparate data models and formats within a single database • Learn how multi-model databases leverage the inherent structure of the data being stored • Load and use unstructured and semi-structured data (such as documents and text) as is • Provide agility in data access and delivery through APIs, interfaces, and indexes • Learn how to scale a multi-model database, and provide ACID capabilities and security • Examine how a multi-model database would fit into your existing architecture

JSON at Work

2017-07-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom Marrs

Java JavaScript JSON JSON Schema Kafka MongoDB data data-engineering storage-formats

JSON is becoming the backbone for meaningful data interchange over the internet. This format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design. JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise. • Get acquainted with JSON basics and learn how to model JSON data • Learn how to use JSON with Node.js, Ruby on Rails, and Java • Structure JSON documents with JSON Schema to design and test APIs • Search the contents of JSON documents with JSON Search tools • Convert JSON documents to other data formats with JSON Transform tools • Compare JSON-based hypermedia formats, including HAL and jsonapi • Leverage MongoDB to store and access JSON documents • Use Apache Kafka to exchange JSON-based messages between services

Learning Elasticsearch

2017-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Abhishek Andhavarapu

Analytics Cloud Computing ELK Kibana data data-engineering elasticsearch search

This comprehensive guide to Elasticsearch will teach you how to build robust and scalable search and analytics applications using Elasticsearch 5.x. You will learn the fundamentals of Elasticsearch, including its APIs and tools, and how to apply them to real-world problems. By the end of the book, you will have a solid grasp of Elasticsearch and be ready to implement your own solutions. What this Book will help me do Master the setup and configuration of Elasticsearch and Kibana. Learn to efficiently query and analyze both structured and unstructured data. Understand how to use Elasticsearch aggregations to perform advanced analytics. Gain knowledge of advanced search features including geospatial queries and autocomplete. Explore the Elastic Stack and learn deployment best practices and cloud hosting options. Author(s) Abhishek Andhavarapu is an expert in database technology and distributed systems, with years of experience in Elasticsearch. Their passion for search technologies is reflected in their clear and practical teaching style. They've written this guide to help developers of all levels get up to speed with Elasticsearch quickly and comprehensively. Who is it for? This book is perfect for software developers looking to implement effective search and analytics solutions. It's ideal for those who are new to Elasticsearch as well as for professionals familiar with other search tools like Lucene or Solr. The book assumes basic programming knowledge but no prior experience with Elasticsearch.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

Big Data Cloud Computing Hadoop HDFS Hive Java Spark data data-engineering

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: • Understanding Hadoop and the Hadoop Distributed File System (HDFS) • Importing data into Hadoop, and processing it there • Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts • Making the most of Apache Pig and Apache Hive • Implementing and administering YARN • Taking advantage of the full Hadoop ecosystem • Managing Hadoop clusters with Apache Ambari • Working with the Hadoop User Environment (HUE) • Scaling, securing, and troubleshooting Hadoop environments • Integrating Hadoop into the enterprise • Deploying Hadoop in the cloud • Getting started with Apache Spark
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Mastering Spark for Data Science

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matthew Hallett , David George , Antoine Amend (Databricks) , Andrew Morgan

AI/ML Analytics Big Data Data Science Spark SQL Data Streaming apache-spark data data-engineering

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products. About This Book • Develop and apply advanced analytical techniques with Spark • Learn how to tell a compelling story with data science using Spark’s ecosystem • Explore data at scale and work with cutting edge data science methods Who This Book Is For This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting edge techniques. This book assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof of concept studies and built prototypes. What You Will Learn • Learn the design patterns that integrate Spark into industrialized data science pipelines • See how commercial data scientists design scalable code and reusable code for data science services • Explore cutting edge data science methods so that you can study trends and causality • Discover advanced programming techniques using RDD and the DataFrame and Dataset APIs • Find out how Spark can be used as a universal ingestion engine tool and as a web scraper • Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining • Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams • Study advanced Spark concepts, solution design patterns, and integration architectures • Demonstrate powerful data science pipelines
In Detail Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level you need to build data science solutions of substance, solutions that solve real problems. Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book deep dives into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly. Style and approach This is an advanced guide for those with beginner-level familiarity with the Spark architecture and working with Data Science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including: Spark SQL, visual streaming, and MLlib. This book expands on titles like: Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.

Pro Apache Phoenix: An SQL Driver for HBase, First Edition

2016-12-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Magham , Shakil Akhtar

Big Data Data Modelling Hadoop Apache HBase NoSQL Spark SQL data data-engineering nosql-databases

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and the book explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds. Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop. You will learn how to: • Handle a petabyte data store by applying familiar SQL techniques • Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase • Apply best practices while working with a scalable data store on Hadoop and HBase • Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis • Demonstrate real-time use cases and big data modeling techniques
Who This Book Is For Data engineers, Big Data administrators, and architects

Apache HBase Primer

2016-11-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Deepak Vohra

Data Modelling Hadoop Apache HBase NoSQL data data-engineering nosql-databases

Learn the foundations and concepts of the Apache HBase (NoSQL) open source database. This book covers the HBase data model, architecture, schema design, API, and administration. Apache HBase is the database for the Apache Hadoop framework. HBase is a column family based NoSQL database that provides a flexible schema model. What You'll Learn • Work with the core concepts of HBase • Discover the HBase data model, schema design, and architecture • Use the HBase API and administration
Who This Book Is For Apache HBase (NoSQL) database users, designers, developers, and admins.

Oracle R Enterprise: Harnessing the Power of R in Oracle Database

2016-11-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brendan Tierney

Analytics Big Data Hadoop Oracle R SQL data data-engineering oracle-database-solutions

Master the Big Data Capabilities of Oracle R Enterprise Effectively manage your enterprise’s big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step-by-step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems. Learn to: • Install, configure, and administer Oracle R Enterprise • Establish connections and move data to the database • Create Oracle R Enterprise packages and functions • Use the R language to work with data in Oracle Database • Build models using ODM, ORE, and other algorithms • Develop and deploy R scripts and use the R script repository • Execute embedded R scripts and employ ORE SQL API functions • Map and manipulate data using Oracle R Advanced Analytics for Hadoop • Use ORE in Oracle Data Miner, OBIEE, and other applications

Spark in Action

2016-11-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marko Bonaci , Petar Zecevic

AI/ML Analytics Big Data DevOps Docker Java Python Scala Spark SQL Data Streaming Virtual Machine +3 more

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0. About the Technology Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades. About the Book Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code. What's Inside • Updated for Spark 2.0 • Real-life case studies • Spark DevOps with Docker • Examples in Scala, and online in Java and Python
About the Reader Written for experienced programmers with some background in big data or machine learning. About the Authors Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community. Quotes "Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide." - Jonathan Sharley, Pandora Media "Must-have! Speed up your learning of Spark as a distributed computing framework." - Robert Ormandi, Yahoo! "An easy-to-follow, step-by-step guide." - Gaurav Bhardwaj, 3Pillar Global "An ambitiously comprehensive overview of Spark and its diverse ecosystem." - Jonathan Miller, Optensity

Fast Data Processing with Spark 2 - Third Edition

2016-10-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Holden Karau (Fight Health Insurance) , Krishna Sankar

AI/ML Analytics Big Data Cloud Computing Data Analytics Data Engineering Java Scala Spark apache-spark data data-engineering

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks. What this Book will help me do Install and configure Apache Spark for optimal performance. Interact with distributed datasets using the resilient distributed dataset (RDD) API. Leverage the flexibility of the DataFrame API for efficient big data analytics. Apply machine learning models using Spark MLlib to solve complex problems. Perform graph analysis using GraphX to uncover structural insights in data. Author(s) Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Holden Karau, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development. Who is it for? This book is catered to software developers and data engineers with a foundational understanding of Scala or Java programming. Beginner to medium-level understanding of big data processing concepts is recommended for readers. If you are aspiring to solve big data problems using scalable distributed computing frameworks, this book is perfect for you. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Essentials of Cloud Application Development on IBM Bluemix

2016-10-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hala Aziz , Ahmed Azraq , Sally Fikry , Ben Smith , Mohamed El-Khouly

Cloud Computing Computer Science DevOps Git IBM JavaScript JSON data data-engineering

Abstract This IBM® Redbooks® publication is based on the Presentations Guide of the course "Essentials of Cloud Application Development on IBM Bluemix" that was developed by the IBM Redbooks team in partnership with IBM Middle East and Africa (MEA) University Program. This course is designed to teach university students the basic skills that are required to develop, deploy, and test cloud-based applications that use the IBM Bluemix® cloud services. The primary target audience for this course is university students in undergraduate computer science and computer engineering programs with no previous experience working in cloud environments. However, anyone new to cloud computing can benefit from this course. After completing this course, you should be able to accomplish these tasks: Describe the factors that lead to the adoption of cloud computing. Describe infrastructure as a service, platform as a service, and software as a service. Define cloud computing. Describe IBM Bluemix. Describe the architecture of IBM Bluemix. Identify the runtimes and services that Bluemix offers. Explain how to get started with Bluemix. Describe Bluemix organizations, domains, spaces, and users. Create Bluemix applications. Use services in a Bluemix application. Set environmental variables that are used with Bluemix services. Deploy and run Bluemix applications. Describe how to create an IBM SDK for Node.js application that runs on Bluemix. Explain how to manage a Bluemix account with the Cloud Foundry CLI. Describe how to integrate workstation development platforms with Bluemix. Manage application code and assets with IBM Bluemix DevOps services. Work with the Git repository that is used by DevOps services. Describe the characteristics of REST APIs. Describe the use of JSON as the preferred data format for REST APIs. Identify the data services that are available on Bluemix. Describe the features in Bluemix for developing mobile applications. Create a MobileFirst Services Starter application on Bluemix. Send push notifications from Bluemix and receive them on the mobile device emulator.
The workshop materials were created in August 2016. Thus, all IBM Bluemix features discussed in this Presentations Guide and Bluemix user interfaces used in the examples are current as of August 2016. Note: This IBM Redbooks publication references exercises that are NOT included with this book. The exercises are only available to students attending the course.

Spark for Data Science

2016-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bikramaditya Singhal , Srinivas Duvvuri

AI/ML Analytics Big Data Data Analytics Data Science Spark apache-spark data data-engineering

Explore how to leverage Apache Spark for efficient big data analytics and machine learning solutions in "Spark for Data Science". This detailed guide provides you with the skills to process massive datasets, perform data analytics, and build predictive models using Spark's powerful tools like RDDs, DataFrames, and Datasets. What this Book will help me do Gain expertise in data processing and transformation with Spark. Perform advanced statistical analysis to uncover insights. Master machine learning techniques to create predictive models using Spark. Utilize Spark's APIs to process and visualize big data. Build scalable and efficient data science solutions. Author(s) This book is co-authored by Bikramaditya Singhal and Srinivas Duvvuri, both accomplished data scientists with extensive experience in Apache Spark and big data technologies. They bring their practical industry expertise to explain complex topics in a straightforward manner. Their writing emphasizes real-world applications and step-by-step procedural guidance, making this a valuable resource for learners. Who is it for? This book is ideally suited for technologists seeking to incorporate data science capabilities into their work with Apache Spark, data scientists interested in machine learning algorithms implemented in Spark, and beginners aiming to step into the field of big data analytics. Whether you are familiar with Spark or completely new, this book offers valuable insights and practical knowledge.

Sams Teach Yourself Apache Spark™ in 24 Hours

2016-08-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

AI/ML Big Data Cassandra Cloud Computing Data Engineering Kafka NoSQL Python Scala Spark SQL Data Streaming +3 more

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, now and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations
Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Monitoring Elasticsearch

2016-07-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dan Noble

ELK Kibana data data-engineering elasticsearch search

"Monitoring Elasticsearch" focuses on teaching readers how to manage and monitor the health and performance of Elasticsearch clusters. Through practical steps and real-world examples, this book ensures that users can diagnose, resolve, and prevent common issues to optimize system reliability and performance. What this Book will help me do Obtain a clear understanding of Elasticsearch monitoring tools and their features. Learn how to diagnose and troubleshoot common Elasticsearch performance issues. Master the use of Elasticsearch APIs for monitoring and analysis. Explore the best practices for effectively maintaining cluster reliability. Understand the features of tools like Kibana, Marvel, and BigDesk for Elasticsearch monitoring. Author(s) The authors of "Monitoring Elasticsearch" are experts in distributed systems and database management, with extensive experience in Elasticsearch deployment and monitoring. They bring their practical knowledge, teaching readers clear and actionable techniques. Their approachable style makes complex systems accessible, helping professionals and aficionados alike. Who is it for? This book is ideal for developers and system administrators who work with Elasticsearch, regardless of their industry. Whether you're new to Elasticsearch or aiming to deepen your expertise, you will find practical solutions and helpful tools. The content suits a range of experiences, from beginners curious about cluster monitoring to experts needing solutions for specific issues. If you use Elasticsearch or plan to, this book is for you.

Architecting HBase Applications

2016-07-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kevin O'Dell , Jean-Marc Spaggiari

Data Management Apache HBase Java Kafka Master Data Management Spark data data-engineering nosql-databases

HBase is a remarkable tool for indexing mass volumes of data, but getting started with this distributed database and its ecosystem can be daunting. With this hands-on guide, you’ll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions. Along with HBase principles and cluster deployment guidelines, this book includes in-depth case studies that demonstrate how large companies solved specific use cases with HBase. Authors Jean-Marc Spaggiari and Kevin O’Dell also provide draft solutions and code examples to help you implement your own versions of those use cases, from master data management (MDM) and document storage to near real-time event processing. You’ll also learn troubleshooting techniques to help you avoid common deployment mistakes.

Learn exactly what HBase does, what its ecosystem includes, and how to set up your environment
Explore how real-world HBase instances were deployed and put into production
Examine documented use cases for tracking healthcare claims, digital advertising, data management, and product quality
Understand how HBase works with tools and techniques such as Spark, Kafka, MapReduce, and the Java API
Learn how to identify the causes and understand the consequences of the most common HBase issues

Java XML and JSON

2016-06-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeff Friesen

Java JSON XML data data-engineering storage-formats

Java XML and JSON is your one-stop guide to mastering the XML metalanguage and JSON data format along with significant Java APIs for parsing and creating XML/JSON documents (and more). The first six chapters focus on XML along with the SAX, DOM, StAX, XPath, and XSLT APIs. The remaining four chapters focus on JSON along with the mJson, Gson, and JsonPath APIs. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises.

What You'll Learn
Master the XML language
Learn how to validate XML documents
Learn how to parse XML documents with the SAX, DOM, and StAX APIs
Learn how to create XML documents with the DOM and StAX APIs
Learn how to extract values from XML documents with the XPath API
Learn how to transform XML documents with the XSLT API
Master the JSON format
Learn how to validate JSON documents
Learn how to parse and create JSON documents with the mJson and Gson APIs
Learn how to extract values from JSON documents with the JsonPath API

Who This Book Is For
Intermediate or advanced Java programmers/developers.

Spark GraphX in Action

2016-06-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Malak , Robin East

AI/ML Analytics Big Data Scala Spark apache-spark data data-engineering

Spark GraphX in Action starts out with an overview of Apache Spark and the GraphX graph processing API. This example-based tutorial then teaches you how to configure GraphX and how to use it interactively. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

About the Technology
GraphX is a powerful graph processing API for the Apache Spark analytics engine that lets you draw insights from large datasets. GraphX gives you unprecedented speed and capacity for running massively parallel and machine learning algorithms.

About the Book
Spark GraphX in Action begins with the big picture of what graphs can be used for. This example-based tutorial teaches you how to use GraphX interactively. You'll start with a crystal-clear introduction to building big data graphs from regular data, and then explore the problems and possibilities of implementing graph algorithms and architecting graph processing pipelines. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

What's Inside
Understanding graph technology
Using the GraphX API
Developing algorithms for big graphs
Machine learning with graphs
Graph visualization

About the Reader
Readers should be comfortable writing code. Experience with Apache Spark and Scala is not required.

About the Authors
Michael Malak has worked on Spark applications for Fortune 500 companies since early 2013. Robin East has worked as a consultant to large organizations for over 15 years and is a data scientist at Worldpay.

Quotes
Learn complex graph processing from two experienced authors…A comprehensive guide. - Gaurav Bhardwaj, 3Pillar Global
The best resource to go from GraphX novice to expert in the least amount of time. - Justin Fister, PaperRater
A must-read for anyone serious about large-scale graph data mining! - Antonio Magnaghi, OpenMail
Reveals the awesome and elegant capabilities of working with linked data for large-scale datasets. - Sumit Pal, Independent consultant

Spring Persistence with Hibernate, Second Edition

2016-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brian D. Murphy , Paul Fisher

Agile/Scrum Java data data-engineering database-management-tools hibernate object-relational-mapping

Learn how to use the core Hibernate APIs and tools as part of the Spring Framework. This book illustrates how these two frameworks can be best utilized. Other persistence solutions available in Spring are also shown, including the Java Persistence API (JPA). Spring Persistence with Hibernate, Second Edition has been updated to cover Spring Framework version 4 and Hibernate version 5. After reading and using this book, you'll have the fundamentals to apply these persistence solutions to your own mission-critical enterprise Java applications that you build using Spring.

Persistence is an important set of techniques and technologies for accessing and using data, and ensuring that data is mobile regardless of specific applications and contexts. In Java development, persistence is a key factor in enterprise, e-commerce, and other transaction-oriented applications. Today, the agile and open source Spring Framework is the leading out-of-the-box, open source solution for enterprise Java developers; in it, you can find a number of Java persistence solutions.

What You'll Learn
Use Spring Persistence, including using persistence tools in Spring as well as choosing the best Java persistence frameworks outside of Spring
Take advantage of Spring Framework features such as Inversion of Control (IoC), aspect-oriented programming (AOP), and more
Work with Spring JDBC, use declarative transactions with Spring, and reap the benefits of a lightweight persistence strategy
Harness Hibernate and integrate it into your Spring-based enterprise Java applications for transactions, data processing, and more
Integrate JPA for creating a well-layered persistence tier in your enterprise Java application

Who This Book Is For
This book is ideal for developers interested in learning more about persistence framework options on the Java platform, as well as fundamental Spring concepts. Because the book covers several persistence frameworks, it is suitable for anyone interested in learning more about Spring or any of the frameworks covered. Lastly, this book covers advanced topics related to persistence architecture and design patterns, and is ideal for beginning developers looking to learn more in these areas.
