Java

Pro MySQL NDB Cluster

2017-11-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jesper Wisborg Krogh , Mikiya Okuno

API MySQL SQL data data-engineering relational-databases

Create and run a real-time, highly-available, and high-redundancy version of the world's most popular open-source database, MySQL. You will understand the advantages and disadvantages of the MySQL NDB Cluster solution, and when MySQL NDB Cluster is the right choice. Pro MySQL NDB Cluster walks you through the full lifecycle of a MySQL Cluster installation: starting with the installation and initial configuration, moving through online configuration and schema changes, and completing with online upgrades. Along the way, you will learn to monitor your cluster, make decisions about schema design, implement geographic replication, troubleshoot and optimize performance, and much more. This book covers the many programming APIs that are supported by MySQL NDB Cluster. There's also robust coverage of connecting to MySQL NDB Cluster from Java, SQL, memcached, and even from C++. From any of these languages, you'll be able to connect and store and retrieve data as your applications demand. The book: Covers MySQL NDB Cluster concepts and architecture Takes you through the MySQL NDB Cluster lifecycle from installation to upgrades Guides you through DBA and Developer decisions when working with MySQL NDB Cluster What You'll Learn Understand the shared-nothing architecture behind MySQL NDB Cluster Plan, install, and configure a MySQL NDB Cluster environment Perform everyday tasks such as backing up, restoring, and upgrading Develop applications from Java, memcached, C++, and SQL Troubleshoot and resolve application performance problems Master enterprise-level features such as the MySQL NDB Cluster Manager Who This Book Is For Database administrators and developers who are looking into deploying MySQL NDB Cluster, or who already have a cluster in production and want to increase their knowledge and ability to handle routine administrative tasks and troubleshooting. 
The book also is for those developers wanting to employ MySQL NDB Cluster as their chosen storage engine from Java, memcached, and C++ applications.

Practical Real-time Data Processing and Analytics

2017-09-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Prateek Bhati , Selva raj Ramasamy , Shilpi Saxena , Saurabh Gupta (The Modern Data Company)

Analytics Flink Data Engineering Kafka Spark data data-engineering real-time-analytics streaming-messaging

This book provides a comprehensive guide to real-time data processing and analytics using modern frameworks like Apache Spark, Flink, Storm, and Kafka. Through practical examples and in-depth explanations, you will learn how to implement efficient, scalable, real-time processing pipelines. What this Book will help me do Understand real-time data processing essentials and the technology stack Learn integration of components like Apache Spark and Kafka Master the concepts of stream processing with detailed case studies Gain expertise in developing monitoring and alerting solutions for real-time systems Prepare to implement production-grade real-time data solutions Author(s) Shilpi Saxena and Saurabh Gupta, the authors, are experienced professionals in distributed systems and data engineering, focusing on practical applications of real-time computing. They bring their extensive industry experience to this book, helping readers understand the complexities of real-time data solutions in an approachable and hands-on manner. Who is it for? This book is ideal for software engineers and data engineers with a background in Java who seek to develop real-time data solutions. It is suitable for readers familiar with concepts of real-time data processing, and enhances knowledge in frameworks like Spark, Flink, Storm, and Kafka. Target audience includes learners building production data solutions and those designing distributed analytics engines.

Oracle ADF Survival Guide: Mastering the Application Development Framework

2017-09-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sten Vesterli

ADF Oracle data data-engineering oracle-database-solutions

Quickly get up to speed with Oracle's Application Development Framework (ADF). Rapidly build modern, user-friendly applications that will be easy to re-use, expand, and maintain. Oracle ADF Survival Guide covers the latest 12c version and explains all the important concepts and parts, including ADF Faces, ADF Task Flows, ADF Business Components, ADF Skins, the new Alta UI, and how to implement business logic in all layers of the application. Organizations with existing investments in Oracle database and Oracle Forms applications will be able to leverage Oracle's best practice for application development in moving those applications to the ADF framework. The book: Explains all parts of the ADF stack Shows how to integrate with databases and web services Demonstrates the best practice for ADF enterprise architecture What You Will Learn Rapidly build great-looking, user-friendly screens Build page flows visually for improved communication with business users Easily connect your user interface to databases and other back-end systems Leverage the best practice for productive team development Establish a solid enterprise architecture for maximum reuse and maintainability Automate your build and deployment process Who This Book Is For Experienced developers who want to rapidly become productive with Oracle's Application Development Framework (ADF) 12c. It is for Oracle Forms and database developers working for organizations that have followed Oracle’s strategic direction to ADF, as well as for experienced Java developers who want to learn Oracle’s highly productive JSF framework.

Building Data Streaming Applications with Apache Kafka

2017-08-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Manisha Sethi , Anshul Joshi , Chanchal Singh , Manish Kumar

Data Engineering Kafka Spark Data Streaming data data-engineering streaming-messaging

Learn how to design and build efficient real-time streaming applications using Apache Kafka, a leading distributed streaming platform. This book provides comprehensive guidance on setting up Kafka clusters, developing producers and consumers, and integrating with frameworks like Spark, Storm, and Heron. By the end, you'll master the skills needed to create enterprise-grade data streaming solutions. What this Book will help me do Grasp the core concepts and components of Apache Kafka and its ecosystem. Develop robust Kafka producers and consumers to process real-time data streams. Design and implement streaming applications using Spark, Storm, and Heron. Plan Kafka deployments with a focus on scalability, capacity, and fault tolerance. Ensure secure data streaming with best practices for securing Apache Kafka. Author(s) The authors, Chanchal Singh and Manish Kumar, bring years of expertise in data engineering and distributed systems. Having worked extensively with streaming technologies like Apache Kafka, they aim to share their in-depth knowledge through practical examples and real-world scenarios. Their approach to teaching focuses on making complex concepts easily understandable. Who is it for? This book is ideal for software developers and data engineers who are eager to learn Apache Kafka for building streaming applications. Some experience with programming, particularly Java, will help readers get the most out of the material. If you are working on data-processing systems or looking to enhance your skills in real-time data handling, this book caters to your needs.

Mastering Apache Storm

2017-08-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ankit Jain

Big Data Hadoop Apache HBase Kafka Redis Data Streaming data data-engineering storm streaming-messaging

Mastering Apache Storm is your step-by-step guide to mastering real-time data streaming with this robust framework. You'll learn how to process big data efficiently and integrate Apache Storm with popular technologies like Kafka, HBase, and Redis to maximize its potential. This book walks you through from basic concepts to advanced implementations of Apache Storm in real-world scenarios. What this Book will help me do Understand the core features and operation of Apache Storm for real-time data streaming. Integrate Apache Storm with other Big Data frameworks like Kafka, HBase, Redis, and Hadoop. Effectively deploy and manage multi-node Apache Storm clusters in real-world environments. Monitor and analyze your data streams and system health effectively using built-in and external tools. Learn to implement fault-tolerant, scalable, and distributed stream processing applications in Apache Storm. Author(s) Ankit Jain is an experienced software developer and technical instructor specializing in distributed systems and real-time data processing. With years of experience working with Apache Storm and related technologies, their teachings focus on practical, hands-on learning to equip readers with actionable skills. Who is it for? This book is ideal for Java developers aspiring to build expertise in real-time data streaming and distributed processing applications using Apache Storm. Beginners can start with the fundamentals provided, while those with prior knowledge can delve into intermediate and advanced implementations.

Apache Spark 2.x for Java Developers

2017-07-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sourav Gulati (Databricks) , Sumit Kumar

AI/ML Analytics API Big Data CSV JSON Kafka Scala Spark SQL Data Streaming XML +3 more

Delve into mastering big data processing with 'Apache Spark 2.x for Java Developers.' This book provides a practical guide to implementing Apache Spark using the Java APIs, offering a unique opportunity for Java developers to leverage Spark's powerful framework without transitioning to Scala. What this Book will help me do Learn how to process data from formats like XML, JSON, CSV using Spark Core. Implement real-time analytics using Spark Streaming and third-party tools like Kafka. Understand data querying with Spark SQL and master SQL schema processing. Apply machine learning techniques with Spark MLlib to real-world scenarios. Explore graph processing and analytics using Spark GraphX. Author(s) Sumit Kumar and Sourav Gulati, experienced professionals in Java development and big data, bring their wealth of practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers, making big data approachable. Who is it for? This book is perfect for Java developers who are eager to expand their skillset into big data processing with Apache Spark. Whether you are a seasoned Spark user or first diving into big data concepts, this book meets you at your level. With practical examples and straightforward explanations, you can unlock the potential of Spark in real-world scenarios.

JSON at Work

2017-07-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom Marrs

API JavaScript JSON JSON Schema Kafka MongoDB data data-engineering storage-formats

JSON is becoming the backbone for meaningful data interchange over the internet. This format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design. JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise. Get acquainted with JSON basics and learn how to model JSON data Learn how to use JSON with Node.js, Ruby on Rails, and Java Structure JSON documents with JSON Schema to design and test APIs Search the contents of JSON documents with JSON Search tools Convert JSON documents to other data formats with JSON Transform tools Compare JSON-based hypermedia formats, including HAL and jsonapi Leverage MongoDB to store and access JSON documents Use Apache Kafka to exchange JSON-based messages between services

Understanding Message Brokers

2017-06-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jakub Korab

Kafka data data-engineering streaming-messaging streaming & messaging

Messaging is one of the more poorly understood areas of IT; most developers and architects have only a passing familiarity with how broker-based messaging technologies work. This practical report not only helps you get up to speed on the essentials of messaging, but also compares two of today’s most popular messaging technologies—Apache ActiveMQ and Apache Kafka. Author and consultant Jakub Korab describes use cases and design choices that lead developers to very different approaches for developing message-based systems. You’ll come away with a high-level understanding of both ActiveMQ and Kafka, including how they should and should not be used, how they handle concerns such as throughput and high-availability, and what to look out for when considering other messaging technologies in future. Understand the types of problems that messaging systems address Explore three primary messaging patterns: point-to-point, publish-subscribe, and a hybrid of both Dive into ActiveMQ, a classic broker-centric design implemented through Java libraries that works for a broad range of messaging use cases Examine Kafka, a distributed system that can be scaled to provide massive performance and fault tolerance through replication Learn the mechanical complexities that message-based systems need to address, and some patterns you can apply to deal with those complexities

Advanced Analytics with Spark, 2nd Edition

2017-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sandy Ryza (Databricks) , Sean Owen (Databricks) , Josh Wills , Uri Laserson

AI/ML Analytics Data Science Python Scala Cyber Security Spark apache-spark data data-engineering

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will: Familiarize yourself with the Spark programming model Become comfortable within the Spark ecosystem Learn general approaches in data science Examine complete implementations that analyze large public data sets Discover which machine learning tools make sense for particular problems Acquire code that can be adapted to many uses

Data Lake for Enterprises

2017-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pankaj Misra , Tomcy John , Vivek Mishra

AI/ML AWS Lambda Big Data Data Lake Data Management Hadoop Kafka Spark data data-engineering data-lake storage-repositories

"Data Lake for Enterprises" is a comprehensive guide to building data lakes using the Lambda Architecture. It introduces big data technologies like Hadoop, Spark, and Flume, showing how to use them effectively to manage and leverage enterprise-scale data. You'll gain the skills to design and implement data systems that handle complex data challenges. What this Book will help me do Master the use of Lambda Architecture to create scalable and effective data management systems. Understand and implement technologies like Hadoop, Spark, Kafka, and Flume in an enterprise data lake. Integrate batch and stream processing techniques using big data tools for comprehensive data analysis. Optimize data lakes for performance and reliability with practical insights and techniques. Implement real-world use cases of data lakes and machine learning for predictive data insights. Author(s) Vivek Mishra, Tomcy John, and Pankaj Misra are recognized experts in big data systems with a strong background in designing and deploying data solutions. With a clear and methodical teaching style, they bring years of experience to this book, providing readers with the tools and knowledge required to excel in enterprise big data initiatives. Who is it for? This book is ideal for software developers, data architects, and IT professionals looking to integrate a data lake strategy into their enterprises. It caters to readers with a foundational understanding of Java and big data concepts, aiming to advance their practical knowledge of building scalable data systems. If you're eager to delve into cutting-edge technologies and transform enterprise data management, this book is for you.

IBM DB2 Web Query for i: The Nuts and Bolts

2017-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rob Bestgen , Doug Mack , Lin Su , Simona Pacchiarini , Kathryn Steinbrink , Hernando Bedoya , Kevin Trisko , Jim Bainbridge , Mike Cain

BI HTML IBM Microsoft SQL SQL Server data data-engineering ibm-db2 relational-databases

Abstract Business Intelligence (BI) is a broad term that relates to applications that analyze data to understand and act on the key metrics that drive profitability in an enterprise. Key to analyzing that data is providing fast, easy access to it while delivering it in formats or tools that best fit the needs of the user. At the core of any BI solution are user query and reporting tools that provide intuitive access to data supporting a spectrum of users from executives to “power users,” from spreadsheet aficionados to the external Internet consumer. IBM® DB2® Web Query for i offers a set of modernized tools for a more robust, extensible, and productive reporting solution than the popular IBM Query for System i® tool (also known as IBM Query/400). IBM DB2 Web Query for i preserves investments in the reports that are developed with Query/400 by offering a choice of importing definitions into the new technology or continuing to run existing Query/400 reports as is. But, it also offers significant productivity and performance enhancements by leveraging the latest in DB2 for i query optimization technology. The DB2 Web Query for i product is a web-based query and report writing product that offers enhanced capabilities over the IBM Query for iSeries product (also commonly known as Query/400). IBM DB2 Web Query for i includes Query for iSeries technology to assist customers in their transition to DB2 Web Query. It offers a more modernized, Java based solution for a more robust, extensible, and productive reporting solution. DB2 Web Query provides the ability to query or build reports against data that is stored in DB2 for i (or Microsoft SQL Server) databases through browser-based user interface technologies: Build reports with ease through the web-based, ribbon-like InfoAssist tool that leverages a common look and feel that can extend the number of personnel that can generate their own reports. 
Simplify the management of reports by significantly reducing the number of report definitions that are required through the use of parameter driven reports. Deliver data to users in many different formats, including directly into spreadsheets, or in boardroom-quality PDF format, or viewed from the browser in HTML. Leverage advanced reporting functions, such as matrix reporting, ranking, color coding, drill-down, and font customization to enhance the visualization of DB2 data. DB2 Web Query offers features to import Query/400 definitions and enhance their look and functions. By using it, you can add OLAP-like slicing and dicing to the reports or view reports in disconnected mode for users on the go. This IBM Redbooks® publication provides a broad understanding of what can be done with the DB2 Web Query product. This publication is a companion of DB2 Web Query Tutorials, SG24-8378, which has a group of self-explanatory tutorials to help you get up to speed quickly.

Learning Apache Cassandra - Second Edition

2017-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sandeep Yarabarla , Graham Doman

Cassandra NoSQL RDBMS data data-engineering nosql-databases

Learning Apache Cassandra is an engaging and in-depth guide to understanding the concepts and practical applications of Apache Cassandra, one of the most robust distributed NoSQL databases available. By the end of this book, you will have the necessary skills to design and manage scalable, high-performance database solutions tailored for modern applications. What this Book will help me do Set up Apache Cassandra and its multi-node clusters confidently and efficiently. Master schema design principles, including the use of composite keys, collections, and user-defined types. Implement efficient query strategies with secondary indexes and materialized views. Understand data distribution strategies and tune consistency levels for different application requirements. Dive into advanced topics like user-defined functions, batch operations, and Java client optimizations for scalable database architecture. Author(s) Sandeep Yarabarla brings practical expertise and deep knowledge to the subject of Apache Cassandra. With hands-on industry experience designing scalable database solutions, the author ensures complex topics are presented through clear and actionable insights. This is coupled with real-world scenarios to help you apply your learning effectively. Who is it for? This book is ideal for developers and IT professionals interested in learning Apache Cassandra from scratch or enhancing their NoSQL database expertise. It is particularly suited for those transitioning from relational databases to NoSQL systems. Even without prior coding experience, readers can expect to follow along and achieve practical results.

MQTT Essentials - A Lightweight IoT Protocol

2017-04-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Gastón C. Hillar

IoT JavaScript Python data data-engineering rabbitmq streaming-messaging

Dive into the world of MQTT, the preferred protocol for IoT and M2M communication. This book provides a comprehensive guide to understanding, implementing, and securing MQTT-based systems, enabling readers to create efficient and lightweight communication networks for their connected devices. What this Book will help me do Understand the underlying principles and protocol structure of MQTT. Securely configure and deploy an MQTT broker for communication. Develop Python, Java, and JavaScript-based MQTT client applications. Utilize MQTT for real-world IoT use cases such as sensor data interchange. Optimize MQTT usage for low-latency and lightweight communication scenarios. Author(s) Gastón C. Hillar is an experienced IoT developer and author with a deep understanding of IoT protocols and technologies. With years of practical experience in designing and deploying secure IoT systems, Gastón specializes in breaking down complex topics into digestible and actionable insights. Through his books, he aims to empower developers to effectively integrate IoT technologies into their work. Who is it for? The book is tailored for software developers and engineers who are looking to integrate MQTT into their IoT solutions. It's ideal for individuals with pre-existing knowledge in IoT concepts who want to deepen their understanding of MQTT. Readers seeking to secure, optimize, and utilize MQTT for communication and automation tasks will find it especially useful. It's a perfect fit for those working with Python, Java, and web technologies in IoT contexts.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

API Big Data Cloud Computing Hadoop HDFS Hive Spark data data-engineering

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and processing it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Elasticsearch 5.x Cookbook - Third Edition

2017-02-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Paro

Analytics Big Data ELK JSON data data-engineering elasticsearch search

Elasticsearch 5.x Cookbook is a comprehensive guide that teaches you how to leverage the full power of Elasticsearch for high-performance search and analytics. Through step-by-step recipes, you'll explore deployment, query building, plugin integration, and advanced analytics, ensuring you can manage and scale Elasticsearch like a pro. What this Book will help me do Understand and deploy complex Elasticsearch cluster topologies for optimal performance. Create tailored mappings to gain finer control over data indexing and retrieval. Design and execute advanced queries and analytics using Elasticsearch capabilities. Integrate Elasticsearch with popular programming languages and big data platforms. Monitor and improve Elasticsearch cluster health using the best practices and tools. Author(s) Alberto Paro is a seasoned software engineer and data scientist with extensive experience in distributed systems and search technologies. Having worked on numerous search-related projects, he brings practical, real-world insights to his writing. Alberto is passionate about teaching and simplifying complex concepts, making this book both approachable and expertly detailed. Who is it for? This book is ideal for developers or data engineers seeking to utilize Elasticsearch for advanced search and analytics tasks. If you have some prior knowledge of JSON and programming concepts, particularly Java, you will benefit most from this material. Whether you're looking to integrate Elasticsearch into your systems or to optimize its usage, this book caters to your needs.

Beginning Hibernate: For Hibernate 5

2016-11-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dave Minter , Joseph Ottinger , Jeff Linwood

Big Data MongoDB NoSQL data data-engineering database-management-tools hibernate object-relational-mapping

Get started with the Hibernate 5 persistence layer and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition includes the new Hibernate 5.0 framework as well as coverage of NoSQL, MongoDB, and other related technologies, ranging from applications to big data. Beginning Hibernate is ideal if you're experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you'll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner—not a dry, theoretical, hard-to-read fashion. What You'll Learn Build enterprise Java-based transaction-type applications that access complex data with Hibernate Work with Hibernate 5 using a present-day build process Use Java 8 features with Hibernate Integrate into the persistence life cycle Map using Java's annotations Search and query with the new version of Hibernate Integrate with MongoDB using NoSQL Keep track of versioned data with Hibernate Envers Who This Book Is For Experienced Java developers interested in learning how to use and apply object-relational persistence in Java and who are new to the Hibernate persistence framework.

Spark in Action

2016-11-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marko Bonaci , Petar Zecevic

AI/ML Analytics API Big Data DevOps Docker Python Scala Spark SQL Data Streaming Virtual Machine +3 more

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0. About the Technology Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades. About the Book Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code. What's Inside Updated for Spark 2.0 Real-life case studies Spark DevOps with Docker Examples in Scala, and online in Java and Python About the Reader Written for experienced programmers with some background in big data or machine learning. About the Authors Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community. Quotes Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide. - Jonathan Sharley, Pandora Media Must-have! Speed up your learning of Spark as a distributed computing framework. - Robert Ormandi, Yahoo! An easy-to-follow, step-by-step guide. - Gaurav Bhardwaj, 3Pillar Global An ambitiously comprehensive overview of Spark and its diverse ecosystem. - Jonathan Miller, Optensity

Learning IBM Bluemix

2016-10-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sreelatha Sankaranarayanan

Cloud Computing IBM JavaScript Cyber Security data data-engineering

Learning IBM Bluemix provides a comprehensive introduction to developing and deploying applications with the IBM Bluemix cloud platform. By following detailed examples and guided exercises, you'll understand the full life cycle of cloud-based application development, from initial setup to scaling and security. What this Book will help me do Understand the capabilities of IBM Bluemix as a Platform as a Service to build applications efficiently. Learn to develop and deploy applications using Cloud Foundry command line and Bluemix console. Explore microservices architecture and build scalable applications using Bluemix tools. Integrate on-premises systems with cloud-hosted applications on Bluemix. Develop mobile client applications with the support of Bluemix's Mobile services. Author(s) Sreelatha Sankaranarayanan is an experienced developer and cloud technology author, with extensive expertise in IBM Bluemix. Her passion for simplifying complex concepts is reflected in her engaging writing style, ensuring learners can master new skills effectively. She brings years of real-world experience in cloud computing and software development to her instructional materials. Who is it for? This book is tailored for developers aiming to transition to cloud-based application development using IBM Bluemix, with a focus on practical application. Readers should have foundational skills in Java and Node.js to fully benefit. Ideal for professionals looking to expand their capabilities with cloud infrastructure, or for those wanting to leverage microservices and cloud solutions in their applications.

Fast Data Processing with Spark 2 - Third Edition

2016-10-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Holden Karau (Fight Health Insurance) , Krishna Sankar

AI/ML Analytics API Big Data Cloud Computing Data Analytics Data Engineering Scala Spark apache-spark data data-engineering

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks. What this Book will help me do Install and configure Apache Spark for optimal performance. Interact with distributed datasets using the resilient distributed dataset (RDD) API. Leverage the flexibility of DataFrame API for efficient big data analytics. Apply machine learning models using Spark MLlib to solve complex problems. Perform graph analysis using GraphX to uncover structural insights in data. Author(s) Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Holden Karau, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development. Who is it for? This book caters to software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner-to-intermediate understanding of big data processing concepts is recommended. If you are aspiring to solve big data problems using scalable distributed computing frameworks, this book is perfect for you. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Hadoop Blueprints

2016-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anurag Shrivastava , Sudheesh Narayan , Tanmay Deshpande

Big Data Hadoop Marketing Spark data data-engineering

"Hadoop Blueprints" guides you through using Hadoop and its ecosystem to solve real-life business problems. You will explore six case studies covering areas like fraud detection, marketing analysis, and data lakes, providing a thorough and practical understanding of Hadoop applications. What this Book will help me do Understand how to use Hadoop to solve real-life business scenarios effectively. Learn to build a 360-degree customer view integrating different data types. Develop and deploy a fraud detection system leveraging Hadoop technologies. Explore marketing campaign analysis and improvement using data-driven workflows on Hadoop. Gain hands-on experience with creating and maintaining efficient data lakes. Author(s) Sudheesh Narayan, along with his co-authors Anurag Shrivastava and Nod Deshpande, brings extensive experience in Big Data technologies. They have been involved in developing solutions utilizing Hadoop, Apache Spark, and other ecosystem components. Their practical approach to presenting complex technical topics ensures readers can apply their knowledge to real-world scenarios. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who have a foundational understanding of Hadoop and seek to expand their practical skills. Readers should be familiar with Java or other scripting languages. It's perfect for those aiming to build actionable solutions for business problems using Big Data technologies.
