O'Reilly Data Engineering Books

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

2019-10-18 O'Reilly Amazon

book

Bob Ward

data data-engineering relational-databases microsoft-sql-server AI/ML Analytics

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project team's goal to integrate the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux, along with a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with PolyBase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and PolyBase to build a comprehensive solution called Big Data Clusters, which is a marquee feature of SQL Server 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications.
What You Will Learn: Implement Big Data Clusters with SQL Server, Spark, and HDFS. Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources. Combine SQL and Spark to build a machine learning platform for AI applications. Boost your performance with no application changes using Intelligent Performance. Increase security of your SQL Server through Secure Enclaves and Data Classification. Maximize database uptime through online indexing and Accelerated Database Recovery. Build new modern applications with Graph, ML Services, and T-SQL Extensibility with Java. Improve your ability to deploy SQL Server on Linux. Gain in-depth knowledge to run SQL Server with containers and Kubernetes. Know all the new database engine features for performance, usability, and diagnostics. Use the latest tools and methods to migrate your database to SQL Server 2019. Apply your knowledge of SQL Server 2019 to Azure. Who This Book Is For: IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.

Deep Learning for Search

2019-06-13 O'Reilly Amazon

book

Tommaso Teofili

data data-engineering search AI/ML Data Science Java

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on! About the Technology: Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how. About the Book: Deep Learning for Search teaches you to improve your search results with neural networks. You’ll review how DL relates to search basics like indexing and ranking. Then, you’ll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you’ll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn! What's Inside: Accurate and relevant rankings. Searching across languages. Content-based image search. Search with recommendations. About the Reader: For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed. About the Author: Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from topics like information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components, and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including Berlin Buzzwords, International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili. Quotes: "A practical approach that shows you the state of the art in using neural networks, AI, and deep learning in the development of search engines." - From the Foreword by Chris Mattmann, NASA JPL. "A thorough and thoughtful synthesis of traditional search and the latest advancements in deep learning." - Greg Zanotti, Marquette Partners. "A well-laid-out deep dive into the latest technologies that will take your search engine to the next level." - Andrew Wyllie, Thynk Health. "Hands-on exercises teach you how to master deep learning for search-based products." - Antonio Magnaghi, System1.

Mastering Hadoop 3

2019-02-28 O'Reilly Amazon

book

Timothy Wong, Chanchal Singh, Manish Kumar

data data-engineering Hadoop Flink Big Data Data Engineering

"Mastering Hadoop 3" is your in-depth guide to understanding and mastering the advanced features of the Hadoop ecosystem. With a focus on distributed computing and data processing, this book covers essential tools such as YARN, MapReduce, and Apache Spark to help you build scalable, efficient data pipelines. What this Book will help me do Gain a comprehensive understanding of Hadoop Distributed File System (HDFS) and YARN for effective resource management. Master data processing with MapReduce and learn to integrate with real-time processing engines like Spark and Flink. Develop and secure enterprise-grade Hadoop-based data pipelines by implementing robust security and governance measures. Explore techniques for batch data processing, data modeling, and designing applications tailored for Hadoop environments. Understand best practices for optimizing and troubleshooting Hadoop clusters for enhanced performance and reliability. Author(s) The authors, Timothy Wong, Chanchal Singh, and Manish Kumar, bring together years of experience in big data engineering, distributed systems, and enterprise application development. They aim to provide a clear pathway to mastering Hadoop ecosystem tools. Who is it for? This book is ideal for budding big data professionals who have some familiarity with Java and basic Hadoop concepts and wish to elevate their expertise. If you're a Hadoop career practitioner keen to expand your understanding of the ecosystem's advanced capabilities or a professional looking to implement Hadoop in organizational workflows, this book is well-suited for you.

Apache Spark Quick Start Guide

2019-01-31 O'Reilly Amazon

book

Akash Grade, Shrey Mehrotra

data data-engineering apache-spark AI/ML API Big Data

Dive into the world of scalable data processing with the "Apache Spark Quick Start Guide." This book offers a foundational introduction to Spark, empowering readers to harness its capabilities for big data processing. With clear explanations and hands-on examples, you'll learn to implement Spark applications that handle complex data tasks efficiently. What this Book will help me do Understand and implement Spark's RDDs and DataFrame APIs to process large datasets effectively. Set up a local development environment for Spark-based projects. Develop skills to debug and optimize slow-performing Spark applications. Harness built-in modules of Spark for SQL, streaming, and machine learning applications. Adopt best practices and optimization techniques for high-performance Spark applications. Author(s) Shrey Mehrotra is a seasoned software developer with expertise in big data technologies, particularly Apache Spark. With years of hands-on industry experience, Shrey focuses on making complex technical concepts accessible to all. Through his writing, he aims to share clear, practical guidance for developers of all levels. Who is it for? This guide is perfect for big data enthusiasts and professionals looking to learn Apache Spark's capabilities from scratch. It's aimed at data engineers interested in optimizing application performance and data scientists wanting to integrate machine learning with Spark. A basic familiarity with either Scala, Python, or Java is recommended.

Java XML and JSON: Document Processing for Java SE

2019-01-10 O'Reilly Amazon

book

Jeff Friesen

data data-engineering storage-formats XML API Java

Use this guide to master the XML metalanguage and JSON data format along with significant Java APIs for parsing and creating XML and JSON documents from the Java language. New in this edition is coverage of Jackson (a JSON processor for Java) and Oracle’s own Java API for JSON processing (JSON-P), which is a JSON processing API for Java EE that also can be used with Java SE. This new edition of Java XML and JSON also expands coverage of DOM and XSLT to include additional API content and useful examples. All examples in this book have been tested under Java 11. In some cases, source code has been simplified to use Java 11’s var language feature. The first six chapters focus on XML along with the SAX, DOM, StAX, XPath, and XSLT APIs. The remaining six chapters focus on JSON along with the mJson, Gson, JsonPath, Jackson, and JSON-P APIs. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises. What You'll Learn: Master the XML language. Create, validate, parse, and transform XML documents. Apply Java’s SAX, DOM, StAX, XPath, and XSLT APIs. Master the JSON format for serializing and transmitting data. Code against third-party APIs such as Jackson, mJson, Gson, and JsonPath. Master Oracle’s JSON-P API in a Java SE context. Who This Book Is For: Intermediate and advanced Java programmers who are developing applications that must access data stored in XML or JSON documents. The book also targets developers wanting to understand the XML language and JSON data format.

Apache Kafka Quick Start Guide

2018-12-27 O'Reilly Amazon

book

Raúl Estrada

data data-engineering streaming-messaging Kafka Java Data Streaming

Dive into the world of Apache Kafka with this concise guide that focuses on its practical use for real-time data processing in distributed systems. You'll explore Kafka's capabilities, covering essentials like configuration, messaging, serialization, and handling complex data streams using Kafka Streams and KSQL. By the end, you'll be equipped to tackle real-world streaming challenges confidently. What this Book will help me do Understand how to set up and configure Apache Kafka for real-time processing environments. Master key concepts like message validation, enrichment, and serialization. Learn to use the Schema Registry for data validation and versioning. Gain hands-on experience with data streaming and aggregation using Kafka Streams. Develop skills in using KSQL for data manipulation and stream querying. Author(s) Raúl Estrada is an experienced software engineer with a deep understanding of distributed systems and real-time data processing. With expertise in Apache Kafka and other event-streaming platforms, Estrada approaches technical writing with an emphasis on clarity and practical application. Their passion for helping developers achieve success is reflected in their authoritative yet approachable style. Who is it for? This book is perfect for software engineers and backend developers interested in mastering real-time data processing using Apache Kafka. It is designed for readers who are eager to solve practical problems in distributed systems, irrespective of whether they have prior Kafka experience. Some familiarity with Java or other JVM languages will be helpful, although not strictly necessary. This is an ideal resource for learners seeking a hands-on, practical approach to Apache Kafka.

PostgreSQL 11 Server Side Programming Quick Start Guide

2018-11-29 O'Reilly Amazon

book

Luca Ferrari

data data-engineering relational-databases postgresql Data Management Java

PostgreSQL 11 Server Side Programming Quick Start Guide introduces you to the world of database programming directly at the database level. This book delves into the concepts of server-side programming, providing you with the necessary tools to author stored procedures, triggers, and extensions for your PostgreSQL instance. What this Book will help me do Learn how to create stored procedures and functions for efficient database logic. Understand how to use triggers and rules to maintain data integrity. Gain expertise in developing extensions to extend PostgreSQL functionality. Master techniques for handling inter-process communication and background workers. Explore custom data types and integration with programming languages like Java and Perl. Author(s) Luca Ferrari, a seasoned database administrator and developer, specializes in delivering insightful PostgreSQL training. With extensive experience in both database management and software development, Ferrari brings practical knowledge and real-world examples to guide readers through mastering PostgreSQL server-side programming. Who is it for? This book is tailored for database administrators, developers, and engineers who have a basic understanding of PostgreSQL and are looking to expand their knowledge into server-side programming. If you're aiming to implement advanced database functionality or streamline data management tasks in PostgreSQL, this book is for you. It is ideal for those who wish to apply database programming techniques to enterprise-grade challenges. Beginner-friendly but designed to empower professionals with actionable insights.

Apache Hadoop 3 Quick Start Guide

2018-10-31 O'Reilly Amazon

book

Hrishikesh Vijay Karambelkar

data data-engineering Hadoop Analytics Big Data Data Analytics

Dive into the world of distributed data processing with the 'Apache Hadoop 3 Quick Start Guide.' This comprehensive resource equips you with the knowledge needed to handle large datasets effectively using Apache Hadoop. Learn how to set up and configure Hadoop, work with its core components, and explore its powerful ecosystem tools. What this Book will help me do Understand the fundamental concepts of Apache Hadoop, including HDFS, MapReduce, and YARN, and use them to store and process large datasets. Set up and configure Hadoop 3 in both developer and production environments to suit various deployment needs. Gain hands-on experience with Hadoop ecosystem tools like Hive, Kafka, and Spark to enhance your big data processing capabilities. Learn to manage, monitor, and troubleshoot Hadoop clusters efficiently to ensure smooth operations. Analyze real-time streaming data with tools like Apache Storm and perform advanced data analytics using Apache Spark. Author(s) The author of this guide, Hrishikesh Vijay Karambelkar, brings years of experience working with big data technologies and Apache Hadoop in real-world applications. With a passion for teaching and simplifying complex topics, Karambelkar has compiled his expertise to help learners confidently approach Hadoop 3. His detailed, example-driven approach makes this book a practical resource for aspiring data professionals. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who aspire to dive into the field of big data. If you're new to Apache Hadoop or looking to upgrade your skills to include version 3, this guide is for you. A basic understanding of Java programming is recommended to make the most of the topics covered. Embark on this journey to enhance your career in data-intensive industries.

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

2018-08-16 O'Reilly Amazon

book

Hien Luu

data data-engineering apache-spark AI/ML Analytics Big Data

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you’ll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications. What You Will Learn: Understand the Spark unified data processing platform. Learn how to run Spark in Spark Shell or Databricks. Use and manipulate RDDs. Deal with structured data using Spark SQL through its operations and advanced functions. Build real-time applications using Spark Structured Streaming. Develop intelligent applications with the Spark Machine Learning library. Who This Book Is For: Programmers and developers active in big data, Hadoop, and Java but who are new to the Apache Spark platform.

Seven NoSQL Databases in a Week

2018-03-29 O'Reilly Amazon

book

Xun (Brian) Wu, Sudarshan Kadambi

data data-engineering nosql-databases Cassandra DynamoDB Apache HBase

Learn the fundamentals of seven essential NoSQL databases in just one week with this book. Covering MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase, you'll explore their functionalities and practical applications. Designed to give you a working understanding of NoSQL database types, this guide helps aspiring DBAs and developers comprehend and utilize modern data solutions. What this Book will help me do Master the fundamentals of MongoDB, including high-performance, high-availability, and scaling features. Gain hands-on experience with Neo4j to perform database queries and integrate with Python and Java applications. Learn efficient querying with Redis for storage and retrieval tasks. Understand Cassandra's powerful solution for scalable and fault-tolerant systems. Get well-versed with HBase for creating tables, and reading and writing data efficiently. Author(s) Sudarshan Kadambi and Xun (Brian) Wu bring a wealth of experience in database technologies. They have worked extensively in the software development and database management fields. With their practical and concise teaching approach, the authors make complex topics accessible for readers. Who is it for? This book is ideal for budding DBAs and developers looking to understand NoSQL databases. It is particularly useful for those transitioning from relational databases who want to learn about modern database technologies. Suitable for both beginners and those with some database knowledge, it aims to bridge skill gaps and expand the reader's technical expertise.

Camel in Action, Second Edition

2018-02-25 O'Reilly Amazon

book

Jonathan Anstey, Claus Ibsen

data data-engineering streaming-messaging camel Cloud Computing Docker

Camel in Action, Second Edition is the most complete Camel book on the market. Written by core developers of Camel and the authors of the highly acclaimed first edition, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. About the Technology: Apache Camel is a Java framework that implements enterprise integration patterns (EIPs) and comes with over 200 adapters to third-party systems. A concise DSL lets you build integration logic into your app with just a few lines of Java or XML. By using Camel, you benefit from the testing and experience of a large and vibrant open source community. About the Book: Camel in Action, Second Edition is the definitive guide to the Camel framework. It starts with core concepts like sending, receiving, routing, and transforming data. It then goes in depth on many topics such as how to develop, debug, test, deal with errors, secure, scale, cluster, deploy, and monitor your Camel applications. The book also discusses how to run Camel with microservices, reactive systems, containers, and in the cloud. What's Inside: Coverage of all relevant EIPs. Camel microservices with Spring Boot. Camel on Docker and Kubernetes. Error handling, testing, security, clustering, monitoring, and deployment. Hundreds of examples in Java and XML. About the Reader: Readers should be familiar with Java. This book is accessible to beginners and invaluable to experts. About the Authors: Claus Ibsen is a senior principal engineer working for Red Hat specializing in cloud and integration. He has worked on Apache Camel for the last nine years where he heads the project. Claus lives in Denmark. Jonathan Anstey is an engineering manager at Red Hat and a core Camel contributor. He lives in Newfoundland, Canada. Quotes: "I highly recommend this book to anyone with even a passing interest in Apache Camel. Do take Camel for a ride...and don't get the hump!" - From the Foreword by James Strachan, Creator of Apache Camel. "Claus and Jon are great writers, relying on figures and diagrams where needed and presenting lots of code snippets and worked examples." - From the Foreword by Dr. Mark Little, Technical Director of JBoss. "The second edition of this all-time classic is an indispensable companion for your Apache Camel rides." - Gregor Zurowski, Apache Camel Committer. "The absolute best way to learn and use Camel - top to bottom, front to back, and all the way through. Camel is a fantastic tool - every Java coder should have a copy of this book." - Rick Wagner, Red Hat. "An excellent book and the definite reference for experienced engineers." - Yan Guo, EventBrite.

Mastering Apache Solr 7.x

2018-02-22 O'Reilly Amazon

book

Chintan Mehta, Sandeep Nair, Dharmesh Vasoya

data data-engineering search solr API BI

"Mastering Apache Solr 7.x" is your practical guide to building, advancing, and optimizing enterprise search solutions using Solr 7. With this book, you will harness the robust features of Solr, implement efficient search capabilities, and tackle complex business intelligence problems to achieve unparalleled search performance. What this Book will help me do Develop and implement efficient schemas using the Solr Schema API. Optimize enterprise search performance with advanced querying and scoring techniques. Implement fault-tolerant and distributed search systems using SolrCloud. Leverage Apache Tika for seamless data indexing and content extraction. Utilize programming languages like JavaScript, Python, and Ruby to integrate with Solr. Author(s) With years of experience in search technologies and deep expertise in Apache Solr, authors Sandeep Nair, Chintan Mehta, and Dharmesh Vasoya bring together a wealth of knowledge in this book. Their collaborative insights equip readers to master advanced Solr features, sharing practical examples and real-world applications with a passion for clarity and efficiency. Who is it for? This book is ideal for software developers, data engineers, and database architects who aim to design and implement effective enterprise search systems. It is tailored for readers with prior experience in Apache Solr or Java programming, focusing on those eager to enhance their search solution expertise. Achieve your advanced search system goals here.

Liberty in IBM CICS: Deploying and Managing Java EE Applications

2018-01-18 O'Reilly Amazon

book

Mitch Johnson, Phil Wakelin, Jonathan Lawrence, Tito Paiva, Carlos Donatucci, Michael Jones

data data-engineering IBM API CI/CD GitHub

Abstract: This IBM® Redbooks® publication is intended for IBM CICS® system programmers and IBM Z architects. It describes how to deploy and manage Java EE 7 web-based applications in an IBM CICS Liberty JVM server and access data on IBM Db2® for IBM z/OS® and IBM MQ for z/OS subsystems. In this book, we describe the key steps to create and install a Liberty JVM server within a CICS region. We then describe how to best use the different deployment techniques for Java EE applications and the specific considerations when deploying applications that use JDBC, JMS, and the new CICS link to Liberty API. Finally, we describe how to secure web applications in CICS Liberty, including transport-level security and request authentication and authorization by using IBM RACF® and LDAP registries. Information is also provided about how to build a high availability infrastructure and how to use the logging and monitoring functions that are available in the CICS Liberty environment. This book is based on IBM CICS Transaction Server (CICS TS) V5.4 that uses the embedded IBM WebSphere® Application Server Liberty technology. It is also applicable to CICS TS V5.3 with the fixes for the continuous delivery APAR PI77502 applied. Sample applications are used throughout this publication and are freely available for download from the IBM CICSDev GitHub organization along with detailed deployment instructions.

Scaling Data Services with Pivotal GemFire

2018-01-15 O'Reilly Amazon

book

Mike Stolz

data data-engineering Cloud Computing Java Virtual Machine

In-memory data grids (IMDG) such as Pivotal GemFire, which is powered by Apache Geode, are key to making today’s high-speed, data-intensive applications work. By keeping data in the RAM of a horizontally scalable cluster of servers, IMDG solutions enable apps to achieve consistently low latency for data access at any scale. Many in the application development community, however, aren’t aware of IMDG’s benefits, use cases, or underlying technology. This report brings you up to speed by providing GemFire basics, including use cases and easily understood examples. You’ll determine whether GemFire can benefit your application, and learn how to install a simple test environment and build a small proof of concept. Explore GemFire use cases for Java applications, including microservices, high-speed data ingest, and transaction and event processing. Get an architectural overview of GemFire, and learn installation requirements for both hardware/VM and cloud. Dive into GemFire’s capabilities with continuous queries, server-side functions, and Apache Lucene integration. Learn how GemFire works with the persistence model, off-heap memory, and WAN replication.

Apache Kafka 1.0 Cookbook

2017-12-22 O'Reilly Amazon

book

Raúl Estrada

data data-engineering streaming-messaging Kafka ELK Hadoop

Dive into the essential resource for mastering Apache Kafka with this cookbook of practical recipes. You'll explore the dynamic features of Kafka 1.0, integrate it with enterprise data solutions, and confidently manage messaging and streaming data in real-time. What this Book will help me do Effectively install and configure Apache Kafka in a professional environment. Implement Kafka producers and consumers to manage real-time data streams. Utilize Confluent platforms and Kafka streams for advanced data processing. Monitor Kafka clusters with tools like Graphite and Ganglia for optimal performance. Integrate Kafka seamlessly with tools such as Hadoop, Spark, and Elasticsearch. Author(s) The authors, Estrada and Zinoviev, have extensive experience in enterprise data systems and have been dedicated contributors to the Apache Kafka ecosystem. Their combined expertise encompasses developing robust, real-time distributed systems and delivering insightful technical guidance. Through this book, they share their vast knowledge and practical solutions, tailored for both developers and administrators. Who is it for? This book is tailored for developers and administrators looking to enhance their expertise in Apache Kafka. Developers should be comfortable with Java or Scala to fully utilize examples, while administrators benefit from prior knowledge of Kafka operations. Ideal readers are those seeking actionable techniques to efficiently manage and integrate Kafka into their enterprise systems.

Pro MySQL NDB Cluster

2017-11-03 O'Reilly Amazon

book

Jesper Wisborg Krogh, Mikiya Okuno

data data-engineering relational-databases MySQL API Java

Create and run a real-time, highly-available, and high-redundancy version of the world's most popular open-source database, MySQL. You will understand the advantages and disadvantages of the MySQL NDB Cluster solution, and when MySQL NDB Cluster is the right choice. Pro MySQL NDB Cluster walks you through the full lifecycle of a MySQL Cluster installation: starting with the installation and initial configuration, moving through online configuration and schema changes, and completing with online upgrades. Along the way, you will learn to monitor your cluster, make decisions about schema design, implement geographic replication, troubleshoot and optimize performance, and much more. This book covers the many programming APIs that are supported by MySQL NDB Cluster. There's also robust coverage of connecting to MySQL NDB Cluster from Java, SQL, memcached, and even from C++. From any of these languages, you'll be able to connect and store and retrieve data as your applications demand. The book: Covers MySQL NDB Cluster concepts and architecture. Takes you through the MySQL NDB Cluster lifecycle from installation to upgrades. Guides you through DBA and Developer decisions when working with MySQL NDB Cluster. What You'll Learn: Understand the shared-nothing architecture behind MySQL NDB Cluster. Plan, install, and configure a MySQL NDB Cluster environment. Perform everyday tasks such as backing up, restoring, and upgrading. Develop applications from Java, memcached, C++, and SQL. Troubleshoot and resolve application performance problems. Master enterprise-level features such as the MySQL NDB Cluster Manager. Who This Book Is For: Database administrators and developers who are looking into deploying MySQL NDB Cluster, or who already have a cluster in production and want to increase their knowledge and ability to handle routine administrative tasks and troubleshooting. The book also is for those developers wanting to employ MySQL NDB Cluster as their chosen storage engine from Java, memcached, and C++ applications.

Practical Real-time Data Processing and Analytics

2017-09-28 O'Reilly Amazon

book

Prateek Bhati, Selva raj Ramasamy, Shilpi Saxena, Saurabh Gupta

data data-engineering streaming-messaging real-time-analytics Analytics Flink

This book provides a comprehensive guide to real-time data processing and analytics using modern frameworks like Apache Spark, Flink, Storm, and Kafka. Through practical examples and in-depth explanations, you will learn how to implement efficient, scalable, real-time processing pipelines. What this Book will help me do Understand real-time data processing essentials and the technology stack. Learn integration of components like Apache Spark and Kafka. Master the concepts of stream processing with detailed case studies. Gain expertise in developing monitoring and alerting solutions for real-time systems. Prepare to implement production-grade real-time data solutions. Author(s) Shilpi Saxena and Saurabh Gupta, the authors, are experienced professionals in distributed systems and data engineering, focusing on practical applications of real-time computing. They bring their extensive industry experience to this book, helping readers understand the complexities of real-time data solutions in an approachable and hands-on manner. Who is it for? This book is ideal for software engineers and data engineers with a background in Java who seek to develop real-time data solutions. It is suitable for readers familiar with concepts of real-time data processing, and enhances knowledge in frameworks like Spark, Flink, Storm, and Kafka. Target audience includes learners building production data solutions and those designing distributed analytics engines.

Oracle ADF Survival Guide: Mastering the Application Development Framework

2017-09-04 O'Reilly Amazon

book

Sten Vesterli

data data-engineering oracle-database-solutions ADF Java Oracle

Quickly get up to speed with Oracle's Application Development Framework (ADF). Rapidly build modern, user-friendly applications that will be easy to re-use, expand, and maintain. Oracle ADF Survival Guide covers the latest 12c version and explains all the important concepts and parts, including ADF Faces, ADF Task Flows, ADF Business Components, ADF Skins, the new Alta UI, and how to implement business logic in all layers of the application. Organizations with existing investments in Oracle database and Oracle Forms applications will be able to leverage Oracle's best practice for application development in moving those applications to the ADF framework. The book: Explains all parts of the ADF stack Shows how to integrate with databases and web services Demonstrates the best practice for ADF enterprise architecture What You Will Learn Rapidly build great-looking, user-friendly screens Build page flows visually for improved communication with business users Easily connect your user interface to databases and other back-end systems Leverage the best practice for productive team development Establish a solid enterprise architecture for maximum reuse and maintainability Automate your build and deployment process Who This Book Is For Experienced developers who want to rapidly become productive with Oracle's Application Development Framework (ADF) 12c. It is for Oracle Forms and database developers working for organizations that have followed Oracle’s strategic direction to ADF, as well as for experienced Java developers who want to learn Oracle’s highly productive JSF framework.

Building Data Streaming Applications with Apache Kafka

2017-08-18 O'Reilly Amazon

book

Manisha Sethi , Anshul Joshi , Chanchal Singh , Manish Kumar

data data-engineering streaming-messaging Kafka Data Engineering Java

Learn how to design and build efficient real-time streaming applications using Apache Kafka, a leading distributed streaming platform. This book provides comprehensive guidance on setting up Kafka clusters, developing producers and consumers, and integrating with frameworks like Spark, Storm, and Heron. By the end, you'll master the skills needed to create enterprise-grade data streaming solutions. What this Book will help me do Grasp the core concepts and components of Apache Kafka and its ecosystem. Develop robust Kafka producers and consumers to process real-time data streams. Design and implement streaming applications using Spark, Storm, and Heron. Plan Kafka deployments with a focus on scalability, capacity, and fault tolerance. Ensure secure data streaming with best practices for securing Apache Kafka. Author(s) The authors, Chanchal Singh and Manish Kumar, bring years of expertise in data engineering and distributed systems. Having worked extensively with streaming technologies like Apache Kafka, they aim to share their in-depth knowledge through practical examples and real-world scenarios. Their approach to teaching focuses on making complex concepts easily understandable. Who is it for? This book is ideal for software developers and data engineers who are eager to learn Apache Kafka for building streaming applications. Some experience with programming, particularly Java, will help readers get the most out of the material. If you are working on data-processing systems or looking to enhance your skills in real-time data handling, this book caters to your needs.

Mastering Apache Storm

2017-08-16 O'Reilly Amazon

book

Ankit Jain

data data-engineering streaming-messaging storm Big Data Hadoop

Mastering Apache Storm is your step-by-step guide to mastering real-time data streaming with this robust framework. You'll learn how to process big data efficiently and integrate Apache Storm with popular technologies like Kafka, HBase, and Redis to maximize its potential. This book walks you through from basic concepts to advanced implementations of Apache Storm in real-world scenarios. What this Book will help me do Understand the core features and operation of Apache Storm for real-time data streaming. Integrate Apache Storm with other Big Data frameworks like Kafka, HBase, Redis, and Hadoop. Effectively deploy and manage multi-node Apache Storm clusters in real-world environments. Monitor and analyze your data streams and system health effectively using built-in and external tools. Learn to implement fault-tolerant, scalable, and distributed stream processing applications in Apache Storm. Author(s) Ankit Jain is an experienced software developer and technical instructor specializing in distributed systems and real-time data processing. With years of experience working with Apache Storm and related technologies, their teachings focus on practical, hands-on learning to equip readers with actionable skills. Who is it for? This book is ideal for Java developers aspiring to build expertise in real-time data streaming and distributed processing applications using Apache Storm. Beginners can start with the fundamentals provided, while those with prior knowledge can delve into intermediate and advanced implementations.

Apache Spark 2.x for Java Developers

2017-07-26 O'Reilly Amazon

book

Sourav Gulati , Sumit Kumar

data data-engineering apache-spark AI/ML Analytics API

Delve into mastering big data processing with 'Apache Spark 2.x for Java Developers.' This book provides a practical guide to implementing Apache Spark using the Java APIs, offering a unique opportunity for Java developers to leverage Spark's powerful framework without transitioning to Scala. What this Book will help me do Learn how to process data from formats like XML, JSON, CSV using Spark Core. Implement real-time analytics using Spark Streaming and third-party tools like Kafka. Understand data querying with Spark SQL and master SQL schema processing. Apply machine learning techniques with Spark MLlib to real-world scenarios. Explore graph processing and analytics using Spark GraphX. Author(s) Sumit Kumar and Sourav Gulati, experienced professionals in Java development and big data, bring their wealth of practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers, making big data approachable. Who is it for? This book is perfect for Java developers who are eager to expand their skillset into big data processing with Apache Spark. Whether you are a seasoned Spark user or first diving into big data concepts, this book meets you at your level. With practical examples and straightforward explanations, you can unlock the potential of Spark in real-world scenarios.

JSON at Work

2017-07-03 O'Reilly Amazon

book

Tom Marrs

data data-engineering storage-formats JSON API Java

JSON is becoming the backbone for meaningful data interchange over the internet. This format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design. JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise. Get acquainted with JSON basics and learn how to model JSON data Learn how to use JSON with Node.js, Ruby on Rails, and Java Structure JSON documents with JSON Schema to design and test APIs Search the contents of JSON documents with JSON Search tools Convert JSON documents to other data formats with JSON Transform tools Compare JSON-based hypermedia formats, including HAL and jsonapi Leverage MongoDB to store and access JSON documents Use Apache Kafka to exchange JSON-based messages between services

Understanding Message Brokers

2017-06-15 O'Reilly Amazon

book

Jakub Korab

data data-engineering streaming-messaging streaming & messaging Java Kafka

Messaging is one of the more poorly understood areas of IT; most developers and architects have only a passing familiarity with how broker-based messaging technologies work. This practical report not only helps you get up to speed on the essentials of messaging, but also compares two of today’s most popular messaging technologies—Apache ActiveMQ and Apache Kafka. Author and consultant Jakub Korab describes use cases and design choices that lead developers to very different approaches for developing message-based systems. You’ll come away with a high-level understanding of both ActiveMQ and Kafka, including how they should and should not be used, how they handle concerns such as throughput and high-availability, and what to look out for when considering other messaging technologies in future. Understand the types of problems that messaging systems address Explore three primary messaging patterns: point-to-point, publish-subscribe, and a hybrid of both Dive into ActiveMQ, a classic broker-centric design implemented through Java libraries that works for a broad range of messaging use cases Examine Kafka, a distributed system that can be scaled to provide massive performance and fault tolerance through replication Learn the mechanical complexities that message-based systems need to address, and some patterns you can apply to deal with those complexities

Advanced Analytics with Spark, 2nd Edition

2017-06-12 O'Reilly Amazon

book

Josh Wills , Sandy Ryza , Sean Owen , Uri Laserson

data data-engineering apache-spark AI/ML Analytics Data Science

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will: Familiarize yourself with the Spark programming model Become comfortable within the Spark ecosystem Learn general approaches in data science Examine complete implementations that analyze large public data sets Discover which machine learning tools make sense for particular problems Acquire code that can be adapted to many uses

Data Lake for Enterprises

2017-05-31 O'Reilly Amazon

book

Pankaj Misra , Tomcy John , Vivek Mishra

data data-engineering storage-repositories data-lake AI/ML AWS Lambda

"Data Lake for Enterprises" is a comprehensive guide to building data lakes using the Lambda Architecture. It introduces big data technologies like Hadoop, Spark, and Flume, showing how to use them effectively to manage and leverage enterprise-scale data. You'll gain the skills to design and implement data systems that handle complex data challenges. What this Book will help me do Master the use of Lambda Architecture to create scalable and effective data management systems. Understand and implement technologies like Hadoop, Spark, Kafka, and Flume in an enterprise data lake. Integrate batch and stream processing techniques using big data tools for comprehensive data analysis. Optimize data lakes for performance and reliability with practical insights and techniques. Implement real-world use cases of data lakes and machine learning for predictive data insights. Author(s) Vivek Mishra, Tomcy John, and Pankaj Misra are recognized experts in big data systems with a strong background in designing and deploying data solutions. With a clear and methodical teaching style, they bring years of experience to this book, providing readers with the tools and knowledge required to excel in enterprise big data initiatives. Who is it for? This book is ideal for software developers, data architects, and IT professionals looking to integrate a data lake strategy into their enterprises. It caters to readers with a foundational understanding of Java and big data concepts, aiming to advance their practical knowledge of building scalable data systems. If you're eager to delve into cutting-edge technologies and transform enterprise data management, this book is for you.
