talk-data.com

Topic: Java
Tags: programming_language, object_oriented, enterprise
243 tagged activities

Activity Trend: peak of 25 activities per quarter, 2020-Q1 through 2026-Q1

Activities

Showing filtered results
Filtering by: O'Reilly Data Engineering Books

Practical Apache Lucene 8: Uncover the Search Capabilities of Your Application

Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications. Starting with the basics of Lucene and searching, you will learn about the types of queries used in it and also take a look at scoring models. Applying this basic knowledge, you will develop a hello world app using basic Lucene queries and explore functions like scoring and document level boosting. Along the way you will also uncover the concepts of partial searching and matching in Lucene and then learn how to integrate geographical information (geospatial data) in Lucene using spatial queries and n-dimensional indexing. This will prepare you to build a location-aware search engine with a representative data set that allows location constraints to be specified during a search. You’ll also develop a text classifier using Lucene and Apache Mahout, a popular machine learning framework. After a detailed review of performance benchmarking and common issues associated with it, you’ll learn some of the best practices of tuning the performance of your application. By the end of the book you’ll be able to build your first Lucene patch, where you will not only write your patch, but also test it and ensure it adheres to community coding standards. What You’ll Learn Master the basics of Apache Lucene Utilize different query types in Apache Lucene Explore scoring and document level boosting Integrate geospatial data into your application Who This Book Is For Developers wanting to learn the finer details of Apache Lucene by developing a series of projects with it.
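
The "hello world" stage described above boils down to indexing a document and running a term query against it. The sketch below is not taken from the book; it assumes the Lucene 8 core library on the classpath and uses an in-memory ByteBuffersDirectory purely for illustration.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class HelloLucene {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();   // in-memory index, demo only
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("title", "Practical Apache Lucene 8", Field.Store.YES));
            writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Term query against the analyzed "title" field; top 10 hits by score.
            TopDocs hits = searcher.search(new TermQuery(new Term("title", "lucene")), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("title") + "  score=" + hit.score);
            }
        }
    }
}
```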

Learning Spark, 2nd Edition

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: Learn Python, SQL, Scala, or Java high-level Structured APIs Understand Spark operations and SQL Engine Inspect, tune, and debug Spark operations with Spark configurations and Spark UI Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka Perform analytics on batch and streaming data using Structured Streaming Build reliable data pipelines with open source Delta Lake and Spark Develop machine learning pipelines with MLlib and productionize models using MLflow
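
As a rough illustration of the Java high-level Structured APIs mentioned above, here is a minimal batch job; it is a sketch rather than an excerpt, and the CSV path and column names (region, amount) are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;

public class StructuredApiDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("StructuredApiDemo")
                .master("local[*]")                  // local mode, demo only
                .getOrCreate();

        Dataset<Row> sales = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("data/sales.csv");              // hypothetical input file

        sales.groupBy(col("region"))
             .agg(avg(col("amount")).alias("avg_amount"))
             .orderBy(col("avg_amount").desc())
             .show();

        spark.stop();
    }
}
```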

Spark in Action, Second Edition

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop. About the Technology Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem. About the Book Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms. What's Inside Writing Spark applications in Java Spark application architecture Ingestion through files, databases, streaming, and Elasticsearch Querying distributed datasets with Spark SQL About the Reader This book does not assume previous experience with Spark, Scala, or Hadoop. About the Author Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years. Quotes This book reveals the tools and secrets you need to drive innovation in your company or community. - Rob Thomas, IBM An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing. - Anupam Sengupta, GuardHat Inc. This book will help spark a love affair with distributed processing. - Conor Redmond, InComm Product Control Currently the best book on the subject! - Markus Breuer, Materna IPS
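
A minimal Java sketch of the ingest-then-query pattern the book builds on; the JSON file and its source column are hypothetical stand-ins, not the book's NASA pipeline.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IngestAndQuery {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("IngestAndQuery")
                .master("local[*]")
                .getOrCreate();

        // Ingest a JSON file (hypothetical path) and expose it to Spark SQL.
        Dataset<Row> records = spark.read().json("data/records.json");
        records.createOrReplaceTempView("records");

        // Query the distributed dataset with plain SQL (column names are assumed).
        spark.sql("SELECT source, COUNT(*) AS cnt FROM records GROUP BY source ORDER BY cnt DESC")
             .show();

        spark.stop();
    }
}
```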

Cassandra: The Definitive Guide, 3rd Edition

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition—updated for Cassandra 4.0—provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s nonrelational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility. Understand Cassandra’s distributed and decentralized structure Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell Create a working data model and compare it with an equivalent relational model Develop sample applications using client drivers for languages including Java, Python, and Node.js Explore cluster topology and learn how nodes exchange data
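
For the client-driver coverage mentioned above, a first query from Java with the DataStax driver 4.x might look like the sketch below; the catalog keyspace and books table are hypothetical.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class CqlDemo {
    public static void main(String[] args) {
        // Connects to localhost:9042 by default; keyspace and table are hypothetical.
        try (CqlSession session = CqlSession.builder().withKeyspace("catalog").build()) {
            ResultSet rs = session.execute("SELECT title, year FROM books WHERE id = 1");
            Row row = rs.one();
            if (row != null) {
                System.out.println(row.getString("title") + " (" + row.getInt("year") + ")");
            }
        }
    }
}
```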

Monitoring and Managing the IBM Elastic Storage Server Using the GUI

The IBM® Elastic Storage Server GUI provides an easy way to configure and monitor various features that are available with the IBM ESS system. It is a web application that runs on common web browsers, such as Chrome, Firefox, and Edge. The ESS GUI uses JavaScript and Ajax technologies to enable smooth and desktop-like interfacing. This IBM Redpaper publication provides a broad understanding of the architecture and features of the ESS GUI. It includes information about how to install and configure the GUI and in-depth information about the use of the GUI options. The primary audience for this paper includes experienced and new users of the ESS system.

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project team's goal to integrate the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux, along with a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with Polybase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and Polybase to build a comprehensive solution called Big Data Clusters, which is a marquee feature of SQL Server 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications. What You Will Learn Implement Big Data Clusters with SQL Server, Spark, and HDFS Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources Combine SQL and Spark to build a machine learning platform for AI applications Boost your performance with no application changes using Intelligent Performance Increase security of your SQL Server through Secure Enclaves and Data Classification Maximize database uptime through online indexing and Accelerated Database Recovery Build new modern applications with Graph, ML Services, and T-SQL Extensibility with Java Improve your ability to deploy SQL Server on Linux Gain in-depth knowledge to run SQL Server with containers and Kubernetes Know all the new database engine features for performance, usability, and diagnostics Use the latest tools and methods to migrate your database to SQL Server 2019 Apply your knowledge of SQL Server 2019 to Azure Who This Book Is For IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.
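
The book itself is feature-oriented, but a first connection from Java normally goes through JDBC. A minimal hedged sketch, assuming Microsoft's JDBC driver on the classpath and a local test instance with credentials supplied via an environment variable:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlServerVersionCheck {
    public static void main(String[] args) throws Exception {
        // Local test instance; encrypt/trustServerCertificate settings are for a demo box only.
        String url = "jdbc:sqlserver://localhost:1433;databaseName=master;encrypt=true;trustServerCertificate=true";
        try (Connection conn = DriverManager.getConnection(url, "sa", System.getenv("MSSQL_SA_PASSWORD"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT @@VERSION AS version")) {
            while (rs.next()) {
                System.out.println(rs.getString("version"));
            }
        }
    }
}
```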

Deep Learning for Search

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on! About the Technology Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how. About the Book Deep Learning for Search teaches you to improve your search results with neural networks. You’ll review how DL relates to search basics like indexing and ranking. Then, you’ll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you’ll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn! What's Inside Accurate and relevant rankings Searching across languages Content-based image search Search with recommendations About the Reader For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed. About the Author Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from topics like information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components, and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including BerlinBuzzwords, International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili. Quotes A practical approach that shows you the state of the art in using neural networks, AI, and deep learning in the development of search engines. - From the Foreword by Chris Mattmann, NASA JPL A thorough and thoughtful synthesis of traditional search and the latest advancements in deep learning. - Greg Zanotti, Marquette Partners A well-laid-out deep dive into the latest technologies that will take your search engine to the next level. - Andrew Wyllie, Thynk Health Hands-on exercises teach you how to master deep learning for search-based products. - Antonio Magnaghi, System1
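
The book pairs Lucene with Deeplearning4j. As a very rough flavor of the DL4J API (not an excerpt from the book), here is a tiny feed-forward classifier trained on random toy data; layer sizes, hyperparameters, and data are arbitrary.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TinyClassifier {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(42)
                .updater(new Adam(0.01))
                .list()
                .layer(0, new DenseLayer.Builder().nIn(10).nOut(16)
                        .activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(16).nOut(2).activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();

        // 100 random 10-dimensional "feature vectors" with alternating one-hot labels.
        INDArray features = Nd4j.rand(100, 10);
        INDArray labels = Nd4j.zeros(100, 2);
        for (int i = 0; i < 100; i++) {
            labels.putScalar(new int[]{i, i % 2}, 1.0);
        }
        model.fit(features, labels);

        System.out.println(model.output(Nd4j.rand(1, 10)));   // class probabilities
    }
}
```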

Mastering Hadoop 3

"Mastering Hadoop 3" is your in-depth guide to understanding and mastering the advanced features of the Hadoop ecosystem. With a focus on distributed computing and data processing, this book covers essential tools such as YARN, MapReduce, and Apache Spark to help you build scalable, efficient data pipelines. What this Book will help me do Gain a comprehensive understanding of Hadoop Distributed File System (HDFS) and YARN for effective resource management. Master data processing with MapReduce and learn to integrate with real-time processing engines like Spark and Flink. Develop and secure enterprise-grade Hadoop-based data pipelines by implementing robust security and governance measures. Explore techniques for batch data processing, data modeling, and designing applications tailored for Hadoop environments. Understand best practices for optimizing and troubleshooting Hadoop clusters for enhanced performance and reliability. Author(s) The authors, including None Wong, None Singh, and None Kumar, bring together years of experience in big data engineering, distributed systems, and enterprise application development. They aim to provide a clear pathway to mastering Hadoop ecosystem tools. Who is it for? This book is ideal for budding big data professionals who have some familiarity with Java and basic Hadoop concepts and wish to elevate their expertise. If you're a Hadoop career practitioner keen to expand your understanding of the ecosystem's advanced capabilities or a professional looking to implement Hadoop in organizational workflows, this book is well-suited for you.

Apache Spark Quick Start Guide

Dive into the world of scalable data processing with the "Apache Spark Quick Start Guide." This book offers a foundational introduction to Spark, empowering readers to harness its capabilities for big data processing. With clear explanations and hands-on examples, you'll learn to implement Spark applications that handle complex data tasks efficiently. What this Book will help me do Understand and implement Spark's RDDs and DataFrame APIs to process large datasets effectively. Set up a local development environment for Spark-based projects. Develop skills to debug and optimize slow-performing Spark applications. Harness built-in modules of Spark for SQL, streaming, and machine learning applications. Adopt best practices and optimization techniques for high-performance Spark applications. Author(s) Shrey Mehrotra is a seasoned software developer with expertise in big data technologies, particularly Apache Spark. With years of hands-on industry experience, Shrey focuses on making complex technical concepts accessible to all. Through his writing, he aims to share clear, practical guidance for developers of all levels. Who is it for? This guide is perfect for big data enthusiasts and professionals looking to learn Apache Spark's capabilities from scratch. It's aimed at data engineers interested in optimizing application performance and data scientists wanting to integrate machine learning with Spark. A basic familiarity with either Scala, Python, or Java is recommended.
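
To make the RDD API mentioned above concrete, a minimal local-mode example in Java might look like this sketch (not code from the guide):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddDemo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            // Transformations are lazy; reduce() triggers the actual computation.
            int sumOfSquares = numbers.map(n -> n * n).reduce(Integer::sum);
            System.out.println("Sum of squares = " + sumOfSquares);
        }
    }
}
```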

Java XML and JSON: Document Processing for Java SE

Use this guide to master the XML metalanguage and JSON data format along with significant Java APIs for parsing and creating XML and JSON documents from the Java language. New in this edition is coverage of Jackson (a JSON processor for Java) and Oracle’s own Java API for JSON processing (JSON-P), which is a JSON processing API for Java EE that also can be used with Java SE. This new edition of Java XML and JSON also expands coverage of DOM and XSLT to include additional API content and useful examples. All examples in this book have been tested under Java 11. In some cases, source code has been simplified to use Java 11’s var language feature. The first six chapters focus on XML along with the SAX, DOM, StAX, XPath, and XSLT APIs. The remaining six chapters focus on JSON along with the mJson, Gson, JsonPath, Jackson, and JSON-P APIs. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises. What You'll Learn Master the XML language Create, validate, parse, and transform XML documents Apply Java’s SAX, DOM, StAX, XPath, and XSLT APIs Master the JSON format for serializing and transmitting data Code against third-party APIs such as Jackson, mJson, Gson, JsonPath Master Oracle’s JSON-P API in a Java SE context Who This Book Is For Intermediate and advanced Java programmers who are developing applications that must access data stored in XML or JSON documents. The book also targets developers wanting to understand the XML language and JSON data format.
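
For the Jackson coverage described above, a minimal round trip between JSON text and a Java object could look like this; the Book POJO is a made-up example, not one from the book.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonRoundTrip {
    // A tiny POJO for the demo; public fields keep the example short.
    public static class Book {
        public String title;
        public int year;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = "{\"title\":\"Java XML and JSON\",\"year\":2019}";

        Book book = mapper.readValue(json, Book.class);                                  // JSON -> object
        System.out.println(book.title + " (" + book.year + ")");

        String out = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(book);   // object -> JSON
        System.out.println(out);
    }
}
```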

Apache Kafka Quick Start Guide

Dive into the world of Apache Kafka with this concise guide that focuses on its practical use for real-time data processing in distributed systems. You'll explore Kafka's capabilities, covering essentials like configuration, messaging, serialization, and handling complex data streams using Kafka Streams and KSQL. By the end, you'll be equipped to tackle real-world streaming challenges confidently. What this Book will help me do Understand how to set up and configure Apache Kafka for real-time processing environments. Master key concepts like message validation, enrichment, and serialization. Learn to use the Schema Registry for data validation and versioning. Gain hands-on experience with data streaming and aggregation using Kafka Streams. Develop skills in using KSQL for data manipulation and stream querying. Author(s) Estrada is an experienced software engineer with a deep understanding of distributed systems and real-time data processing. With expertise in Apache Kafka and other event-streaming platforms, Estrada approaches technical writing with an emphasis on clarity and practical application. Their passion for helping developers achieve success is reflected in their authoritative yet approachable style. Who is it for? This book is perfect for software engineers and backend developers interested in mastering real-time data processing using Apache Kafka. It is designed for readers who are eager to solve practical problems in distributed systems, irrespective of whether they have prior Kafka experience. Some familiarity with Java or other JVM languages will be helpful, although not strictly necessary. This is an ideal resource for learners seeking a hands-on, practical approach to Apache Kafka.
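
A minimal Kafka Streams topology in Java, in the spirit of the stream-processing material described above; the input-events and output-events topic names are hypothetical.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-events");   // hypothetical topic
        source.mapValues(value -> value.toUpperCase())
              .to("output-events");                                        // hypothetical topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```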

PostgreSQL 11 Server Side Programming Quick Start Guide

PostgreSQL 11 Server Side Programming Quick Start Guide introduces you to the world of database programming directly at the database level. This book delves into the concepts of server-side programming, providing you with the necessary tools to author stored procedures, triggers, and extensions for your PostgreSQL instance. What this Book will help me do Learn how to create stored procedures and functions for efficient database logic. Understand how to use triggers and rules to maintain data integrity. Gain expertise in developing extensions to extend PostgreSQL functionality. Master techniques for handling inter-process communication and background workers. Explore custom data types and integration with programming languages like Java and Perl. Author(s) Ferrari, a seasoned database administrator and developer, specializes in delivering insightful PostgreSQL training. With extensive experience in both database management and software development, the author brings practical knowledge and real-world examples to guide readers through mastering PostgreSQL server-side programming. Who is it for? This book is tailored for database administrators, developers, and engineers who have a basic understanding of PostgreSQL and are looking to expand their knowledge into server-side programming. If you're aiming to implement advanced database functionality or streamline data management tasks in PostgreSQL, this book is for you. It is ideal for those who wish to apply database programming techniques to enterprise-grade challenges. Beginner-friendly but designed to empower professionals with actionable insights.
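
Although the book works at the database level, invoking server-side code from Java usually goes through JDBC. A hedged sketch, assuming the PostgreSQL JDBC driver on the classpath and a hypothetical order_total() stored function:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CallServerSideFunction {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/appdb";   // hypothetical database
        try (Connection conn = DriverManager.getConnection(url, "app_user", System.getenv("PGPASSWORD"));
             PreparedStatement stmt = conn.prepareStatement("SELECT order_total(?)")) {   // hypothetical function
            stmt.setInt(1, 42);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    System.out.println("Total: " + rs.getBigDecimal(1));
                }
            }
        }
    }
}
```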

Apache Hadoop 3 Quick Start Guide

Dive into the world of distributed data processing with the 'Apache Hadoop 3 Quick Start Guide.' This comprehensive resource equips you with the knowledge needed to handle large datasets effectively using Apache Hadoop. Learn how to set up and configure Hadoop, work with its core components, and explore its powerful ecosystem tools. What this Book will help me do Understand the fundamental concepts of Apache Hadoop, including HDFS, MapReduce, and YARN, and use them to store and process large datasets. Set up and configure Hadoop 3 in both developer and production environments to suit various deployment needs. Gain hands-on experience with Hadoop ecosystem tools like Hive, Kafka, and Spark to enhance your big data processing capabilities. Learn to manage, monitor, and troubleshoot Hadoop clusters efficiently to ensure smooth operations. Analyze real-time streaming data with tools like Apache Storm and perform advanced data analytics using Apache Spark. Author(s) The author of this guide, Vijay Karambelkar, brings years of experience working with big data technologies and Apache Hadoop in real-world applications. With a passion for teaching and simplifying complex topics, Vijay has compiled his expertise to help learners confidently approach Hadoop 3. His detailed, example-driven approach makes this book a practical resource for aspiring data professionals. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who aspire to dive into the field of big data. If you're new to Apache Hadoop or looking to upgrade your skills to include version 3, this guide is for you. A basic understanding of Java programming is recommended to make the most of the topics covered. Embark on this journey to enhance your career in data-intensive industries.
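
The customary first MapReduce program is word count; the sketch below (not from the guide) shows just the mapper half in Java, emitting (word, 1) pairs for a reducer to sum.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken().toLowerCase());
            context.write(word, ONE);   // emit (word, 1); the reducer sums the counts
        }
    }
}
```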

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you’ll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications. What You Will Learn Understand the Spark unified data processing platform How to run Spark in the Spark Shell or Databricks Use and manipulate RDDs Deal with structured data using Spark SQL through its operations and advanced functions Build real-time applications using Spark Structured Streaming Develop intelligent applications with the Spark Machine Learning library Who This Book Is For Programmers and developers active in big data, Hadoop, and Java but who are new to the Apache Spark platform.
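
To give a feel for Structured Streaming in Java, here is a minimal sketch using Spark's built-in rate test source; it is illustrative only and not code from the book.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class RateSourceDemo {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("RateSourceDemo")
                .master("local[*]")
                .getOrCreate();

        // The built-in "rate" source generates (timestamp, value) rows for testing.
        Dataset<Row> stream = spark.readStream()
                .format("rate")
                .option("rowsPerSecond", 5)
                .load();

        StreamingQuery query = stream.writeStream()
                .format("console")
                .outputMode("append")
                .start();

        query.awaitTermination(30_000);   // run for 30 seconds, then shut down
        spark.stop();
    }
}
```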

Seven NoSQL Databases in a Week

Learn the fundamentals of seven essential NoSQL databases in just one week with this book. Covering MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase, you'll explore their functionalities and practical applications. Designed to give you a working understanding of NoSQL database types, this guide helps aspiring DBAs and developers comprehend and utilize modern data solutions. What this Book will help me do Master the fundamentals of MongoDB, including high-performance, high-availability, and scaling features. Gain hands-on experience with Neo4j to perform database queries and integrate with Python and Java applications. Learn efficient querying with Redis for storage and retrieval tasks. Understand Cassandra's powerful solution for scalable and fault-tolerant systems. Get well-versed with HBase for creating tables, and reading and writing data efficiently. Author(s) Sudarshan Kadambi and Xun (Brian) Wu bring a wealth of experience in database technologies. They have worked extensively in the software development and database management fields. With their practical and concise teaching approach, the authors make complex topics accessible for readers. Who is it for? This book is ideal for budding DBAs and developers looking to understand NoSQL databases. It is particularly useful for those transitioning from relational databases who want to learn about modern database technologies. Suitable for both beginners and those with some database knowledge, it aims to bridge skill gaps and expand the reader's technical expertise.
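
As one small example of the kind of client code covered, a Redis write and read from Java with the Jedis client might look like this sketch (the key name is made up):

```java
import redis.clients.jedis.Jedis;

public class RedisDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("session:42", "alice");              // simple key/value write
            jedis.expire("session:42", 3600);              // expire after one hour
            System.out.println(jedis.get("session:42"));   // read it back
        }
    }
}
```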

Camel in Action, Second Edition

Camel in Action, Second Edition is the most complete Camel book on the market. Written by core developers of Camel and the authors of the highly acclaimed first edition, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. About the Technology Apache Camel is a Java framework that implements enterprise integration patterns (EIPs) and comes with over 200 adapters to third-party systems. A concise DSL lets you build integration logic into your app with just a few lines of Java or XML. By using Camel, you benefit from the testing and experience of a large and vibrant open source community. About the Book Camel in Action, Second Edition is the definitive guide to the Camel framework. It starts with core concepts like sending, receiving, routing, and transforming data. It then goes in depth on many topics such as how to develop, debug, test, deal with errors, secure, scale, cluster, deploy, and monitor your Camel applications. The book also discusses how to run Camel with microservices, reactive systems, containers, and in the cloud. What's Inside Coverage of all relevant EIPs Camel microservices with Spring Boot Camel on Docker and Kubernetes Error handling, testing, security, clustering, monitoring, and deployment Hundreds of examples in Java and XML About the Reader Readers should be familiar with Java. This book is accessible to beginners and invaluable to experts. About the Authors Claus Ibsen is a senior principal engineer working for Red Hat specializing in cloud and integration. He has worked on Apache Camel for the last nine years where he heads the project. Claus lives in Denmark. Jonathan Anstey is an engineering manager at Red Hat and a core Camel contributor. He lives in Newfoundland, Canada. Quotes I highly recommend this book to anyone with even a passing interest in Apache Camel. Do take Camel for a ride...and don't get the hump! - From the Foreword by James Strachan, Creator of Apache Camel Claus and Jon are great writers, relying on figures and diagrams where needed and presenting lots of code snippets and worked examples. - From the Foreword by Dr. Mark Little, Technical Director of JBoss The second edition of this all-time classic is an indispensable companion for your Apache Camel rides. - Gregor Zurowski, Apache Camel Committer The absolute best way to learn and use Camel - top to bottom, front to back, and all the way through. Camel is a fantastic tool - every Java coder should have a copy of this book. - Rick Wagner, Red Hat An excellent book and the definite reference for experienced engineers. - Yan Guo, EventBrite
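
A minimal Camel route in the Java DSL, along the lines of the file-based examples integration books typically open with; the inbox and outbox directories are hypothetical.

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FileMoveRoute {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Poll data/inbox and copy each file to data/outbox (hypothetical directories).
                from("file:data/inbox?noop=true")
                        .log("Processing ${file:name}")
                        .to("file:data/outbox");
            }
        });
        context.start();
        Thread.sleep(10_000);   // let the route poll for a while, demo only
        context.stop();
    }
}
```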

Mastering Apache Solr 7.x

"Mastering Apache Solr 7.x" is your practical guide to building, advancing, and optimizing enterprise search solutions using Solr 7. With this book, you will harness the robust features of Solr, implement efficient search capabilities, and tackle complex business intelligence problems to achieve unparalleled search performance. What this Book will help me do Develop and implement efficient schemas using the Solr Schema API. Optimize enterprise search performance with advanced querying and scoring techniques. Implement fault-tolerant and distributed search systems using SolrCloud. Leverage Apache Tika for seamless data indexing and content extraction. Utilize programming languages like JavaScript, Python, and Ruby to integrate with Solr. Author(s) With years of experience in search technologies and deep expertise in Apache Solr, authors None Nair, None Mehta, and Dharmesh Vasoya bring together a wealth of knowledge in this book. Their collaborative insights equip readers to master advanced Solr features, sharing practical examples and real-world applications with a passion for clarity and efficiency. Who is it for? This book is ideal for software developers, data engineers, and database architects who aim to design and implement effective enterprise search systems. It is tailored for readers with prior experience in Apache Solr or Java programming, focusing on those eager to enhance their search solution expertise. Achieve your advanced search system goals here.

Liberty in IBM CICS: Deploying and Managing Java EE Applications

Abstract This IBM® Redbooks® publication is intended for IBM CICS® system programmers and IBM Z architects. It describes how to deploy and manage Java EE 7 web-based applications in an IBM CICS Liberty JVM server and access data on IBM Db2® for IBM z/OS® and IBM MQ for z/OS subsystems. In this book, we describe the key steps to create and install a Liberty JVM server within a CICS region. We then describe how to best use the different deployment techniques for Java EE applications and the specific considerations when deploying applications that use JDBC, JMS, and the new CICS link to Liberty API. Finally, we describe how to secure web applications in CICS Liberty, including transport-level security and request authentication and authorization by using IBM RACF® and LDAP registries. Information is also provided about how to build a high availability infrastructure and how to use the logging and monitoring functions that are available in the CICS Liberty environment. This book is based on IBM CICS Transaction Server (CICS TS) V5.4 that uses the embedded IBM WebSphere® Application Server Liberty technology. It is also applicable to CICS TS V5.3 with the fixes for the continuous delivery APAR PI77502 applied. Sample applications are used throughout this publication and are freely available for download from the IBM CICSDev GitHub organization along with detailed deployment instructions.
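
The applications the paper deploys are standard Java EE 7 web components; a minimal servlet that could be packaged into a WAR for a Liberty JVM server is sketched below (a generic example, not taken from the publication).

```java
import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        response.setContentType("text/plain");
        response.getWriter().println("Hello from a Liberty JVM server");   // placeholder response
    }
}
```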

Scaling Data Services with Pivotal GemFire

In-memory data grids (IMDG) such as Pivotal GemFire, which is powered by Apache Geode, are key to making today’s modern high-speed, data-intensive applications work. By keeping data in the RAM of a horizontally scalable cluster of servers, IMDG solutions enable apps to achieve consistently low latency for data access at any scale. Many in the application development community, however, aren’t aware of IMDG’s benefits, use cases, or underlying technology. This report brings you up to speed by providing GemFire basics, including use cases and easily understood examples. You’ll determine whether GemFire can benefit your application, and learn how to install a simple test environment and build a small proof of concept. Explore GemFire use cases for Java applications—including microservices, high-speed data ingest, and transaction and event processing Get an architectural overview of GemFire, and learn installation requirements for both hardware/VM and cloud Dive into GemFire’s capabilities with continuous queries, server-side functions, and Apache Lucene integration Learn how GemFire works with the persistence model, off-heap memory, and WAN replication
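
A first GemFire interaction usually amounts to connecting a client to a locator and doing a put/get on a region. A hedged sketch using the Apache Geode client API (which underpins GemFire), with a hypothetical quotes region and the default locator port:

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class GemFireClientDemo {
    public static void main(String[] args) {
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("localhost", 10334)   // default locator port
                .create();

        // PROXY regions hold no local state; every put/get goes to the servers.
        Region<String, String> quotes = cache
                .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create("quotes");                     // hypothetical region defined on the servers

        quotes.put("AAPL", "189.91");
        System.out.println(quotes.get("AAPL"));

        cache.close();
    }
}
```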

Apache Kafka 1.0 Cookbook

Dive into the essential resource for mastering Apache Kafka with this cookbook of practical recipes. You'll explore the dynamic features of Kafka 1.0, integrate it with enterprise data solutions, and confidently manage messaging and streaming data in real-time. What this Book will help me do Effectively install and configure Apache Kafka in a professional environment. Implement Kafka producers and consumers to manage real-time data streams. Utilize Confluent platforms and Kafka streams for advanced data processing. Monitor Kafka clusters with tools like Graphite and Ganglia for optimal performance. Integrate Kafka seamlessly with tools such as Hadoop, Spark, and Elasticsearch. Author(s) Estrada and Zinoviev have extensive experience in enterprise data systems and have been dedicated contributors to the Apache Kafka ecosystem. Their combined expertise encompasses developing robust, real-time distributed systems and delivering insightful technical guidance. Through this book, they share their vast knowledge and practical solutions, tailored for both developers and administrators. Who is it for? This book is tailored for developers and administrators looking to enhance their expertise in Apache Kafka. Developers should be comfortable with Java or Scala to fully utilize examples, while administrators benefit from prior knowledge of Kafka operations. Ideal readers are those seeking actionable techniques to efficiently manage and integrate Kafka into their enterprise systems.
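
To complement the recipes described above, a bare-bones Java producer (not one of the book's recipes) might look like this; the topic name and payload are made up.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("events", "device-1", "{\"temp\": 21.5}");   // hypothetical topic/payload
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("Sent to %s-%d@%d%n", metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```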