O'Reilly Data Engineering Books

IBM DB2 Web Query for i: The Nuts and Bolts

2017-05-11 O'Reilly Amazon

book

Rob Bestgen , Doug Mack , Lin Su , Simona Pacchiarini , Kathryn Steinbrink , Hernando Bedoya , Kevin Trisko , Jim Bainbridge , Mike Cain

data data-engineering relational-databases ibm-db2 BI HTML

Abstract Business Intelligence (BI) is a broad term that relates to applications that analyze data to understand and act on the key metrics that drive profitability in an enterprise. Key to analyzing that data is providing fast, easy access to it while delivering it in formats or tools that best fit the needs of the user. At the core of any BI solution are user query and reporting tools that provide intuitive access to data supporting a spectrum of users from executives to “power users,” from spreadsheet aficionados to the external Internet consumer. IBM® DB2® Web Query for i offers a set of modernized tools for a more robust, extensible, and productive reporting solution than the popular IBM Query for System i® tool (also known as IBM Query/400). IBM DB2 Web Query for i preserves investments in the reports that are developed with Query/400 by offering a choice of importing definitions into the new technology or continuing to run existing Query/400 reports as is. But it also offers significant productivity and performance enhancements by leveraging the latest in DB2 for i query optimization technology. The DB2 Web Query for i product is a web-based query and report writing product that offers enhanced capabilities over the IBM Query for iSeries product (also commonly known as Query/400). IBM DB2 Web Query for i includes Query for iSeries technology to assist customers in their transition to DB2 Web Query. It offers a more modernized, Java-based solution for a more robust, extensible, and productive reporting solution. DB2 Web Query provides the ability to query or build reports against data that is stored in DB2 for i (or Microsoft SQL Server) databases through browser-based user interface technologies: Build reports with ease through the web-based, ribbon-like InfoAssist tool that leverages a common look and feel that can extend the number of personnel that can generate their own reports.
Simplify the management of reports by significantly reducing the number of report definitions that are required through the use of parameter-driven reports. Deliver data to users in many different formats, including directly into spreadsheets, or in boardroom-quality PDF format, or viewed from the browser in HTML. Leverage advanced reporting functions, such as matrix reporting, ranking, color coding, drill-down, and font customization to enhance the visualization of DB2 data. DB2 Web Query offers features to import Query/400 definitions and enhance their look and functions. By using it, you can add OLAP-like slicing and dicing to the reports or view reports in disconnected mode for users on the go. This IBM Redbooks® publication provides a broad understanding of what can be done with the DB2 Web Query product. This publication is a companion of DB2 Web Query Tutorials, SG24-8378, which has a group of self-explanatory tutorials to help you get up to speed quickly.

Learning Apache Cassandra - Second Edition

2017-04-25 O'Reilly Amazon

book

Sandeep Yarabarla , Graham Doman

data data-engineering nosql-databases Cassandra Java NoSQL

Learning Apache Cassandra is an engaging and in-depth guide to understanding the concepts and practical applications of Apache Cassandra, one of the most robust distributed NoSQL databases available. By the end of this book, you will have the necessary skills to design and manage scalable, high-performance database solutions tailored for modern applications. What this Book will help me do Set up Apache Cassandra and its multi-node clusters confidently and efficiently. Master schema design principles, including the use of composite keys, collections, and user-defined types. Implement efficient query strategies with secondary indexes and materialized views. Understand data distribution strategies and tune consistency levels for different application requirements. Dive into advanced topics like user-defined functions, batch operations, and Java client optimizations for scalable database architecture. Author(s) Sandeep Yarabarla brings practical expertise and deep knowledge to the subject of Apache Cassandra. With hands-on industry experience designing scalable database solutions, the author ensures complex topics are presented through clear and actionable insights. This is coupled with real-world scenarios to help you apply your learning effectively. Who is it for? This book is ideal for developers and IT professionals interested in learning Apache Cassandra from scratch or enhancing their NoSQL database expertise. It is particularly suited for those transitioning from relational databases to NoSQL systems. Even without prior coding experience, readers can expect to follow along and achieve practical results.

MQTT Essentials - A Lightweight IoT Protocol

2017-04-14 O'Reilly Amazon

book

Gastón C. Hillar

data data-engineering streaming-messaging rabbitmq IoT Java

Dive into the world of MQTT, the preferred protocol for IoT and M2M communication. This book provides a comprehensive guide to understanding, implementing, and securing MQTT-based systems, enabling readers to create efficient and lightweight communication networks for their connected devices. What this Book will help me do Understand the underlying principles and protocol structure of MQTT. Securely configure and deploy an MQTT broker for communication. Develop Python, Java, and JavaScript-based MQTT client applications. Utilize MQTT for real-world IoT use cases such as sensor data interchange. Optimize MQTT usage for low-latency and lightweight communication scenarios. Author(s) Gastón C. Hillar is an experienced IoT developer and author with a deep understanding of IoT protocols and technologies. With years of practical experience in designing and deploying secure IoT systems, Gastón specializes in breaking down complex topics into digestible and actionable insights. Through his books, he aims to empower developers to effectively integrate IoT technologies into their work. Who is it for? The book is tailored for software developers and engineers who are looking to integrate MQTT into their IoT solutions. It's ideal for individuals with pre-existing knowledge in IoT concepts who want to deepen their understanding of MQTT. Readers seeking to secure, optimize, and utilize MQTT for communication and automation tasks will find it especially useful. It's a perfect fit for those working with Python, Java, and web technologies in IoT contexts.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 O'Reilly Amazon

book

Jeffrey Aven

data data-engineering Hadoop API Big Data Cloud Computing

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and processing it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Elasticsearch 5.x Cookbook - Third Edition

2017-02-06 O'Reilly Amazon

book

Alberto Paro

data data-engineering search elasticsearch Analytics Big Data

Elasticsearch 5.x Cookbook is a comprehensive guide that teaches you how to leverage the full power of Elasticsearch for high-performance search and analytics. Through step-by-step recipes, you'll explore deployment, query building, plugin integration, and advanced analytics, ensuring you can manage and scale Elasticsearch like a pro. What this Book will help me do Understand and deploy complex Elasticsearch cluster topologies for optimal performance. Create tailored mappings to gain finer control over data indexing and retrieval. Design and execute advanced queries and analytics using Elasticsearch capabilities. Integrate Elasticsearch with popular programming languages and big data platforms. Monitor and improve Elasticsearch cluster health using the best practices and tools. Author(s) Alberto Paro is a seasoned software engineer and data scientist with extensive experience in distributed systems and search technologies. Having worked on numerous search-related projects, he brings practical, real-world insights to his writing. Alberto is passionate about teaching and simplifying complex concepts, making this book both approachable and expertly detailed. Who is it for? This book is ideal for developers or data engineers seeking to utilize Elasticsearch for advanced search and analytics tasks. If you have some prior knowledge of JSON and programming concepts, particularly Java, you will benefit most from this material. Whether you're looking to integrate Elasticsearch into your systems or to optimize its usage, this book caters to your needs.

Beginning Hibernate: For Hibernate 5

2016-11-10 O'Reilly Amazon

book

Dave Minter , Joseph B. Ottinger , Jeff Linwood

data data-engineering database-management-tools object-relational-mapping hibernate Big Data

Get started with the Hibernate 5 persistence layer and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition includes the new Hibernate 5.0 framework as well as coverage of NoSQL, MongoDB, and other related technologies, ranging from applications to big data. Beginning Hibernate is ideal if you're experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you'll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner—not a dry, theoretical, hard-to-read fashion. What You'll Learn Build enterprise Java-based transaction-type applications that access complex data with Hibernate Work with Hibernate 5 using a present-day build process Use Java 8 features with Hibernate Integrate into the persistence life cycle Map using Java's annotations Search and query with the new version of Hibernate Integrate with MongoDB using NoSQL Keep track of versioned data with Hibernate Envers Who This Book Is For Experienced Java developers interested in learning how to use and apply object-relational persistence in Java and who are new to the Hibernate persistence framework.

Spark in Action

2016-11-03 O'Reilly Amazon

book

Marko Bonaci , Petar Zecevic

data data-engineering apache-spark AI/ML Analytics API

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0. About the Technology Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades. About the Book Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code. What's Inside Updated for Spark 2.0 Real-life case studies Spark DevOps with Docker Examples in Scala, and online in Java and Python About the Reader Written for experienced programmers with some background in big data or machine learning. About the Authors Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community. Quotes Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide. - Jonathan Sharley, Pandora Media Must-have! Speed up your learning of Spark as a distributed computing framework. - Robert Ormandi, Yahoo! An easy-to-follow, step-by-step guide. - Gaurav Bhardwaj, 3Pillar Global An ambitiously comprehensive overview of Spark and its diverse ecosystem. - Jonathan Miller, Optensity

Learning IBM Bluemix

2016-10-25 O'Reilly Amazon

book

Sreelatha Sankaranarayanan

data data-engineering IBM Cloud Computing Java JavaScript

Learning IBM Bluemix provides a comprehensive introduction to developing and deploying applications with the IBM Bluemix cloud platform. By following detailed examples and guided exercises, you'll understand the full life cycle of cloud-based application development, from initial setup to scaling and security. What this Book will help me do Understand the capabilities of IBM Bluemix as a Platform as a Service to build applications efficiently. Learn to develop and deploy applications using Cloud Foundry command line and Bluemix console. Explore microservices architecture and build scalable applications using Bluemix tools. Integrate on-premises systems with cloud-hosted applications on Bluemix. Develop mobile client applications with the support of Bluemix's Mobile services. Author(s) Sreelatha Sankaranarayanan is an experienced developer and cloud technology author, with extensive expertise in IBM Bluemix. Her passion for simplifying complex concepts is reflected in her engaging writing style, ensuring learners can master new skills effectively. She brings years of real-world experience in cloud computing and software development to her instructional materials. Who is it for? This book is tailored for developers aiming to transition to cloud-based application development using IBM Bluemix, with a focus on practical application. Readers should have foundational skills in Java and Node.js to fully benefit. Ideal for professionals looking to expand their capabilities with cloud infrastructure, or for those wanting to leverage microservices and cloud solutions in their applications.

Fast Data Processing with Spark 2 - Third Edition

2016-10-24 O'Reilly Amazon

book

Krishna Sankar , Holden Karau

data data-engineering apache-spark AI/ML Analytics API

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks. What this Book will help me do Install and configure Apache Spark for optimal performance. Interact with distributed datasets using the resilient distributed dataset (RDD) API. Leverage the flexibility of DataFrame API for efficient big data analytics. Apply machine learning models using Spark MLlib to solve complex problems. Perform graph analysis using GraphX to uncover structural insights in data. Author(s) Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Holden Karau, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development. Who is it for? This book is aimed at software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner to medium-level understanding of big data processing concepts is recommended. If you aspire to solve big data problems using scalable distributed computing frameworks, this book is perfect for you. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Hadoop Blueprints

2016-09-30 O'Reilly Amazon

book

Anurag Shrivastava , Sudheesh Narayan , Tanmay Deshpande

data data-engineering Hadoop Big Data Java Marketing

"Hadoop Blueprints" guides you through using Hadoop and its ecosystem to solve real-life business problems. You will explore six case studies covering areas like fraud detection, marketing analysis, and data lakes, providing a thorough and practical understanding of Hadoop applications. What this Book will help me do Understand how to use Hadoop to solve real-life business scenarios effectively. Learn to build a 360-degree customer view integrating different data types. Develop and deploy a fraud detection system leveraging Hadoop technologies. Explore marketing campaign analysis and improvement using data-driven workflows on Hadoop. Gain hands-on experience with creating and maintaining efficient data lakes. Author(s) Sudheesh Narayan, along with his co-authors Anurag Shrivastava and Tanmay Deshpande, brings extensive experience in Big Data technologies. They have been involved in developing solutions utilizing Hadoop, Apache Spark, and other ecosystem components. Their practical approach to presenting complex technical topics ensures readers can apply their knowledge to real-world scenarios. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who have a foundational understanding of Hadoop and seek to expand their practical skills. Readers should be familiar with Java or other scripting languages. It's perfect for those aiming to build actionable solutions for business problems using Big Data technologies.

PostgreSQL Development Essentials

2016-09-26 O'Reilly Amazon

book

Baji Shaik , Manpreet Kaur

data data-engineering relational-databases postgresql Java SQL

Dive into the advanced features of PostgreSQL and master database development with 'PostgreSQL Development Essentials'. This book guides you step-by-step through topics like advanced SQL queries, database design, query optimization, and using PostgreSQL with programming languages like Java and PHP. By the end, you'll have the skills to build secure, efficient, and enterprise-ready database applications. What this Book will help me do Write powerful and complex SQL queries to harness the full potential of PostgreSQL. Create and optimize robust database designs tailored to application needs. Enhance database performance through indexing, partitioning, and query optimization. Integrate PostgreSQL seamlessly with Java and PHP for advanced application development. Utilize PostgreSQL extensions and features to expand functionality and ensure scalability. Author(s) Baji Shaik, the author of 'PostgreSQL Development Essentials', has extensive experience in database development and optimization with a focus on PostgreSQL. With his practical approach, Baji simplifies advanced concepts and provides actionable insights to empower developers. His teaching style bridges technical depth with accessibility, making this book an essential for professionals aiming to excel in PostgreSQL. Who is it for? This book is designed for software developers and database professionals who have a foundational understanding of PostgreSQL and are eager to deepen their expertise. It is ideal for those looking to enrich their skills in advanced SQL, optimizing database performance, and integrating PostgreSQL with application frameworks like Java and PHP. If you're aiming to elevate your database applications to enterprise-grade quality while ensuring both efficiency and scalability, this book is tailored for you.

Hadoop: Data Processing and Modelling

2016-08-31 O'Reilly Amazon

book

Sandeep Karanth , Tanmay Deshpande , Garry Turkington

data data-engineering Hadoop AI/ML Big Data DWH

Unlock the power of your data with the Hadoop 2.X ecosystem and its data warehousing techniques across large data sets About This Book Conquer the mountain of data using Hadoop 2.X tools The authors succeed in creating a context for Hadoop and its ecosystem Hands-on examples and recipes giving the bigger picture and helping you to master Hadoop 2.X data processing platforms Overcome the challenging data processing problems using this exhaustive course with Hadoop 2.X Who This Book Is For This course is for Java developers who know scripting and want a career shift to the Hadoop and Big Data segment of the IT industry. Whether you are a novice in Hadoop or an expert, this book will take you to the most advanced level in Hadoop 2.X. What You Will Learn Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer Installing and maintaining a Hadoop 2.X cluster and its ecosystem Advanced data analysis using Hive, Pig, and MapReduce programs Machine learning principles with libraries such as Mahout, and batch and stream data processing using Apache Spark Understand the changes involved in the move from Hadoop 1.0 to Hadoop 2.0 Dive into YARN and Storm and use YARN to integrate Storm with Hadoop Deploy Hadoop on Amazon Elastic MapReduce, discover HDFS replacements, and learn about HDFS Federation In Detail As Marc Andreessen has said, "Data is eating the world." In today's age of Big Data, businesses produce data in huge volumes every day, and this rising tide of data needs to be organized and analyzed in a more secure way. With proper and effective use of Hadoop, you can build new and improved models and, based on them, make the right decisions.
The first module, Hadoop Beginner's Guide, walks you through understanding Hadoop, with very detailed instructions on how to use it. Commands are explained using sections called "What just happened" for more clarity and understanding. The second module, Hadoop Real World Solutions Cookbook, 2nd edition, is an essential tutorial to effectively implement a big data warehouse in your business, where you get detailed practices on the latest technologies such as YARN and Spark. Big data has become a key basis of competition and the new waves of productivity growth. Hence, once you get familiar with the basics and implement the end-to-end big data use cases, you will start exploring the third module, Mastering Hadoop. So, if you need to take your Hadoop skill set to the next level after you nail the basics and the advanced concepts, this course is indispensable. When you finish this course, you will be able to tackle real-world scenarios and become a big data expert using the tools and the knowledge based on the various step-by-step tutorials and recipes. Style and approach This course covers everything from the basic concepts of Hadoop to the advanced mechanisms you need to master to become a big data expert. The goal is to help you learn the essentials using step-by-step tutorials and then move on to recipes with various real-world solutions. It covers all the important aspects of Hadoop, from system design and configuration to machine learning principles with various libraries, in chapters illustrated with code fragments and schematic diagrams. This is a compendious course to explore Hadoop from the basics to the most advanced techniques available in Hadoop 2.X.

Interactive Spark using PySpark

2016-08-15 O'Reilly Amazon

book

Benjamin Bengfort , Jenny Kim

data data-engineering apache-spark PySpark AI/ML Analytics

Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark using an interactive shell called PySpark. Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java. This also allows for reuse of a wide variety of Python libraries for machine learning, data visualization, numerical analysis, etc. What you'll learn and how you can apply it Compare the different components provided by Spark, and what use cases they fit. Learn how to use RDDs (resilient distributed datasets) with PySpark. Write Spark applications in Python and submit them to the cluster as Spark jobs. Get an introduction to the Spark computing framework. Apply this approach to a worked example to determine the most frequent airline delays in a specific month and year. This lesson is for you because… You're a data scientist, familiar with Python coding, who needs to get up and running with PySpark You're a Python developer who needs to leverage the distributed computing resources available on a Hadoop cluster, without learning Java or Scala first Prerequisites Familiarity with writing Python applications Some familiarity with bash command-line operations Basic understanding of how to use simple functional programming constructs in Python, such as closures, lambdas, maps, etc. Materials or downloads needed in advance Apache Spark This lesson is taken from Data Analytics with Hadoop by Jenny Kim and Benjamin Bengfort.

Architecting HBase Applications

2016-07-18 O'Reilly Amazon

book

Kevin O'Dell , Jean-Marc Spaggiari

data data-engineering nosql-databases Apache HBase API Data Management

HBase is a remarkable tool for indexing mass volumes of data, but getting started with this distributed database and its ecosystem can be daunting. With this hands-on guide, you’ll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions. Along with HBase principles and cluster deployment guidelines, this book includes in-depth case studies that demonstrate how large companies solved specific use cases with HBase. Authors Jean-Marc Spaggiari and Kevin O’Dell also provide draft solutions and code examples to help you implement your own versions of those use cases, from master data management (MDM) and document storage to near real-time event processing. You’ll also learn troubleshooting techniques to help you avoid common deployment mistakes. Learn exactly what HBase does, what its ecosystem includes, and how to set up your environment Explore how real-world HBase instances were deployed and put into production Examine documented use cases for tracking healthcare claims, digital advertising, data management, and product quality Understand how HBase works with tools and techniques such as Spark, Kafka, MapReduce, and the Java API Learn how to identify the causes and understand the consequences of the most common HBase issues

Cassandra: The Definitive Guide, 2nd Edition

2016-07-12 O'Reilly Amazon

book

Eben Hewitt , Jeff Carpenter

data data-engineering nosql-databases Cassandra Cloud Computing Data Modelling

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition—updated for Cassandra 3.0—provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility. Understand Cassandra’s distributed and decentralized structure Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell Create a working data model and compare it with an equivalent relational model Develop sample applications using client drivers for languages including Java, Python, and Node.js Explore cluster topology and learn how nodes exchange data Maintain a high level of performance in your cluster Deploy Cassandra on site, in the Cloud, or with Docker Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene

Java XML and JSON

2016-06-15 O'Reilly Amazon

book

Jeff Friesen

data data-engineering storage-formats XML API Java

Java XML and JSON is your one-stop guide to mastering the XML metalanguage and JSON data format along with significant Java APIs for parsing and creating XML/JSON documents (and more). The first six chapters focus on XML along with the SAX, DOM, StAX, XPath, and XSLT APIs. The remaining four chapters focus on JSON along with the mJson, Gson, and JsonPath APIs. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises. What You'll Learn Master the XML language Learn how to validate XML documents Learn how to parse XML documents with the SAX, DOM, and StAX APIs Learn how to create XML documents with the DOM and StAX APIs Learn how to extract values from XML documents with the XPath API Learn how to transform XML documents with the XSLT API Master the JSON format Learn how to validate JSON documents Learn how to parse and create JSON documents with the mJson and Gson APIs Learn how to extract values from JSON documents with the JsonPath API Who This Book Is For Intermediate or advanced Java programmers/developers.

Spring Persistence with Hibernate, Second Edition

2016-05-31 O'Reilly Amazon

book

Brian D. Murphy , Paul Fisher

data data-engineering database-management-tools object-relational-mapping hibernate Agile/Scrum

Learn how to use the core Hibernate APIs and tools as part of the Spring Framework. This book illustrates how these two frameworks can be best utilized. Other persistence solutions available in Spring are also shown including the Java Persistence API (JPA). Spring Persistence with Hibernate, Second Edition has been updated to cover Spring Framework version 4 and Hibernate version 5. After reading and using this book, you'll have the fundamentals to apply these persistence solutions into your own mission-critical enterprise Java applications that you build using Spring. Persistence is an important set of techniques and technologies for accessing and using data, and ensuring that data is mobile regardless of specific applications and contexts. In Java development, persistence is a key factor in enterprise, e-commerce, and other transaction-oriented applications. Today, the agile and open source Spring Framework is the leading out-of-the-box, open source solution for enterprise Java developers; in it, you can find a number of Java persistence solutions. What You'll Learn Use Spring Persistence, including using persistence tools in Spring as well as choosing the best Java persistence frameworks outside of Spring Take advantage of Spring Framework features such as Inversion of Control (IoC), aspect-oriented programming (AOP), and more Work with Spring JDBC, use declarative transactions with Spring, and reap the benefits of a lightweight persistence strategy Harness Hibernate and integrate it into your Spring-based enterprise Java applications for transactions, data processing, and more Integrate JPA for creating a well-layered persistence tier in your enterprise Java application Who This Book Is For This book is ideal for developers interested in learning more about persistence framework options on the Java platform, as well as fundamental Spring concepts.
Because the book covers several persistence frameworks, it is suitable for anyone interested in learning more about Spring or any of the frameworks covered. Lastly, this book covers advanced topics related to persistence architecture and design patterns, and is ideal for beginning developers looking to learn more in these areas.

Professional Hadoop

2016-05-23 O'Reilly Amazon

book

Benoy Antony , Cheryl Adams , Cazen Lee , Konstantin Boudnik , Branky Shao , Kai Sasaki

data data-engineering Hadoop Big Data Java Kafka

The professional's one-stop guide to this open-source, Java-based big data framework. Professional Hadoop is the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings. Written by an expert team of certified Hadoop developers, committers, and Summit speakers, this book details every key aspect of Hadoop technology to enable optimal processing of large data sets. Designed expressly for the professional developer, this book skips over the basics of database development to get you acquainted with the framework's processes and capabilities right away. The discussion covers each key Hadoop component individually, culminating in a sample application that brings all of the pieces together to illustrate the cooperation and interplay that make Hadoop a major big data solution. Coverage includes everything from storage and security to computing and user experience, with expert guidance on integrating other software and more. Hadoop is quickly reaching significant market usage, and more and more developers are being called upon to develop big data solutions using the Hadoop framework. This book covers the process from beginning to end, providing a crash course for professionals needing to learn and apply Hadoop quickly. Configure storage, UE, and in-memory computing. Integrate Hadoop with other programs including Kafka and Storm. Master the fundamentals of Apache Bigtop and Ignite. Build robust data security with expert tips and advice. Hadoop's popularity is largely due to its accessibility. Open-source and written in Java, the framework offers almost no barrier to entry for experienced database developers already familiar with the skills and requirements real-world programming entails. Professional Hadoop gives you the practical information and framework-specific skills you need quickly.

Mastering Hibernate

2016-05-16 O'Reilly Amazon

book

Ramin Rad

data data-engineering database-management-tools object-relational-mapping hibernate Java

Mastering Hibernate is your comprehensive guide to understanding and mastering Hibernate, a powerful Object-Relational Mapping tool for Java and .Net applications. Through this book, you will dive deep into the mechanics of Hibernate, exploring its core concepts and architecture. Whether you're working with SQL or NoSQL data stores, this book ensures you can unlock Hibernate's full potential. What this Book will help me do: Grasp the internal workings of Hibernate, including its session management and entity lifecycle. Optimize mapping between Java classes and relational database structures for better performance. Effectively manage relationships and collections within your data models using Hibernate features. Utilize Hibernate's caching systems to improve application performance and scalability. Handle multi-tenant database configurations with confidence using Hibernate's architectural capabilities. Author(s): Ramin Rad is an experienced software developer and educator specializing in Java-based applications and enterprise architecture. With years of hands-on practice using Hibernate in real-world scenarios, Ramin Rad has curated this book to serve as a clear and practical guide. Their writing reflects deep technical expertise combined with an approachable and illustrative teaching style, ensuring learning is both effective and engaging. Who is it for? This book is ideal for software developers and engineers who are familiar with Java or other similar object-oriented programming languages. Whether you're a professional looking to deepen your understanding of Hibernate's internals or a developer aiming to create more efficient ORM solutions, this book has something for you. Readers should have a basic understanding of Java and relational databases, but no prior Hibernate expertise is required. By the end, you'll be equipped to confidently apply Hibernate to sophisticated data challenges.

External Procedures, Triggers, and User-Defined Functions on IBM DB2 for i

2016-04-25 O'Reilly Amazon

book

Fredy Cruz , Satid Singkorapoom , Hernando Bedoya , Daniel Lema

data data-engineering relational-databases ibm-db2 IBM Java

Procedures, triggers, and user-defined functions (UDFs) are the key database software features for developing robust and distributed applications. IBM Universal Database™ for i (IBM DB2® for i) supported these features for many years, and they were enhanced in V5R1, V5R2, and V5R3 of IBM® OS/400® and V5R4 of IBM i5/OS™. This IBM Redbooks® publication includes several of the announced features for procedures, triggers, and UDFs in V5R1, V5R2, V5R3, and V5R4. This book includes suggestions, guidelines, and practical examples to help you effectively develop IBM DB2 for i procedures, triggers, and UDFs. The following topics are covered in this book: external stored procedures and triggers; Java procedures (both Java Database Connectivity (JDBC) and Structured Query Language for Java (SQLJ)); external triggers; and external UDFs. This publication also offers examples that were developed in several programming languages, including RPG, COBOL, C, Java, and Visual Basic, by using native and SQL data access interfaces. This book is part of the original IBM Redbooks publication, Stored Procedures, Triggers, and User-Defined Functions on DB2 Universal Database for iSeries, SG24-6503-02, that covered external procedures, triggers, and functions, and also SQL procedures, triggers, and functions. All of the information that relates to external routines was left in this publication. All of the information that relates to SQL routines was rewritten and updated. This information is in the new IBM Redbooks publication, SQL Procedures, Triggers, and Functions on IBM DB2 for i, SG24-8326. This book is intended for anyone who wants to develop IBM DB2 for i procedures, triggers, and UDFs. Before you read this book, you need to know about relational database technology and the application development environment on the IBM i server.

Spark

2016-03-21 O'Reilly Amazon

book

Brennon York , Ema Orhian , Kai Sasaki , Ilya Ganelin

data data-engineering apache-spark AI/ML Big Data Hadoop

Production-targeted Spark guidance with real-world use cases. Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, db connectors, streaming, security, and much more. Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings. Review Spark hardware requirements and estimate cluster size. Gain insight from real-world production use cases. Tighten security, schedule resources, and fine-tune performance. Overcome common problems encountered using Spark in production. Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.

MongoDB Cookbook - Second Edition

2016-01-13 O'Reilly Amazon

book

Amol Nayak , Cyrus Dasadia

data data-engineering nosql-databases MongoDB Cloud Computing Data Management

Designed to help developers and administrators harness the full potential of MongoDB, this book provides clear instruction and practical guidance no matter your level. By exploring both fundamental aspects like installation and configuration, and advanced topics like using cloud services, this book serves as a comprehensive reference for anyone navigating the modern NoSQL database capabilities of MongoDB. What this Book will help me do: Understand how to install and configure MongoDB for different environments, enabling efficient setup and operation. Master database administration skills, including monitoring and backup strategies, which are essential for stability and performance. Develop applications with MongoDB using Java and Python, allowing integration into modern tech stacks. Leverage advanced querying and indexing techniques, improving data retrieval and operational efficiency. Integrate MongoDB with cloud platforms and tools like Hadoop, enhancing scalability and expanding use cases. Author(s): Cyrus Dasadia and Amol Nayak are seasoned database professionals with extensive experience in MongoDB and NoSQL database systems. Their practical approach to technical writing focuses on real-world applications and providing solutions to complex challenges. With backgrounds in software development and data management, they ensure that readers have a hands-on learning experience. Their passion for spreading knowledge makes this book both instructional and engaging. Who is it for? This book is ideal for database administrators and software developers interested in adopting or expanding their knowledge of MongoDB. Whether you're a complete novice or someone with experience who seeks hands-on solutions and examples, this book offers value. It's particularly suited for professionals working with Java or Python, as examples focus on these programming languages.
Whether you're enhancing your skills for personal projects or looking to implement MongoDB at work, this resource equips you with the know-how.

Apache Solr: A Practical Approach to Enterprise Search

2015-12-28 O'Reilly Amazon

book

Dikshant Shahi

data data-engineering search solr Java

Build an enterprise search engine using Apache Solr: index and search documents; ingest data from varied sources; apply various text processing techniques; utilize different search capabilities; and customize Solr to retrieve the desired results. Apache Solr: A Practical Approach to Enterprise Search explains each essential concept--backed by practical and industry examples--to help you attain expert-level knowledge. The book, which assumes a basic knowledge of Java, starts with an introduction to Solr, followed by steps to setting it up, indexing your first set of documents, and searching them. It then introduces you to information retrieval and its implementation in Apache Solr; this will help you understand your search problem, decide the approach to build an effective solution, and use various metrics to evaluate the results. The book next covers the schema design and techniques to build a text analysis chain for cleansing, normalizing and enriching your documents and addressing different types of search queries. It describes various popular matching techniques which are generally applied to improve the precision and recall of searches. You will learn the end-to-end process of data ingestion from varied sources, metadata extraction, pre-processing and transformation of content, various search components, query parsers and other advanced search capabilities. After covering out-of-the-box features, Solr expert Dikshant Shahi dives into ways you can customize Solr for your business and its specific requirements, along with ways to plug in your own components. Most important, you will learn about implementations for Solr scoring, factors affecting the document score, and tuning the score for the application at hand. The book explains why textual scoring is not sufficient for practical ranking of documents and ways to integrate real-world factors for contributing to the document ranking. 
You'll see how to influence user experience by providing suggestions and recommendations. You'll also see integration of Solr with important related technologies such as OpenNLP and Tika. Additionally, you will learn about scaling Solr using SolrCloud. This book concludes with coverage of semantic search capabilities, which are crucial for taking the search experience to the next level. By the end of Apache Solr, you will be proficient in designing and developing your search engine.

Introducing SQLite for Mobile Developers

2015-12-26 O'Reilly Amazon

book

Jesse Feiler

data data-engineering relational-databases sqlite Java SQL

This brief book is a basic introduction to SQLite for iOS and Android developers. The book includes a simple introduction to SQL, a discussion of when to use SQLite, and chapters devoted to using SQLite with the most likely programming languages: Java, PHP, Swift and Objective-C. It then goes through adding simple database functionality to an Android or iOS app and ends with a chapter on managing the app’s life cycle.

Learning PostgreSQL

2015-11-30 O'Reilly Amazon

book

Achim Vannahme , Salahaldin Juba , Andrey Volkov

data data-engineering relational-databases postgresql Java RDBMS

Unlock the potential of PostgreSQL, a powerful open-source relational database system, with 'Learning PostgreSQL.' This book takes you through essential concepts of relational databases, SQL syntax, and the advanced features of PostgreSQL, equipping you to build and manage efficient database solutions. What this Book will help me do: Learn the foundational concepts behind relational databases and relational algebra. Set up and configure a PostgreSQL server and client for development use. Develop SQL queries for robust data manipulation and retrieval. Implement advanced features of PostgreSQL, including procedural programming with PL/pgSQL. Integrate PostgreSQL with Java applications using JDBC and Hibernate frameworks. Author(s): The authors of 'Learning PostgreSQL,' Salahaldin Juba, Achim Vannahme, and Andrey Volkov, bring extensive experience and expertise in software development and database management. They have a deep understanding of PostgreSQL as well as its integration with applications. Their collective approach emphasizes practical techniques, real-world scenarios, and enriched learning, making this book a valuable resource for learners of all levels. Who is it for? This book is perfect for students, database developers, and administrators seeking to learn PostgreSQL. It suits beginners with no prior knowledge and helps intermediates deepen their expertise. Readers will learn how to develop, maintain, and optimize PostgreSQL databases, making it ideal for those aiming to advance their database development skills.
