talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 1776–1800 of 3406 · Newest first

IBM Platform Computing Solutions Reference Architectures and Best Practices

This IBM® Redbooks® publication demonstrates and documents that the combination of IBM System x®, IBM GPFS™, IBM GPFS-FPO, IBM Platform Symphony®, IBM Platform HPC, IBM Platform LSF®, IBM Platform Cluster Manager Standard Edition, and IBM Platform Cluster Manager Advanced Edition delivers significant value to clients in need of cost-effective, highly scalable, and robust solutions. IBM's depth of solutions can help clients plan a foundation to face challenges in how to manage, maintain, enhance, and provision computing environments to, for example, analyze the growing volumes of data within their organizations. This IBM Redbooks publication addresses topics to educate, reiterate, confirm, and strengthen the widely held opinion of IBM Platform Computing as the systems software platform of choice within an IBM System x environment for deploying and managing environments that help clients solve challenging technical and business problems. It also addresses topics that help answer customers' complex requirements to manage, maintain, and analyze the growing volumes of data within their organizations, and provides expert-level documentation to transfer the how-to skills to the worldwide support teams. This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective computing solutions that help optimize business results, product development, and scientific discoveries.

Leveraging the IBM BPM Coach Framework in Your Organization

The IBM® Coach Framework is a key element of the IBM Business Process Manager (BPM) product suite. With the Coach Framework, process authors can create and maintain custom web-based user interfaces that are embedded within their business process solutions. This ability to create and maintain custom user interfaces is a key factor in the successful deployment of business process solutions. Coaches have proven to be an extremely powerful element of IBM BPM solutions, and with the release of IBM BPM version 8.0 they were rejuvenated to incorporate the recent advances in browser-based user interfaces. This IBM Redbooks® publication focuses on the capabilities that Coach Framework delivers with IBM BPM version 8.5, but much of what is shared in these pages continues to be of value as IBM evolves coaches in the future. This book has been produced to help you fully benefit from the power of the Coach Framework.

Hadoop For Dummies

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications. Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily. Details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving. Shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster. From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.
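
For readers unfamiliar with the MapReduce model that the blurb above refers to, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is an illustration only: it does not use Hadoop or any of its APIs, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

In a real cluster the map and reduce steps run in parallel on different machines and the shuffle moves data over the network; this sketch only shows how the data flows between the phases.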

Anonymous Communication Networks

This book examines anonymous communication networks as a solution to Internet privacy concerns. It explores various anonymous communication networks as possible solutions to Internet privacy concerns and identifies specific scenarios where it is best to remain anonymous. The text details the two main approaches to anonymous communication networks: onion routing and mixed networks. Using examples and case studies, it illustrates the usefulness of anonymous communication networks for web browsing, email, e-banking, and e-voting. It also includes guidance to help readers download and install Tor, I2P, JAP/JonDo, and QuickSilver.

IBM Tivoli Storage Productivity Center V5.1 Technical Guide

IBM® Tivoli® Storage Productivity Center V5.1 products offer storage infrastructure management that helps optimize storage management by centralizing, simplifying, automating, and optimizing storage tasks associated with storage systems, data disaster recovery, storage networks, and capacity management. IBM Tivoli Storage Productivity Center V5.1 products include IBM Tivoli Storage Productivity Center V5.1 and IBM Tivoli Storage Productivity Center Select Edition V5.1. Tivoli Storage Productivity Center Select Edition V5.1 offers the same features as Tivoli Storage Productivity Center V5.1 but at attractive entry-level pricing for operations with smaller capacities. It is licensed per storage device, such as disk controllers and their respective expansion units. This IBM Redbooks® publication is intended for storage administrators and users who are installing and using the features and functions in IBM Tivoli Storage Productivity Center V5.1. The information in this book can be used to plan for, install, and customize the components of Tivoli Storage Productivity Center in your storage infrastructure.

Beginning Hibernate, Third Edition

Beginning Hibernate, Third Edition is ideal if you're experienced in Java with databases (the traditional, or "connected," approach), but new to open-source, lightweight Hibernate, a leading object-relational mapping and database-oriented application development framework. This book packs in information about the release of the Hibernate 4.x persistence layer and provides a clear introduction to the current standard for object-relational persistence in Java. And since the book keeps its focus on Hibernate without wasting time on nonessential third-party tools, you'll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. The authors also present material in a lively, example-based manner, not a dry, theoretical, hard-to-read fashion. What you'll learn: how to build enterprise Java-based transaction-type applications that access complex data with Hibernate; how to work with Hibernate 4; where to integrate into the persistence life cycle; how to map using annotations, Hibernate XML files, and more; how to search and query with the new version of Hibernate; and how to integrate with MongoDB using NoSQL. Who this book is for: Java developers who want to learn about Hibernate.

Think Bigger

Big data--the enormous amount of data that is created as virtually every movement, transaction, and choice we make becomes digitized--is revolutionizing business. Offering real-world insight and explanations, this book provides a roadmap for organizations looking to develop a profitable big data strategy...and reveals why it's not something they can leave to the I.T. department.

Sharing best practices from companies that have implemented a big data strategy including Walmart, InterContinental Hotel Group, Walt Disney, and Shell, Think Bigger covers the most important big data trends affecting organizations, as well as key technologies like Hadoop and MapReduce, and several crucial types of analyses. In addition, the book offers guidance on how to ensure security, and respect the privacy rights of consumers. It also examines in detail how big data is impacting specific industries--and where opportunities can be found.

Big data is changing the way businesses--and even governments--are operated and managed. Think Bigger is an essential resource for anyone who wants to ensure that their company isn't left in the dust.

IBM zEnterprise System Technical Introduction

In a smarter planet, information-centric processes are exploding in growth. The mainframe has always been the IT industry's leading platform for transaction processing, consolidated and secure data serving, and support for available enterprise-wide applications. IBM® has extended the mainframe platform to help large enterprises reshape their client experiences through information-centric computing and to deliver on key business initiatives. IBM zEnterprise® is recognized as the most reliable and trusted system, and the most secure environment for core business operations. The new zEnterprise System consists of the IBM zEnterprise EC12 (zEC12) or IBM zEnterprise BC12 (zBC12), the IBM zEnterprise Unified Resource Manager, and the IBM zEnterprise IBM BladeCenter® Extension (zBX) Model 003. This IBM Redbooks® publication describes the zEC12 and zBC12, with their improved scalability, performance, security, resiliency, availability, and virtualization. The zEnterprise System has no peer as a trusted platform that also provides the most efficient transaction processing and database management. With efficiency at scale delivering significant cost savings on core processes, resources can be freed up to focus on developing new services to drive growth. This book provides a technical overview of the zEC12, zBC12, zBX Model 003, and Unified Resource Manager. This publication is intended for IT managers, architects, consultants, and anyone else who wants to understand the elements of the zEnterprise System. For this introduction to the zEnterprise System, readers are not expected to be familiar with current IBM System z® technology and terminology.

Responsive Mobile User Experience Using MQTT and IBM MessageSight

IBM® MessageSight is an appliance-based messaging server that is optimized to address the massive scale requirements of machine-to-machine (m2m) and mobile user scenarios. IBM MessageSight makes it easy to connect mobile customers to your existing messaging enterprise system, enabling a substantial number of remote clients to be concurrently connected. The MQTT protocol is a lightweight messaging protocol that uses publish/subscribe architecture to deliver messages over low bandwidth or unreliable networks. A publish/subscribe architecture works well for HTML5, native, and hybrid mobile applications by removing the wait time of a request/response model. This creates a better, richer user experience. The MQTT protocol is simple, which results in a client library with a low footprint. MQTT was proposed as an Organization for the Advancement of Structured Information Standards (OASIS) standard. This book provides information about version 3.1 of the MQTT specification. This IBM Redbooks® publication provides information about how IBM MessageSight, in combination with MQTT, facilitates the expansion of enterprise systems to include mobile devices and m2m communications. This book also outlines how to connect IBM MessageSight to an existing infrastructure, either through the use of IBM WebSphere® MQ connectivity or the IBM Integration Bus (formerly known as WebSphere Message Broker). This book describes IBM MessageSight product features and facilities that are relevant to technical personnel, such as system architects, to help them make informed design decisions regarding the integration of the messaging appliance into their enterprise architecture. Using a scenario-based approach, you learn how to develop a mobile application, and how to integrate IBM MessageSight with other IBM products. This publication is intended to be of use to a wide-ranging audience.
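
To make the publish/subscribe idea described above more concrete, here is a minimal in-process sketch of topic-based message delivery with MQTT-style topic filters, where `+` matches one topic level and `#` matches all remaining levels. It is not an MQTT client and not the IBM MessageSight API: there is no network, no quality-of-service handling, and no retained messages, and the names `topic_matches` and `ToyBroker` are hypothetical names invented for this illustration.

```python
from typing import Callable, List, Tuple

# A handler receives the concrete topic name and the message payload.
Handler = Callable[[str, str], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic name.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none) and is only accepted as the last level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only in the final position.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """In-memory stand-in for a broker: routes messages to subscribers."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Handler]] = []

    def subscribe(self, topic_filter: str, handler: Handler) -> None:
        self._subscriptions.append((topic_filter, handler))

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every matching subscriber; return the count."""
        delivered = 0
        for topic_filter, handler in self._subscriptions:
            if topic_matches(topic_filter, topic):
                handler(topic, payload)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = ToyBroker()
    broker.subscribe("orders/+/status", lambda t, p: print(t, p))
    broker.publish("orders/42/status", "shipped")
```

The publisher never addresses a subscriber directly; it only names a topic. That decoupling is what lets a subscriber receive updates as they happen instead of repeatedly requesting them.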

Storm Blueprints: Patterns for Distributed Real-time Computation

"Storm Blueprints: Patterns for Distributed Real-time Computation" takes you on a hands-on journey into understanding and implementing distributed real-time processing with Apache Storm. Through real-world examples and projects, you'll gain a sound understanding of the fundamentals and learn to design systems capable of resilient, scalable, and fast computation. What this Book will help me do Understand the essentials of Apache Storm and its architecture. Learn to deploy and manage Storm in different modes, including distributed clusters. Discover design patterns for real-time data flow in distributed systems. Master the implementation of fault tolerance and continuous availability in processing. Analyze system performance insights through practical integrations and use cases. Author(s) The author(s) of 'Storm Blueprints' bring extensive experience in distributed systems engineering and real-time computations. Their passion for sharing knowledge is evident in this approachable yet comprehensive book. With years of practical experience, they offer insights and proven techniques to empower readers to build practical distributed systems. Who is it for? This book is designed for software engineers and developers working on data pipelines and real-time processing systems. Beginners to Storm will find it an excellent introduction, while those with experience will appreciate the advanced design patterns and use cases. If you aim to leverage Storm effectively in distributed architectures, this guide is tailored for you.

DFSMSrmm Primer

DFSMSrmm from IBM® is the full function tape management system available in IBM OS/390® and IBM z/OS®. With DFSMSrmm, you can manage all types of tape media at the shelf, volume, and data set level, simplifying the tasks of your tape librarian. Are you a new DFSMSrmm user? Then, this IBM Redbooks® publication introduces you to the DFSMSrmm basic concepts and functions. You learn how to manage your tape environment by implementing the DFSMSrmm management policies. Are you already using DFSMSrmm? In that case, this publication provides the most up-to-date information about the new functions and enhancements introduced with the latest release of DFSMSrmm. You will find useful information for implementing these new functions and getting more benefits from DFSMSrmm. Do you want to test DFSMSrmm functions? If you are using another tape management system and are thinking about converting to DFSMSrmm, you can start DFSMSrmm and run it in parallel with your current system for testing purposes. This book is intended to be a starting point for new professionals and a handbook for using the basic DFSMSrmm functions.

Solr in Action

Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities. About the Technology About the Book Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents. Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning. What's Inside How to scale Solr for big data Rich real-world examples Solr as a NoSQL data store Advanced multilingual, data, and relevancy tricks Coverage of versions through Solr 4.7 About the Reader This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required. About the Authors Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies. Quotes The knowledge and techniques you need. - From the Foreword by Yonik Seeley, Creator of Solr Readable and immediately applicable ... an excellent book. - John Viviano, InterCorp, Inc. 
The go-to guide for Solr ... a definitive resource for both beginners and experts. - Scott Anthony, Business Instruments A well-dosed combination of deep technical knowledge and real-world experience. - Alexandre Madurell, Piksel, Inc.

Google Maps

Create custom applications with the Google Maps API Featuring step-by-step examples, this practical resource gets you started programming the Google Maps API with JavaScript in no time. Learn how to embed maps on web pages, annotate the embedded maps with your data, generate KML files to store and reuse your map data, and enable client applications to request spatial data through web services. Google Maps: Power Tools for Maximizing the API explains techniques for visualizing masses of data and animating multiple items on the map. You’ll also find out how to embed Google maps in desktop applications to combine the richness of the Windows interface with the unique features of the API. You can use the numerous samples included throughout this hands-on guide as your starting point for building customized applications. Create map-enabled web pages with a custom look Learn the JavaScript skills required to exploit the Google Maps API Create highly interactive interfaces for mapping applications Embed maps in desktop applications written in .NET Annotate maps with labels, markers, and shapes Understand geodesic paths and shapes and perform geodesic calculations Store geographical data in KML format Add GIS features to mapping applications Store large sets of geography data in databases and perform advanced spatial queries Use web services to request spatial data from within your script on demand Automate the generation of standalone web pages with annotated maps Use the Geocoding and Directions APIs Visualize large data sets using symbols and heatmaps Animate items on a map Bonus online content includes: A tutorial on The SQL Spatial application A bonus chapter on animating multiple airplanes Three appendices: debugging scripts in the browser; scalable vector graphics; and applying custom styles

Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.” - From the Foreword by Raymie Stata, CEO of Altiscale. The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN. Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now, in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances. YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment. You’ll find many examples drawn from the authors’ cutting-edge experience, first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it. Coverage includes: YARN’s goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem; exploring YARN on a single node; administering YARN clusters and Capacity Scheduler; running existing MapReduce applications; developing a large-scale clustered YARN application; and discovering new open source frameworks that run under YARN.

Deployment Guide for InfoSphere Guardium

IBM® InfoSphere® Guardium® provides the simplest, most robust solution for data security and data privacy by assuring the integrity of trusted information in your data center. InfoSphere Guardium helps you reduce support costs by automating the entire compliance auditing process across heterogeneous environments. InfoSphere Guardium offers a flexible and scalable solution to support varying customer architecture requirements. This IBM Redbooks® publication provides a guide for deploying the Guardium solutions. This book also provides a roadmap process for implementing an InfoSphere Guardium solution that is based on years of experience and best practices that were collected from various Guardium experts. We describe planning, installation, configuration, monitoring, and administrating an InfoSphere Guardium environment. We also describe use cases and how InfoSphere Guardium integrates with other IBM products. The guidance can help you successfully deploy and manage an IBM InfoSphere Guardium system. This book is intended for the system administrators and support staff who are responsible for deploying or supporting an InfoSphere Guardium environment.

The SAP Materials Management Handbook

This handbook provides a complete understanding of how to configure and implement the SAP materials management module across various types of projects. It uses system screenshots of SAP environments to illustrate the complete flow of business transactions involved with SAP MM. Supplying detailed explanations of the steps involved, it presents case studies from actual projects that demonstrate how to convert theory into powerful SAP MM solutions. The book explains how to use the SAP MM module to take care of the complete range of business functions related to purchasing and inventory management.

Beginning Oracle SQL: for Oracle Database 12c, Third Edition

Beginning Oracle SQL is your introduction to the interactive query tools and specific dialect of SQL used with Oracle Database. These tools include SQL*Plus and SQL Developer. SQL*Plus is the one tool any Oracle developer or database administrator can always count on, and it is widely used in creating scripts to automate routine tasks. SQL Developer is a powerful, graphical environment for developing and debugging queries. Oracle's is possibly the most valuable dialect of SQL from a career standpoint. Oracle's database engine is widely used in corporate environments worldwide. It is also found in many government applications. Oracle SQL implements many features not found in competing products. No developer or DBA working with Oracle can afford to be without knowledge of these features and how they work, because of the performance and expressiveness they bring to the table. Written in an easygoing and example-based style, Beginning Oracle SQL is the book that will get you started down the path to successfully writing SQL statements and getting results from Oracle Database. Takes an example-based approach, with clear and authoritative explanations. Introduces both SQL and the query tools used to execute SQL statements. Shows how to create tables, populate them with data, and then query that data to generate business results. What you'll learn: Create database tables and define their relationships. Add data to your tables. Then change and delete that data. Write database queries that generate accurate results. Avoid common traps and pitfalls in writing SQL queries, especially from nulls. Reap the performance and expressiveness of analytic and window functions. Make use of Oracle Database's support for object types. Write recursive queries to query hierarchical data. Who this book is for: Beginning Oracle SQL is aimed at developers and database administrators who must write SQL statements to execute against an Oracle database. No prior knowledge of SQL is assumed.

Process Modeling Style

Process Modeling Style focuses on aspects of process modeling beyond notation that are very important to practitioners. Many people who model processes focus on the specific notation used to create their drawings. While that is important, there are many other aspects to modeling, such as naming, creating identifiers, descriptions, interfaces, patterns, and creating useful process documentation. Experienced author John Long focuses on those non-notational aspects of modeling, which practitioners will find invaluable. Gives solid advice for creating roles, work products, and processes. Instructs on how to organize and structure the parts of a process. Gives examples of documents you should use to define a set of processes.

(MCTS) Microsoft BizTalk Server (70-595) Certification and Assessment Guide: Second Edition

This comprehensive guide prepares intermediate BizTalk developers to excel in the Microsoft BizTalk Server 2010 (70-595) certification exam. With in-depth coverage of essential concepts, practical examples, and end-to-end solutions, the book ensures you have the skills and knowledge necessary to become a BizTalk expert. What this Book will help me do Master the core architecture and functionalities of Microsoft BizTalk Server. Develop skills to create advanced schemas and maps with enhanced logic functionalities. Understand how to manage orchestrations, transactions, and handle exceptions efficiently. Learn administrative tasks, including configuration and troubleshooting, for BizTalk server environments. Explore integration with web services, WCF, and additional BizTalk features like EDI and BAM. Author(s) This book is written by a team of experienced BizTalk professionals who have hands-on working knowledge with Microsoft BizTalk Server. Their expertise encompasses enterprise-level solution architecture and implementation. They bring their comprehensive understanding and teaching aptitude together in this book, ensuring a balance of detailed technical content and accessible learning. Who is it for? This book is ideal for intermediate-level BizTalk developers focusing on obtaining the Microsoft BizTalk Server 2010 (70-595) certification. It is suitable for individuals with basic knowledge of BizTalk concepts and working with orchestrations. A foundation in WCF and understanding of EDI is recommended to benefit fully from the content of this book.

Microsoft Big Data Solutions

Tap the power of Big Data with Microsoft technologies Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop. Walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, HortonWorks Data Platform for Windows, and open source tools Explores both on-premises and cloud-based solutions Shows how to store, manage, analyze, and share Big Data through the enterprise Covers topics such as Microsoft's approach to Big Data, installing and configuring HortonWorks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and HortonWorks BI tools, and more Helps you build and execute a Big Data plan Includes contributions from the Microsoft and HortonWorks Big Data product teams If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.

IBM High Performance Computing Cluster Health Check

This IBM® Redbooks® publication provides information about aspects of performing infrastructure health checks, such as checking the configuration and verifying the functionality of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, problem areas, and so on). This IBM Redbooks publication documents how to monitor the overall health check of the cluster infrastructure, to deliver technical computing clients cost-effective, highly scalable, and robust solutions. This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective Technical Computing and IBM High Performance Computing (HPC) solutions to optimize business results, product development, and scientific discoveries. This book provides a broad understanding of a new architecture.

IBM Worklight Mobile Application Development Essentials

Discover how to develop robust mobile applications using IBM Worklight. This tutorial provides guided, hands-on practices to explore the capabilities of IBM Worklight and apply them to optimize your mobile development process. You will learn to leverage its tools for greater development efficiency and scalability. What this Book will help me do Understand the architecture and components of IBM Worklight. Learn to create and deploy mobile applications using IBM Worklight. Master techniques for optimizing the performance of your applications. Gain insights into integrating mobile applications with backend systems. Develop skills to troubleshoot and maintain mobile applications efficiently. Author(s) The author brings substantial technical expertise in mobile application development and a deep understanding of IBM Worklight. With years of experience teaching developers and working on cutting-edge projects, they have honed their ability to convey complex topics in an accessible manner. Their approach is practical, focusing on real-world application and problem-solving. Who is it for? If you are a developer or programmer looking to expand your skills into the mobile application domain, this book is for you. It's tailored for those with a basic programming background who want to learn IBM Worklight comprehensively. Whether you're an individual aiming to develop mobile apps for fun or a professional interested in integrating mobile solutions into your workplace, this book will meet your needs and boost your mobile development proficiency.

IBM XIV Storage System Copy Services and Migration

This IBM® Redbooks® publication provides a practical understanding of the IBM XIV® Storage System copy and migration functions. The XIV Storage System has a rich set of copy functions suited for various data protection scenarios, which enables clients to enhance their business continuance, data migration, and online backup solutions. These functions allow point-in-time copies, known as snapshots and full volume copies, and also include remote copy capabilities in either synchronous or asynchronous mode. These functions are included in the XIV software and all their features are available at no additional charge. The various copy functions are reviewed in separate chapters, which include detailed information about usage, and also practical illustrations. Finally, the book illustrates the use of IBM Tivoli® Storage Productivity Center for Replication to manage XIV Copy Services. This book is intended for anyone who needs a detailed and practical understanding of the XIV copy functions.

Optimizing Hadoop for MapReduce

"Optimizing Hadoop for MapReduce" is your comprehensive guide to getting the best performance out of your Hadoop-based big data processing jobs. With a focus on practical application rather than theory, this book delves into the nuances of MapReduce job design, execution, and optimization to help you harness the full power of this technology. What this Book will help me do Understand the internal workings of Hadoop MapReduce and how it executes jobs. Master key optimization techniques to improve Hadoop job efficiency and resource use. Learn advanced MapReduce programming concepts to handle complex data processing tasks. Analyze and monitor Hadoop job performance using practical tools and methods. Integrate best practices for scaling production workloads in a Hadoop cluster. Author(s) Khaled Tannir is a seasoned software engineer and an expert in distributed systems, big data, and cloud technologies. He has decades of experience designing and optimizing systems for high-performance data processing. Khaled's hands-on approach to explaining technical concepts ensures readers gain practical, applied knowledge that can be immediately implemented in real-world projects. Who is it for? This book is intended for developers, data engineers, and system architects who work with or are planning to work with Apache Hadoop. Ideal readers should have basic familiarity with Hadoop concepts and a foundational understanding of distributed systems. This book will benefit professionals looking to optimize their Hadoop-based applications or understand advanced usage of MapReduce. Whether you're aiming to improve your existing knowledge or implement high-performance data solutions, this book is tailored for you.

Mule in Action, Second Edition

Mule in Action, Second Edition is a totally revised guide covering Mule 3 fundamentals and best practices. It starts with a quick ESB overview and then dives into rich examples covering core concepts like sending, receiving, routing, and transforming data.
About the Technology: An enterprise service bus is a way to integrate enterprise applications using a bus-like infrastructure. Mule is the leading open source Java ESB. It borrows from the Hohpe/Woolf patterns, is lightweight, can publish REST and SOAP services, integrates well with Spring, is customizable, scales well, and is cloud-ready.
About the Book: You'll get a close look at Mule's standard components and how to roll out custom ones. You'll also pick up techniques for testing, performance tuning, and BPM orchestration, and explore cloud API integration for SaaS applications.
What's Inside: Full coverage of Mule 3. Integration with cloud services. Common transports, routers, and transformers. Security, routing, orchestration, and transactions.
About the Reader: Written for developers, architects, and IT managers, this book requires familiarity with Java but no previous exposure to Mule or other ESBs.
About the Authors: David Dossot is a software architect and has created numerous modules and transports for Mule. John D'Emic is a principal solutions architect and Victor Romero a solutions architect, both at MuleSoft, Inc.
Quotes:
Captures the essence of pragmatism that is the founding principle of Mule. - From the Foreword by Ross Mason, Creator of Mule
A new, in-depth perspective. - Dan Barber, Penn Mutual
Excellent topic coverage and code examples. - Davide Piazza, Thread Solutions srl, MuleSoft Partner
This edition has grown, with more real-world examples and a thorough grounding in messaging. - Keith McAlister, CGI