O'Reilly Data Engineering Books

Centrally Managing Access to Self-Encrypting Drives in Lenovo System x Servers Using IBM Security Key Lifecycle Manager

2015-03-16 O'Reilly Amazon

book

Angelo Parisi , Ryan Bradley

data data-engineering IBM Cyber Security

Data security is one of the paramount requirements for organizations of all sizes. Although many companies have invested heavily in protection from network-based attacks and other threats, few effective safeguards are available to protect against potentially costly exposures of proprietary data that result from a hard disk drive being stolen, misplaced, retired, or redeployed. Self-encrypting drives (SEDs) can satisfy this need by providing the ultimate in security for data-at-rest and can help reduce IT drive retirement costs in the data center. Self-encrypting drives are also an excellent choice if you must comply with government or industry regulations for data privacy and encryption. To effectively manage a large deployment of SEDs in Lenovo® System x® servers, an organization must rely on a centralized key management solution. This IBM Redbooks® publication explains the technology behind SEDs and demonstrates how to deploy a key management solution that uses IBM Security Key Lifecycle Manager and how to properly set up your System x servers.

IBM DS8870 Copy Services for Open Systems

2015-03-16 O'Reilly Amazon

book

Warren Stanley , Mark Wells , Bertrand Dufrasne , Don Skilton , Sherri Brunson , Alexander Warmuth

data data-engineering IBM

This IBM® Redbooks® publication helps you plan, install, tailor, configure, and manage Copy Services for Open Systems environments on the IBM DS8870. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8870 Copy Services. There is a companion book that supports the configuration of the Copy Services functions in an IBM z/OS® environment, IBM System Storage DS8000 Copy Services for IBM z Systems™, SG24-6787.

Hadoop Virtualization

2015-03-15 O'Reilly Amazon

book

Courtney Webster

data data-engineering Hadoop Cloud Computing

Hadoop was built to use local data storage on a dedicated group of commodity hardware, but many organizations are choosing to save money (and operational headaches) by running Hadoop in the cloud. This O'Reilly report focuses on the benefits of deploying Hadoop to a private cloud environment, and provides an overview of best practices to maximize performance. Private clouds provide lower capital expenses than on-site clusters and offer lower operating expenses than public cloud deployment. Author Courtney Webster shows you what's involved in Hadoop virtualization, and how you can efficiently plan a private cloud deployment. Topics include:
How Hadoop virtualization offers scalable capability for future growth and minimal downtime
Why a private cloud offers unique benefits with comparable (and even improved) performance
How you can literally set up Hadoop in a private cloud in minutes
How aggregation can be used on top of (or instead of) virtualization
Which resources and practices are best for a private cloud deployment
How cloud-based management tools lower the complexity of initial configuration and maintenance

Big Data

2015-03-09 O'Reilly Amazon

book

Bernard Marr

data data-engineering Analytics Big Data Data Analytics

Convert the promise of big data into real-world results. There is so much buzz around big data. We all need to know what it is and how it works; that much is obvious. But is a basic understanding of the theory enough to hold your own in strategy meetings? Probably. But what will set you apart from the rest is actually knowing how to USE big data to get solid, real-world business results, and putting that in place to improve performance. Big Data will give you a clear understanding, blueprint, and step-by-step approach to building your own big data strategy. This is a much-needed practical introduction to actually putting the topic into practice. Illustrated with numerous real-world examples from a cross section of companies and organisations, Big Data will take you through the five steps of the SMART model: Start with Strategy, Measure Metrics and Data, Apply Analytics, Report Results, Transform. The book discusses how companies need to clearly define what it is they need to know. It outlines how companies can collect relevant data and measure the metrics that will help them answer their most important business questions. It also addresses how the results of big data analytics can be visualised and communicated to ensure key decision-makers understand them. It includes many high-profile case studies from the author's work with some of the world's best-known brands.

Pro T-SQL Programmer’s Guide, 4th Edition

2015-03-09 O'Reilly Amazon

book

Jay Natarajan , Michael Coles , Miguel Cebollero

data data-engineering SQL Microsoft SQL Server

Pro T-SQL Programmer’s Guide is your guide to making the best use of the powerful Transact-SQL programming language that is built into Microsoft SQL Server's database engine. This edition is updated to cover the new in-memory features that are part of SQL Server 2014. Discussing new and existing features, the book takes you on an expert-guided tour of Transact-SQL functionality. Fully functioning examples and downloadable source code bring technically accurate and engaging treatment of Transact-SQL into your own hands. Step-by-step explanations ensure clarity, and an advocacy of best practices will steer you down the road to success. Transact-SQL is the language developers and DBAs use to interact with SQL Server. It’s used for everything from querying data, to writing stored procedures, to managing the database. Support for in-memory stored procedures running queries against in-memory tables is new in the language and gets coverage in this edition. Also covered are must-know features such as window functions and data paging that help in writing fast-performing database queries. Developers and DBAs alike can benefit from the expressive power of T-SQL, and Pro T-SQL Programmer's Guide is your roadmap to success in applying this increasingly important database language to everyday business and technical tasks. The book covers the newly introduced in-memory database features and shares the best practices used by experienced professionals. It goes deeply into the subject matter; it is an advanced book for the serious reader.

Implementing the IBM Storwize V3700

2015-03-05 O'Reilly Amazon

book

Lee Sirett , Jon Tate , Chris Tapsell , Adam Lyon-Jones , Paulo Tomiyoshi Takeda

data data-engineering IBM

Organizations of all sizes are faced with the challenge of managing massive volumes of increasingly valuable data. However, storing this data can be costly, and extracting value from the data is becoming more and more difficult. IT organizations have limited resources but must stay responsive to dynamic environments and act quickly to consolidate, simplify, and optimize their IT infrastructures. The IBM® Storwize® V3700 system provides a solution that is affordable, easy to use, and self-optimizing, which enables organizations to overcome these storage challenges. Storwize V3700 delivers efficient, entry-level configurations that are specifically designed to meet the needs of small and midsize businesses. Designed to provide organizations with the ability to consolidate and share data at an affordable price, Storwize V3700 offers advanced software capabilities that are usually found in more expensive systems. Built on innovative IBM technology, Storwize V3700 addresses the block storage requirements of small and midsize organizations and is designed to accommodate the most common storage network technologies. This design enables easy implementation and management. Storwize V3700 includes the following features:
A web-based GUI provides point-and-click management capabilities.
Internal disk storage virtualization enables rapid, flexible provisioning and simple configuration changes.
Thin provisioning enables applications to grow dynamically but use only the space they actually need.
Simple data migration from external storage to Storwize V3700 storage is supported (one-way from another storage device).
Remote Mirror creates copies of data at remote locations for disaster recovery.
IBM FlashCopy® creates instant application copies for backup or application testing.
This IBM Redbooks® publication is intended for pre-sales and post-sales technical support professionals and storage administrators. The concepts in this book also relate to the IBM Storwize V3500.
This book was written at a software level of version 7 release 4.

Beginning JSON

2015-03-04 O'Reilly Amazon

book

Ben Smith

data data-engineering storage-formats JSON JavaScript

Beginning JSON is the definitive guide to JSON (JavaScript Object Notation), today’s standard in data formatting for the web. The book starts with the basics and walks you through all aspects of using the JSON format. Beginning JSON covers all areas of JSON, from the basics of data formats to creating your own server to store and retrieve persistent data. Beginning JSON provides you with the skill set required for reading and writing properly validated JSON data. The first two chapters of the book discuss the foundations of JavaScript for those who need it and provide the necessary understanding for later chapters. Chapters 3 through 12 uncover what data is and how to convert that data into a transmittable/storable format. They also cover how to use AJAX to send and receive JSON and, lastly, how to reassemble that data back into a proper JavaScript object to be used by your program. The final chapters put everything you learned into practice.

Hibernate Recipes: A Problem-Solution Approach, Second Edition

2015-03-04 O'Reilly Amazon

book

Gary Mak , Joseph Ottinger , Srinivas Guruzu

data data-engineering database-management-tools object-relational-mapping hibernate Java

Hibernate Recipes, Second Edition contains a collection of code recipes and templates for learning and building Hibernate solutions for you and your clients, including how to work with the Spring Framework and the JPA. This book is your pragmatic day-to-day reference and guide for doing all things involving Hibernate. There are many books focused on learning Hibernate, but this book takes you further and shows how you can apply it practically in your daily work. Hibernate Recipes, Second Edition is a must-have book for your library. Hibernate 4.x continues to be the most popular out-of-the-box, open source framework solution for Java persistence and data/database accessibility techniques and patterns. It works well with the most popular open source enterprise Java framework of all, the Spring Framework. Hibernate is used for e-commerce-based web applications as well as heavy-duty transactional systems for the enterprise.

Coordination Abilities in Volleyball

2015-03-03 O'Reilly Amazon

book

Jaromír Šimonek

data data-engineering zookeeper

The author presents a general view of sports training, its periodization, and the role of coordination in the initial stages of preparation in volleyball. He also deals with inter-gender differences in the levels of such abilities, describes motor tests for assessing coordination potential, and provides the reader with standards for the development of talented players. Based on the nature of volleyball, the author analyses key features of sports performance. Coordination abilities, especially in the period of puberty, play an important role in creating a coordination basis, the prerequisite for developing physical fitness and acquiring motor skills. Based on the results of his own research, as well as studies conducted by international sports scientists, he offers a model for the development of coordination abilities in volleyball. This method is recommended for coaches to improve their professional work in volleyball classes and schools, and in sports clubs. In the long term, application of the proposed model should contribute to the improvement of players' performance in competition.

Big Data Revolution

2015-03-02 O'Reilly Amazon

book

Patrick McSharry , Rob Thomas

data data-engineering Big Data IBM

Exploit the power and potential of Big Data to revolutionize business outcomes. Big Data Revolution is a guide to improving performance, making better decisions, and transforming business through the effective use of Big Data. In this collaborative work, an IBM Vice President of Big Data Products and an Oxford Research Fellow present inside stories that demonstrate the power and potential of Big Data within the business realm. Readers are guided through tried-and-true methodologies for getting more out of data and using it to the utmost advantage. This book describes the major trends emerging in the field, the pitfalls and triumphs being experienced, and the many considerations surrounding Big Data. Throughout, it guides readers toward better decision making from the perspective of a data scientist. Companies are generating data faster than ever before, and managing that data has become a major challenge. With the right strategy, Big Data can be a powerful tool for creating effective business solutions, but deep understanding is key when applying it to individual business needs. Big Data Revolution provides the insight executives need to incorporate Big Data into a better business strategy, improving outcomes with innovation and efficient use of technology. The book helps readers examine the major emerging patterns in Big Data and consider the debate surrounding the ethical use of data. It also helps them recognize patterns, improve personal and organizational performance, and make more informed decisions with quantifiable results. In an information society, it is becoming increasingly important to make sense of data in an economically viable way. It can drive new revenue streams and give companies a competitive advantage, providing a way forward for businesses navigating an increasingly complex marketplace. Big Data Revolution provides expert insight on the tool that can revolutionize industries.

Field Guide to Hadoop

2015-03-02 O'Reilly Amazon

book

Marshall Presser , Kevin Sitto

data data-engineering Hadoop Avro Big Data Cassandra

If your organization is about to enter the world of big data, you need to decide not only whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together. Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field. Topics include:
Core technologies: Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
Database and data management: Cassandra, HBase, MongoDB, and Hive
Serialization: Avro, JSON, and Parquet
Management and monitoring: Puppet, Chef, ZooKeeper, and Oozie
Analytic helpers: Pig, Mahout, and MLlib
Data transfer: Sqoop, Flume, distcp, and Storm
Security, access control, and auditing: Sentry, Kerberos, and Knox
Cloud computing and virtualization: Serengeti, Docker, and Whirr

GPS Satellite Surveying, 4th Edition

2015-03-02 O'Reilly Amazon

book

Dmitry Tatarnikov , Alfred Leick , Lev Rapoport

data data-engineering location-data geographic-information-system-gis geographic information system (gis)

Employ the latest satellite positioning tech with this extensive guide. GPS Satellite Surveying is the classic text on the subject, providing the most comprehensive coverage of global navigation satellite system (GNSS) applications for surveying. Fully updated and expanded to reflect the field's latest developments, this new edition contains new information on GNSS antennas, Precise Point Positioning, Real-time Relative Positioning, Lattice Reduction, and much more. New contributors offer additional insight that greatly expands the book's reach, providing readers with complete, in-depth coverage of geodetic surveying using satellite technologies. The newest, most cutting-edge tools, technologies, and applications are explored in depth to help readers stay up to date on best practices and preferred methods. This gives them the understanding they need to consistently produce more reliable measurements. Global navigation satellite systems have an array of uses in military, civilian, and commercial applications. In surveying, GNSS receivers are used to position survey markers, buildings, and road construction as accurately as possible with less room for human error. GPS Satellite Surveying provides complete guidance toward the practical aspects of the field, helping readers to:
Get up to speed on the latest GPS/GNSS developments
Understand how satellite technology is applied to surveying
Examine in-depth information on adjustments and geodesy
Learn the fundamentals of positioning, lattice adjustment, antennas, and more
The surveying field has seen quite an evolution of technology in the decade since the last edition's publication. This new edition covers it all, bringing the reader deep inside the latest tools and techniques being used on the job. Surveyors, engineers, geologists, and anyone looking to employ satellite positioning will find GPS Satellite Surveying to be of significant assistance.

Neo4j High Performance

2015-03-02 O'Reilly Amazon

book

Sonal Raj

data data-engineering graph-databases Neo4j API

Dive into the world of graph databases with "Neo4j High Performance." This book takes you through the intricacies of designing, building, and maintaining robust and scalable graph-based systems tailored for your application's specific needs. Whether you're optimizing your database structures or exploring performance enhancements, this guide equips you with the skills to utilize Neo4j effectively.
What this book will help me do: Understand the fundamentals of graph database principles and Neo4j's architecture. Learn how to design efficient graph data schemas to optimize performance. Develop the ability to customize Neo4j operations for high-traffic applications. Master advanced indexing and querying techniques to unlock the full potential of your data. Gain expertise in Neo4j's REST API and practical scenarios, including building recommendation systems.
Author(s): Sonal Raj is a seasoned expert in graph databases and related technologies, specializing in Neo4j. With hands-on experience in solving complex data problems using graph systems, Sonal brings deep insights and practical usage paradigms to this book. Passionate about sharing knowledge, Sonal ensures this material bridges the gap from beginner understanding to expert application.
Who is it for? This book is perfect for professionals and enthusiasts eager to excel in graph database technologies. If you're familiar with basic graph theory or have practical experience with Neo4j, you'll find this book insightful. Beginners seeking a structured introduction and advanced users pursuing optimization techniques will benefit equally. It is ideal for developers aiming to scale their applications using graph data efficiently.

Mastering Elasticsearch - Second Edition

2015-02-27 O'Reilly Amazon

book

Marek Rogozinski

data data-engineering search elasticsearch ELK

Delve deeper into Elasticsearch in "Mastering Elasticsearch - Second Edition" to gain comprehensive insights into advanced querying, data indexing, and internal workings of Elasticsearch servers. With this book, you'll enhance your ability to implement powerful search solutions and optimize performance with confidence.
What this book will help me do: Build advanced querying skills to utilize the Elasticsearch Query DSL effectively. Gain hands-on understanding of optimal data indexing for your Elasticsearch applications. Learn to improve user search experiences by tailoring Elasticsearch functionalities. Master Elasticsearch performance tuning and server optimization techniques. Develop custom Elasticsearch plugins to expand its core capabilities.
Author(s): Marek Rogozinski, a seasoned Elasticsearch developer, brings years of professional expertise to this comprehensive guide. With a focus on practical and actionable knowledge, Marek has crafted this edition for users eager to deepen their Elasticsearch proficiency. His hands-on approach ensures you can apply the lessons directly and effectively.
Who is it for? Ideal readers are those experienced with Elasticsearch, familiar with Query DSL and indexing techniques, and looking to expand their technical capabilities. Whether you're an Elasticsearch administrator, developer, or enthusiast, this book will enable you to master advanced topics and achieve your goals in search technology.

Apache Hive Essentials

2015-02-26 O'Reilly Amazon

book

Dayong Du

data data-engineering Hadoop apache-hive Analytics Big Data

Apache Hive Essentials is the perfect guide for understanding and mastering Hive, the data warehouse system built on top of Hadoop that provides a SQL-like query language for big data. With this book, you will gain the skills to effectively use Hive to analyze and manage large data sets. Whether you're a developer, data analyst, or just curious about big data, this hands-on guide will enhance your capabilities.
What this book will help me do: Understand the core concepts of Hive and its relation to big data and Hadoop. Learn how to set up a Hive environment and integrate it with Hadoop. Master the SQL-like query functionalities of Hive to select, manipulate, and analyze data. Develop custom functions in Hive to extend its functionality for your own specific use cases. Discover best practices for optimizing Hive performance and ensuring data security.
Author(s): Dayong Du is an expert in big data analytics with extensive experience in implementing and using tools like Hive in professional settings. Having worked on practical big data solutions, Dayong brings a wealth of knowledge and insights to his writing. His clear, approachable style makes complex topics accessible to readers.
Who is it for? This book is ideal for developers, data analysts, and data engineers looking to leverage Hive for big data analysis. If you are familiar with SQL and Hadoop basics and aim to enhance your understanding of Hive, this book is for you. Beginners with some programming background eager to dive into big data technologies will also benefit. It's tailored for learners wanting actionable knowledge to advance their data processing skills.

ArcPy and ArcGIS: Geospatial Analysis with Python

2015-02-26 O'Reilly Amazon

book

Silas Toms

data data-engineering location-data geographic-information-system-gis arcgis GIS

"ArcPy and ArcGIS: Geospatial Analysis with Python" introduces you to streamlining geospatial analysis using the ArcPy library in Python. You'll learn to automate repetitive GIS tasks, enhance your workflow in ArcGIS, and handle geospatial data programmatically to achieve efficient and accurate results in your projects.
What this book will help me do: Master the use of the ArcPy library to automate and optimize GIS workflows. Learn techniques to efficiently handle geospatial data updates and analysis in Python. Understand how to use Python scripting to dynamically create and manage maps and analyses. Gain the skills to turn repetitive GIS tasks into custom Python tools that increase productivity. Explore advanced geospatial analysis topics using Python's ArcPy module for complex problem-solving.
Author(s): Silas Toms is a seasoned GIS professional with extensive experience in Python programming for geospatial applications. With years of hands-on work in automating GIS processes and teaching others, Silas excels at making technical concepts relatable and useful for real-world applications. His practical writing style ensures readers can effectively apply what they learn.
Who is it for? This book is ideal for GIS students and professionals who wish to enhance their efficiency by automating tasks in ArcGIS using Python. It also suits Python developers keen on exploring geospatial data analysis and management workflows. Suitable for those with basic GIS knowledge, the book bridges the gap to advanced GIS automation techniques. It's perfect if you aim to streamline repetitive tasks and integrate programming into your geospatial projects.

PostgreSQL Server Programming - Second Edition

2015-02-26 O'Reilly Amazon

book

Kirk Roybal , Jim Mlodgenski , Usama Dar , Hannu Krosing

data data-engineering relational-databases postgresql Python SQL

Delve into the concepts and practices of PostgreSQL server-side programming with this practical and insightful guide. Learn how to extend PostgreSQL functionality through user-defined functions, various procedural languages, and effective debugging techniques. Gain a deeper understanding of PostgreSQL 9.4's features to optimize your database's capabilities.
What this book will help me do: Master PostgreSQL's PL/pgSQL and other procedural languages for server-side programming. Craft powerful user-defined functions to provide database functionality specific to your needs. Explore debugging techniques and tools, including PL/pgSQL debugging extensions and NOTIFY. Scale and optimize databases effectively using tools like PL/Proxy. Leverage new features in PostgreSQL 9.4, such as event triggers, to enhance database performance.
Author(s): The book is authored by experienced PostgreSQL professionals Kirk Roybal, Jim Mlodgenski, Usama Dar, and Hannu Krosing. Together, they bring years of expertise in database design, architecture, and development. Their combined backgrounds ensure a comprehensive and practical learning experience for readers. They aim to share practical insights and structured knowledge with database enthusiasts.
Who is it for? This book is ideal for database professionals with a moderate to advanced understanding of PostgreSQL. Readers should have experience with SQL, query optimization concepts, and basic programming in languages like Python, Perl, or C. This book is suitable for you if you want to deepen your knowledge of PostgreSQL's capabilities. It also suits readers who want hands-on experience with advanced features such as server programming and database scale optimization.

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

2015-02-25 O'Reilly Amazon

book

Steven Hoffman

data data-engineering log-data Analytics Big Data ELK

"Apache Flume: Distributed Log Collection for Hadoop - Second Edition" is your hands-on guide to learning how to use Apache Flume to reliably collect and move logs and data streams into your Hadoop ecosystem. Through practical examples and real-world scenarios, this book will help you master the setup, configuration, and optimization of Flume for various data ingestion use cases.
What this book will help me do: Understand the key concepts and architecture behind Apache Flume to build reliable and scalable data ingestion systems. Set up Flume agents to collect and transfer data into the Hadoop Distributed File System (HDFS) or other storage solutions effectively. Learn stream data processing techniques, such as filtering, transforming, and enriching data during transit to improve data usability. Integrate Flume with other tools like Elasticsearch and Solr to enhance analytics and search capabilities. Implement monitoring and troubleshooting workflows to maintain healthy and optimized Flume data pipelines.
Author(s): Steven Hoffman, a seasoned software developer and data engineer, brings years of practical experience working with big data technologies to this book. He has a strong background in distributed systems and big data solutions, having implemented enterprise-scale analytics projects. Through clear and approachable writing, he aims to empower readers to successfully deploy reliable data pipelines using Apache Flume.
Who is it for? This book is written for Hadoop developers, data engineers, and IT professionals who seek to build robust pipelines for streaming data into Hadoop environments. It is ideal for readers who have a basic understanding of Hadoop and HDFS but are new to Apache Flume. If you are looking to enhance your analytics capabilities by efficiently ingesting, routing, and processing streaming data, this book is for you. Beginners as well as experienced engineers looking to dive deeper into Flume will find it insightful.

Couchbase Essentials

2015-02-25 O'Reilly Amazon

book

John C Zablocki

data data-engineering nosql-databases couchbase API NoSQL

Couchbase Essentials is your gateway to mastering Couchbase, a powerful NoSQL database designed for building flexible and scalable applications. Through this book, you will understand Couchbase's key features, explore its indexing and querying capabilities, and learn to design schemas for its schemaless document model.
What this book will help me do: Understand how to install and configure a single-node Couchbase environment. Master putting data into and retrieving data from Couchbase using its API. Develop skills in creating secondary and advanced indexes using Couchbase MapReduce views. Learn to design an efficient schema for Couchbase's schemaless document database. Create and query a functional application utilizing Couchbase and its N1QL query language.
Author(s): John C Zablocki is an experienced software developer and technology enthusiast with a deep understanding of NoSQL databases like Couchbase. With years of practical experience, John has been instrumental in implementing Couchbase in scalable applications, and he shares actionable insights in this well-rounded book.
Who is it for? This book is tailored for application developers aiming to enhance their applications with NoSQL capabilities. It is highly suitable for developers with backgrounds in relational databases, as well as those new to NoSQL systems. If you are interested in building modern, scalable applications, this comprehensive guide to Couchbase is for you.

Hadoop MapReduce v2 Cookbook - Second Edition

2015-02-25 O'Reilly Amazon

book

Thilina Gunarathne

data data-engineering Hadoop mapreduce Analytics Big Data

Explore insights from vast datasets with "Hadoop MapReduce v2 Cookbook - Second Edition." This book serves as a practical guide for developers and system administrators who aim to master big data processing using Hadoop v2. By engaging with its step-by-step recipes, you will learn to harness the Hadoop MapReduce ecosystem for scalable and efficient data solutions.
What this book will help me do: Master the configuration and management of Hadoop YARN, MapReduce v2, and HDFS clusters. Integrate big data tools such as Hive, HBase, Pig, Mahout, and Nutch with Hadoop v2. Develop analytics solutions for large-scale datasets using MapReduce-based applications. Address specific challenges like data classification, recommendations, and text analytics leveraging Hadoop MapReduce. Deploy and manage big data clusters effectively, including options for cloud environments.
Author(s): Thilina Gunarathne combines deep expertise in big data technology with years of experience working directly with Hadoop. The author has helped numerous organizations implement scalable data processing solutions and is passionate about teaching others. The book's approach ensures readers gain both foundational knowledge and practical skills.
Who is it for? This book is perfect for developers and system administrators who want to learn Hadoop MapReduce v2, including configuring and managing big data clusters. Beginners with basic Java knowledge can follow along to advance their skills in big data processing. It is ideal for those transitioning to Hadoop v2 or requiring practical recipes for immediate application. It is also well suited to professionals aiming to deepen their expertise in scalable data technologies.

Learning Apache Cassandra

2015-02-25 O'Reilly Amazon

book

Matthew Brown

data data-engineering nosql-databases Cassandra API MySQL

Learning Apache Cassandra is your comprehensive guide to mastering one of the most popular distributed databases for building scalable, fault-tolerant data layers. Through step-by-step examples and clear explanations, this book will help you understand Cassandra's architecture and how to use its features to design efficient applications.

What this book will help you do: Install and set up Apache Cassandra in your environment. Develop highly scalable data models for various application scenarios. Implement efficient query designs using Cassandra's specialized APIs. Maintain data consistency and handle concurrent updates in distributed systems. Apply best practices for securing Cassandra deployments and managing distributed data.

Author(s): Matthew Brown is an experienced software developer with a focus on database systems and distributed architectures. With years of hands-on experience working with SQL and NoSQL databases, he brings practical insights and clear instructions to his readers. His writing aims to demystify complex topics and provide practical learning paths.

Who is it for? This book is intended for software developers and database administrators looking to expand their knowledge of distributed databases. If you are familiar with SQL databases like MySQL or PostgreSQL and want to transition to Cassandra, this guide will help you. No prior experience with distributed databases is assumed. By following this book, you'll quickly become proficient in using Cassandra for your distributed application needs.

IBM Business Process Manager V8.5 Performance Tuning and Best Practices

2015-02-24 O'Reilly Amazon

book

Chris Richardson , Ben Hoflich , Andreas Fried , Torsten Wilms , Zi Hui Duan , Mike Collins

data data-engineering IBM Java

This IBM® Redbooks® publication provides performance tuning tips and best practices for IBM Business Process Manager (IBM BPM) V8.5.5 (all editions) and IBM Business Monitor V8.5.5. These products represent an integrated development and runtime environment based on a key set of service-oriented architecture (SOA) and business process management (BPM) technologies. Such technologies include Service Component Architecture (SCA), Service Data Object (SDO), Business Process Execution Language (BPEL) for web services, and Business Process Model and Notation (BPMN). Both IBM Business Process Manager and Business Monitor build on the core capabilities of the IBM WebSphere® Application Server infrastructure. As a result, Business Process Manager solutions benefit from tuning, configuration, and best practices information for WebSphere Application Server and the corresponding platform Java virtual machines (JVMs).

This book targets a wide variety of groups, both within IBM (development, services, technical sales, and others) and among customers. For customers who are either considering or are in the early stages of implementing a solution incorporating Business Process Manager and Business Monitor, this document is a useful reference, both for best practices during application development and deployment and for setup, tuning, and configuration information. The book discusses many issues that can influence the performance of each product and can serve as a guide for making rational first choices of configuration and performance settings. Similarly, customers who have already implemented a solution with these products can use the information presented here to gain insight into how the performance of their overall integrated solution can be improved.

NoSQL For Dummies

2015-02-24 O'Reilly Amazon

book

Adam Fowler

data data-engineering nosql-databases Analytics Big Data Cassandra

Get up to speed on the nuances of NoSQL databases and what they mean for your organization. This easy-to-read guide to NoSQL databases provides the no-nonsense overview and analysis you need, including what NoSQL is and which database is right for you. Featuring specific evaluation criteria for NoSQL databases, along with a look at the pros and cons of the most popular options, NoSQL For Dummies provides the fastest and easiest way to dive into the details of this technology. You'll gain an understanding of how to use NoSQL databases for mission-critical enterprise architectures and projects, and real-world examples reinforce the primary points to create an action-oriented resource for IT pros.

If you're planning a big data project or platform, you probably already know you need to select a NoSQL database to complete your architecture. But with options flooding the market and updates and add-ons arriving at a rapid pace, determining what you require now, and in the future, can be a tall task. This is where NoSQL For Dummies comes in!

Learn the basic tenets of NoSQL databases and why they have come to the forefront as data has outpaced the capabilities of relational databases.
Discover major players among NoSQL databases, including Cassandra, MongoDB, MarkLogic, Neo4j, and others.
Get an in-depth look at the benefits and disadvantages of the wide variety of NoSQL database options.
Explore the needs of your organization as they relate to the capabilities of specific NoSQL databases.

Big data and Hadoop get all the attention, but when it comes down to it, NoSQL databases are the engines that power many big data analytics initiatives. With NoSQL For Dummies, you'll go beyond relational databases to ramp up your enterprise's data architecture in no time.

YARN Essentials

2015-02-24 O'Reilly Amazon

book

Nirmal Kumar , Amol Fasale

data data-engineering Hadoop yarn

"YARN Essentials" offers a practical introduction to Apache Hadoop YARN. With this book, you will acquire the skills to install, configure, and manage YARN clusters effectively. It provides hands-on guidance for deploying and managing applications and emerging frameworks, making it a valuable resource for mastering this key Hadoop technology.

What this book will help you do: Learn how to install and configure Apache Hadoop YARN from scratch. Understand YARN's architecture and its integration with the Hadoop ecosystem. Fine-tune a YARN cluster for performance and scalability. Create and run applications on a shared YARN cluster. Become proficient in managing, troubleshooting, and extending YARN.

Author(s): Amol Fasale and Nirmal Kumar are experienced professionals specializing in Hadoop and distributed systems. With years of hands-on experience in YARN and in managing large-scale data processing frameworks, they bring their comprehensive expertise to this guide. Their focus on clarity and applicable knowledge ensures readers gain practical skills alongside theoretical understanding.

Who is it for? This book is ideal for Hadoop administrators or developers with background knowledge of Hadoop 1.x who want to specialize in managing YARN clusters. It assumes familiarity with basic Hadoop concepts while providing thorough explanations of YARN-specific features and topics. If you're looking to deploy scalable applications using YARN, this is the book for you.

Foundations of Linear and Generalized Linear Models

2015-02-23 O'Reilly Amazon

book

Alan Agresti

data data-engineering data-models

A valuable overview of the most important ideas and results in statistical modeling. Written by a highly experienced author, Foundations of Linear and Generalized Linear Models is a clear and comprehensive guide to the key concepts and results of linear statistical models. The book presents a broad, in-depth overview of the most commonly used statistical models by discussing the theory underlying the models, R software applications, and examples with crafted models to elucidate key ideas and promote practical model building. The book begins by illustrating the fundamentals of linear models, such as how model fitting projects the data onto a model vector subspace and how orthogonal decompositions of the data yield information about the effects of explanatory variables. Subsequently, the book covers the most popular generalized linear models, which include binomial and multinomial logistic regression for categorical data, and Poisson and negative binomial loglinear models for count data. Focusing on the theoretical underpinnings of these models, Foundations of Linear and Generalized Linear Models also features:

An introduction to quasi-likelihood methods that require weaker distributional assumptions, such as generalized estimating equation methods
An overview of linear mixed models and generalized linear mixed models with random effects for clustered correlated data, Bayesian modeling, and extensions to handle problematic cases such as high-dimensional problems
Numerous examples that use R software for all text data analyses
More than 400 exercises for readers to practice and extend the theory, methods, and data analysis
A supplementary website with datasets for the examples and exercises

An invaluable textbook for upper-undergraduate and graduate-level students in statistics and biostatistics courses, Foundations of Linear and Generalized Linear Models is also an excellent reference for practicing statisticians and biostatisticians, as well as anyone who is interested in learning about the most important statistical models for analyzing data.

talk-data.com


Centrally Managing Access to Self-Encrypting Drives in Lenovo System x Servers Using IBM Security Key Lifecycle Manager

IBM DS8870 Copy Services for Open Systems

Hadoop Virtualization

Big Data

Pro T-SQL Programmer’s Guide, 4th Edition

Implementing the IBM Storwize V3700

Beginning JSON

Hibernate Recipes: A Problem-Solution Approach, Second Edition

Coordination Abilities in Volleyball

Big Data Revolution

Field Guide to Hadoop

GPS Satellite Surveying, 4th Edition

Neo4j High Performance

Mastering Elasticsearch - Second Edition

Apache Hive Essentials

ArcPy and ArcGIS: Geospatial Analysis with Python

PostgreSQL Server Programming - Second Edition

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Couchbase Essentials

Hadoop MapReduce v2 Cookbook - Second Edition

Learning Apache Cassandra

IBM Business Process Manager V8.5 Performance Tuning and Best Practices

NoSQL For Dummies

YARN Essentials

Foundations of Linear and Generalized Linear Models