talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 1526–1550 of 3432 · Newest first

Search and Foraging

This book examines how to program artificial search agents so that they act optimally or demonstrate the same behavior as predicted by the foraging theory for living organisms. It discusses foraging theory as well as search and screening theory in the same mathematical and algorithmic framework. It presents an overview of the main ideas and methods of foraging and search theories, making the concepts of one theory accessible to specialists of the other. Numerical examples illustrate the application of both theories.
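As background to the shared mathematical framework the blurb describes (offered here as a standard result of search theory, not a claim about this book's notation), Koopman's random-search formula gives the probability of detecting a target when a searcher with sweep width $W$ covers track length $L$ in a region of area $A$:

```latex
% Koopman's random-search formula: detection probability after
% sweeping track length L with sweep width W over area A.
P_d = 1 - e^{-WL/A}
```

For comparison, a perfectly systematic (exhaustive) sweep of the same region achieves $P_d = \min(WL/A,\, 1)$; the gap between the two is one of the quantities search-and-screening theory studies.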

Getting Started with MariaDB

Dive into the world of MariaDB with this comprehensive beginner's guide. From installation and configuration to advanced data handling, this book provides hands-on instructions on using MariaDB effectively. Tailored for newcomers, it ensures you can learn and apply database management in a practical way. What this Book will help me do Install MariaDB on various platforms like Windows, Mac OS X, and Linux to start working with databases. Optimize MariaDB for better performance by utilizing the advanced features available in version 10. Secure your databases effectively, ensuring sensitive data is protected from unauthorized access. Learn techniques to analyze and retrieve data efficiently using operators and sorting mechanisms. Perform database maintenance to ensure MariaDB functions optimally in the long run. Author(s) Daniel Bartholomew has extensive experience with open-source databases and has been a key advocate for MariaDB. With years of hands-on practice, Daniel helps simplify complex topics, making learning straightforward for beginners while ensuring robust coverage of advanced capabilities. Who is it for? This book is an excellent choice for those new to databases and wishing to start with MariaDB. Whether you're aiming to learn database basics or looking to expand your technical skillset, this guide provides the foundational knowledge you need. For IT learners or aspiring database managers, it's a perfect first step into database systems. Previous database experience is not necessary.

Implementing an IBM InfoSphere BigInsights Cluster using Linux on Power

This IBM® Redbooks® publication demonstrates and documents how to implement and manage an IBM PowerLinux™ cluster for big data, focusing on hardware management, operating systems provisioning, application provisioning, cluster readiness check, hardware, operating system, IBM InfoSphere® BigInsights™, IBM Platform Symphony®, IBM Spectrum™ Scale (formerly IBM GPFS™), applications monitoring, and performance tuning. This publication shows that IBM PowerLinux clustering solutions (hardware and software) deliver significant value to clients that need cost-effective, highly scalable, and robust solutions for big data and analytics workloads. This book documents and addresses topics on how to use IBM Platform Cluster Manager to manage PowerLinux big data clusters through IBM InfoSphere BigInsights, Spectrum Scale, and Platform Symphony. This book documents how to set up and manage a big data cluster on PowerLinux servers to customize application and programming solutions, and to tune applications to use IBM hardware architectures. This document uses the architectural technologies and the software solutions that are available from IBM to help solve challenging technical and business problems. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective Linux on IBM Power Systems™ solutions that help uncover insights among clients' data so they can act to optimize business results, product development, and scientific discoveries.

IBM Spectrum Accelerate: Deployment, Usage, and Maintenance

IBM® Spectrum™ Accelerate, a member of the IBM Spectrum Storage™ family, is an agile software-defined storage solution for enterprise and cloud that builds on the customer-proven and mature IBM XIV® storage software. The key characteristic of Spectrum Accelerate is that it can be easily deployed and run on purpose-built or existing hardware chosen by the customer. IBM Spectrum Accelerate enables rapid deployment of high-performance and scalable block data storage infrastructure over commodity hardware, either on-premises or off-premises. This IBM Redbooks® publication provides a broad understanding of IBM Spectrum Accelerate. The book introduces Spectrum Accelerate and discusses planning and preparation that are essential for a successful deployment of the solution. The deployment itself is explained through a step-by-step approach, using either a graphical user interface (GUI) based method or a simple command-line interface (CLI) based procedure. Subsequent chapters explain the logical configuration of the system, host support and business continuity functions, and migration. Although it makes many references to the XIV storage software, the book also emphasizes where IBM Spectrum Accelerate differs from XIV. Finally, a substantial portion of the book is dedicated to maintenance and troubleshooting to provide detailed guidance for the customer support personnel.

Oracle Database Upgrade, Migration & Transformation Tips & Techniques

A practical roadmap for database upgrade, migration, and transformation This Oracle Press guide provides best practices for migrating between different operating systems and platforms, transforming existing databases to use different storage or enterprise systems, and upgrading databases from one release to the next. Based on the expert authors’ real-world experience, Oracle Database Upgrade, Migration & Transformation Tips & Techniques will help you choose the best migration path for your project and develop an effective methodology. Code examples and detailed checklists are included in this comprehensive resource. Leverage the features of Oracle Data Guard to migrate an Oracle Database Use Oracle Recovery Manager, transportable tablespace sets, and transportable database toolsets to migrate between platforms Migrate databases with export/import Use Oracle GoldenGate for zero or near-zero downtime migrations Take advantage of the Cross-Platform Transportable Tablespace Set utility Migrate to new storage platforms using the features of Oracle Automatic Storage Management Upgrade to Oracle Database 12c with the Database Upgrade Assistant tool Move seamlessly to Oracle's engineered systems Migrate to the cloud

SAP ERP Financial Accounting and Controlling: Configuration and Use Management

SAP ERP modules are notoriously hard to configure and use effectively without a lot of practice and experience. But as SAP ERP Financial Accounting and Controlling: Configuration and Use Management shows, it doesn't have to be so difficult. The book takes a systematic approach that leads SAP Financial Accounting and Controlling (FICO) users step by step through configuring and using all the program’s facets. This approach makes configuration complexities manageable. The book’s author—SAP expert, trainer, and accountant Andrew Okungbowa—ensures that both you and your end users are up and running quickly and confidently with FICO. He also provides sound and tested procedures that ensure your implementation works without error. SAP ERP Financial Accounting and Controlling: Configuration and Use Management is in fact the most comprehensive and easy-to-follow SAP FICO configuration book on the market. It incorporates a hands-on approach, with hundreds of screen shots and practical examples, that allows a person without prior configuration training to make SAP FICO ready for use in the enterprise. You’ll find that you don’t need to be a rocket scientist to grasp the concepts explained and apply them to your work—even when the finances are complicated, such as with the ins and outs of taxes, currency conversions, or special general ledger entries such as down payments or bills of exchange.
Providing in-depth coverage of both configuration and end-user procedures, the book covers most aspects of the SAP FICO certification syllabus—SAP’s view of the module’s key tasks and procedures—including: Configuring and using the general ledger and accounts payable and receivable screens Configuring and completing closing procedures, asset accounting, and financial reporting Configuring global settings and enterprise variables Accounting for both profit and cost centers Creating a house bank Integrating FICO with other SAP modules Taking a jargon-free tone and providing an abundance of examples, Andrew Okungbowa provides a clear understanding of configuration techniques and the breadth of functionalities encompassed by SAP FICO. And as an accountant, Okungbowa understands the needs of end users as well as of those answering to the CIO.

The Next Generation of Distributed IBM CICS

This IBM® Redbooks® publication describes IBM TXSeries® for Multiplatforms, which is the premier IBM distributed transaction processing software for business-critical applications. Before describing distributed transaction processing in general, we introduce the most recent version of TXSeries for Multiplatforms. We focus on the following areas: The technical value of TXSeries for Multiplatforms New features in TXSeries for Multiplatforms Core components of TXSeries Common TXSeries deployment scenarios Deployment, development, and administrative choices Technical considerations It also demonstrates enterprise integration with products, such as relational database management system (RDBMS), IBM WebSphere® MQ, and IBM WebSphere Application Server. In addition, it describes system customization, reviewing several features, such as capacity planning, backup and recovery, and high availability (HA). We describe troubleshooting in TXSeries. We also provide details about migration from version to version for TXSeries. A migration checklist is included. We demonstrate a sample application that we created, called BigBlueBank, its installation, and the server-side and client-side programs. Other topics in this book include application development and system administration considerations. This book describes distributed IBM Customer Information Control System (IBM CICS®) solutions, and how best to develop distributed CICS applications.

Infinispan data grid platform definitive guide

Dive into creating highly scalable and performant applications with this comprehensive guide to the Infinispan data grid platform. Designed for Java enterprise developers, this book provides clear and approachable instructions for implementing sophisticated data management solutions using Infinispan. What this Book will help me do Install and configure Infinispan for optimized development environments. Understand and implement data caching topologies for diverse access patterns. Leverage scalable distributed transactions with detailed JGroups integrations. Monitor and manage Infinispan instances using cutting-edge tools like RHQ and JMX. Develop a real-world application using Infinispan's APIs for practical insights. Author(s) The author(s) of this book are seasoned Java developers and experts in distributed caching and data grid technologies. With years of industry experience, they bring theoretical insights paired with pragmatic application know-how. Their approach emphasizes teaching through real-life use cases, practical applications, and clear explanations, making complex concepts accessible to all readers. Who is it for? This book is perfect for Java enterprise developers who are looking to elevate their architecture skills by building applications that demand scalability and high performance. Readers should have a solid understanding of Java, though no prior experience using Infinispan is required. Whether you're transitioning from traditional databases or improving your grasp of distributed caching, this book suits your needs.

Using IBM Enterprise Records

Records management helps users address evolving governance mandates to meet regulatory, legal, and fiduciary requirements. Proactive adherence to information retention policies and procedures is a critical facet of any compliance strategy. IBM® Enterprise Records helps organizations enforce centralized policy management for file plans, retention schedules, legal preservation holds, and auditing. IBM Enterprise Records enables your organization to securely capture, declare, classify, store, and dispose of electronic and physical records. In this IBM Redbooks® publication, we introduce the records management concept and provide an overview of IBM Enterprise Records. We address records management topics, including the retention schedule, file plan, records ingestion and declaration, records disposition, records hold, and Enterprise Records application programming interfaces (APIs). We also use a case study to describe step-by-step instructions to implement a sample records management solution using Enterprise Records. We provide concrete examples of how to perform tasks, such as file plan creation, records ingestion and declaration, records disposition, and records hold. This book helps you to understand the records management concept, the IBM Enterprise Records features and capabilities, and its use.

Neo4j Cookbook

Dive into Neo4j and uncover how to harness its powerful capabilities in graph data analysis with the Neo4j Cookbook. Across 75 well-structured recipes, you'll learn to apply practical techniques in modeling, querying, and visualizing graph databases, enabling you to address real-world challenges efficiently. What this Book will help me do Access Neo4j from popular programming languages such as Java, Python, and Scala, enabling easier integration into your projects. Migrate data seamlessly from various data stores, including SQL and NoSQL, into Neo4j, maintaining data consistency. Use best practices for data modeling with Neo4j to optimize performance and scalability for your applications. Analyze social data from sources like Facebook and Twitter, revealing valuable insights from connections and relationships. Integrate geospatial data to enable location-based queries and nearest-point searches, opening up advanced application features. Author(s) Ankur Goel, the author of Neo4j Cookbook, is an experienced technologist with an extensive background in handling database solutions and applications. Passionate about simplifying complex systems, Ankur excels in teaching essential database concepts through clear and actionable recipes. His writing is rooted in practical insights, reflecting his hands-on experience in the industry. Who is it for? This book is ideal for developers and data engineers who currently use or plan to integrate Neo4j into their workflows. If you are migrating from a traditional database system or delving into graph databases for the first time, this book offers structured guidance. Readers should have a fundamental understanding of programming and familiarity with database concepts for the best experience. It caters to individuals aiming to build or enhance data-driven applications using Neo4j's robust graph modeling.
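One recipe family the blurb highlights is accessing Neo4j from Python. A minimal sketch, assuming an invented Person/KNOWS data model and the usual local bolt endpoint and credentials (none of which come from the book), might look like this:

```python
# Sketch of a parameterized Cypher query prepared from Python.
# The Person label, KNOWS relationship, and connection details below
# are illustrative assumptions, not taken from the Neo4j Cookbook.

def friends_of(name):
    """Build a parameterized Cypher query plus its parameter map."""
    query = (
        "MATCH (p:Person {name: $name})-[:KNOWS]->(friend:Person) "
        "RETURN friend.name AS friend"
    )
    return query, {"name": name}

if __name__ == "__main__":
    query, params = friends_of("Alice")
    print(query)
    # Against a live server you would run (requires `pip install neo4j`):
    # from neo4j import GraphDatabase
    # driver = GraphDatabase.driver("bolt://localhost:7687",
    #                               auth=("neo4j", "password"))
    # with driver.session() as session:
    #     for record in session.run(query, params):
    #         print(record["friend"])
```

Passing parameters separately (rather than interpolating values into the query string) lets the server cache the query plan and avoids Cypher injection, which is the pattern the official drivers encourage.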

Healthy SQL : A Comprehensive Guide to Healthy SQL Server Performance

Healthy SQL is about ensuring the ongoing performance health of a SQL Server database. An unhealthy database is not just an inconvenience; it can bring a business to its knees. And if you are the database administrator, the health of your SQL Server implementation can be a direct reflection on you. It's in everyone's best interest to have a healthy SQL implementation. Healthy SQL is built around the concept of a medical checkup, giving you the tools you need to assess the current health of your database and take action to improve upon that health and maintain good performance for your business. Healthy SQL aids in developing a rigorous routine so that you know how healthy your SQL Server machines are, and how you can keep those same servers healthy and fit for duty. The book is filled with practical advice and a time-tested strategy, helping you put together a regimen that will ensure your servers are healthy, your implementation is fully optimized, your services are redundant and highly available, and you have a plan for business continuity in the event of a disaster. If your current environment doesn't match up with these criteria, then pick up a copy of Healthy SQL today and start your journey on the road to a fit and tight SQL Server deployment.

Implementing IBM FlashSystem 900

Today's global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem™ 900, powered by IBM FlashCore™ technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read, and also examples of the benefits gained by integrating the FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology. This book describes the following offerings of the IBM Spectrum™ Storage family: IBM Spectrum Storage™ IBM Spectrum Control IBM Spectrum Virtualize IBM Spectrum Scale IBM Spectrum Accelerate

Designing and Operating a Data Reservoir

Together, big data and analytics have tremendous potential to improve the way we use precious resources, to provide more personalized services, and to protect ourselves from unexpected and ill-intentioned activities. To fully use big data and analytics, an organization needs a system of insight. This is an ecosystem where individuals can locate and access data, and build visualizations and new analytical models that can be deployed into the IT systems to improve the operations of the organization. The data that is most valuable for analytics is also valuable in its own right and typically contains personal and private information about key people in the organization such as customers, employees, and suppliers. Although universal access to data is desirable, safeguards are necessary to protect people's privacy, prevent data leakage, and detect suspicious activity. The data reservoir is a reference architecture that balances the desire for easy access to data with information governance and security. The data reservoir reference architecture describes the technical capabilities necessary for a system of insight, while being independent of specific technologies. Being technology independent is important, because most organizations already have investments in data platforms that they want to incorporate in their solution. In addition, technology is continually improving, and the choice of technology is often dictated by the volume, variety, and velocity of the data being managed. A system of insight needs more than technology to succeed. The data reservoir reference architecture includes description of governance and management processes and definitions to ensure the human and business systems around the technology support a collaborative, self-service, and safe environment for data use. 
The data reservoir reference architecture was first introduced in Governing and Managing Big Data for Analytics and Decision Makers, REDP-5120, which is available at: http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html. This IBM® Redbooks publication, Designing and Operating a Data Reservoir, builds on that material to provide more detail on the capabilities and internal workings of a data reservoir.

IBM Spectrum Scale (formerly GPFS)

This IBM® Redbooks® publication updates and complements the earlier publication Implementing the IBM General Parallel File System in a Cross Platform Environment, SG24-7844, covering the changes made since that book was released for IBM General Parallel File System (GPFS™). Since then, two releases have been made available, up to the latest version, IBM Spectrum™ Scale 4.1. Topics such as what is new in Spectrum Scale, Spectrum Scale licensing updates (Express/Standard/Advanced), Spectrum Scale infrastructure support/updates, storage support (IBM and OEM), operating system and platform support, Spectrum Scale global sharing - Active File Management (AFM), and considerations for the integration of Spectrum Scale in IBM Tivoli® Storage Manager (Spectrum Protect) backup solutions are discussed in this new IBM Redbooks publication. This publication provides additional topics such as planning, usability, best practices, monitoring, problem determination, and so on. The main concept for this publication is to bring you up to date with the latest features and capabilities of IBM Spectrum Scale as the solution has become a key component of the reference architecture for clouds, analytics, mobile, social media, and much more. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective cloud services and big data solutions on IBM Power Systems™, helping to uncover insights among clients' data so they can take action to optimize business results, product development, and scientific discoveries.

Oracle SQL Developer Data Modeler for Database Design Mastery

Design Databases with Oracle SQL Developer Data Modeler In this practical guide, Oracle ACE Director Heli Helskyaho explains the process of database design using Oracle SQL Developer Data Modeler—the powerful, free tool that flawlessly supports Oracle and other database environments, including Microsoft SQL Server and IBM DB2. Oracle SQL Developer Data Modeler for Database Design Mastery covers requirement analysis, conceptual, logical, and physical design, data warehousing, reporting, and more. Create and deploy high-performance enterprise databases on any platform using the expert tips and best practices in this Oracle Press book. Configure Oracle SQL Developer Data Modeler Perform requirement analysis Translate requirements into a formal conceptual data model and process models Transform the conceptual (logical) model into a relational model Manage physical database design Generate data definition language (DDL) scripts to create database objects Design a data warehouse database Use subversion for version control and to enable a multiuser environment Document an existing database Use the reporting tools in Oracle SQL Developer Data Modeler Compare designs and the database
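One step the book covers is generating data definition language (DDL) scripts from a physical design. As a toy illustration of that logical-to-DDL step (this is hand-rolled Python, not Data Modeler's own generator, and the employees table is invented for the example):

```python
# Render a CREATE TABLE statement from a small column specification.
# Table name, columns, and types below are made up for illustration.

def table_ddl(name, columns, primary_key):
    """Build a CREATE TABLE statement from (column, type) pairs."""
    lines = ["  {} {}".format(col, ctype) for col, ctype in columns]
    lines.append("  PRIMARY KEY ({})".format(", ".join(primary_key)))
    return "CREATE TABLE {} (\n{}\n);".format(name, ",\n".join(lines))

ddl = table_ddl(
    "employees",
    [("emp_id", "NUMBER(6)"), ("last_name", "VARCHAR2(50) NOT NULL")],
    ["emp_id"],
)
print(ddl)
```

In Data Modeler itself this step is automated: the tool walks the relational model and emits vendor-specific DDL, so the value of the tool is that the model, not hand-written scripts, is the source of truth.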

Analysis Patterns: Reusable Object Models

This innovative book recognizes the need within the object-oriented community for a book that goes beyond the tools and techniques of the typical methodology book. In Analysis Patterns: Reusable Object Models, Martin Fowler focuses on the end result of object-oriented analysis and design—the models themselves. He shares with you his wealth of object modeling experience and his keen eye for identifying repeating problems and transforming them into reusable models. Analysis Patterns provides a catalogue of patterns that have emerged in a wide range of domains including trading, measurement, accounting and organizational relationships. Recognizing that conceptual patterns cannot exist in isolation, the author also presents a series of "support patterns" that discuss how to turn conceptual models into software that in turn fits into an architecture for a large information system. Included in each pattern is the reasoning behind their design, rules for when they should and should not be used, and tips for implementation. The examples presented in this book comprise a cookbook of useful models and insight into the skill of reuse that will improve analysis, modeling and implementation.
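A minimal sketch of one pattern from the book's measurement domain, the Quantity pattern (pairing a number with its unit so unit mismatches fail loudly); the Python rendering and the sample values are this sketch's own, not Fowler's notation:

```python
# Quantity pattern: keep amount and unit together so adding
# incompatible units raises instead of silently corrupting data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    amount: float
    unit: str

    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.amount + other.amount, self.unit)

weight = Quantity(70.0, "kg") + Quantity(2.5, "kg")
print(weight)  # Quantity(amount=72.5, unit='kg')
```

The pattern's payoff is exactly the kind of reuse the book argues for: once Quantity exists, trading, measurement, and accounting models can all share it instead of passing bare floats around.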

Beginning Oracle PL/SQL, Second Edition

Beginning Oracle PL/SQL gets you started in using the built-in language that every Oracle developer and database administrator must know. Oracle Database is chock-full of built-in application features that are free for the using, and PL/SQL is your ticket to learning about and using those features from your own code. With it, you can centralize business logic in the database, you can offload application logic, and you can automate database- and application-administration tasks. Author Don Bales provides in Beginning Oracle PL/SQL a fast-paced and example-filled tutorial. Learn from Don’s extensive experience to discover the most commonly used aspects of PL/SQL, without wasting time on obscure and obsolete features. The author takes his 20+ years of experience and a wealth of statistics he's gathered on PL/SQL usage over those years and applies the 80/20 rule: cover what's most needed and used by PL/SQL professionals and avoid what's not necessary! The result is a book that covers all the key features of PL/SQL without wasting your time discussing esoteric and obsolete parts of the language. Learn what really matters, so that you can get to work feeling confident with what you know about PL/SQL. Covers the key topics that matter, including variables and datatypes, executing statements, working with cursors, bulk operations, real-world objects, debugging, testing, and more. Teaches you to write production-level, object-oriented PL/SQL. You'll explore relational PL/SQL, but unlike most other books on the subject, this one emphasizes the use of PL/SQL's object-oriented features as well. Guides you in working through real examples of using PL/SQL. You'll learn PL/SQL by applying it to real-world business problems, not by heavy theory.

FileMaker Pro 14: The Missing Manual

You don’t need a technical background to build powerful databases with FileMaker Pro 14. This crystal-clear, objective guide shows you how to create a database that lets you do almost anything with your data so you can quickly achieve your goals. Whether you’re creating catalogs, managing inventory and billing, or planning a wedding, you’ll learn how to customize your database to run on a PC, Mac, web browser, or iOS device. The important stuff you need to know: Dive into relational data. Solve problems quickly by connecting and combining data from different tables. Create professional documents. Publish reports, charts, invoices, catalogs, and other documents with ease. Access data anywhere. Use FileMaker Go on your iPad or iPhone—or share data on the Web. Harness processing power. Use new calculation and scripting tools to crunch numbers, search text, and automate tasks. Run your database on a secure server. Learn the high-level features of FileMaker Pro Advanced. Keep your data safe. Set privileges and allow data sharing with FileMaker’s streamlined security features.

Apache Oozie

Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases. Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities. Install and configure an Oozie server, and get an overview of basic concepts Journey through the world of writing and configuring workflows Learn how the Oozie coordinator schedules and executes workflows based on triggers Understand how Oozie manages data dependencies Use Oozie bundles to package several coordinator apps into a data pipeline Learn about security features and shared library management Implement custom extensions and write your own EL functions and actions Debug workflows and manage Oozie’s operational details
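Oozie workflows are XML documents with a start node, actions wired to ok/error transitions, and end/kill nodes. The skeleton below assembles that shape with Python's standard library so the structure is visible; the workflow name, node names, and schema version are illustrative (a real action would also contain an action-type element such as a shell or map-reduce block):

```python
# Build a minimal Oozie workflow-app skeleton: start -> action,
# with ok routed to end and error routed to a kill node.
import xml.etree.ElementTree as ET

NS = "uri:oozie:workflow:0.5"  # schema version is an assumption
app = ET.Element("workflow-app", {"name": "demo-wf", "xmlns": NS})
ET.SubElement(app, "start", {"to": "cleanup"})

action = ET.SubElement(app, "action", {"name": "cleanup"})
# A real workflow would nest e.g. a <shell> or <map-reduce> element here.
ET.SubElement(action, "ok", {"to": "end"})
ET.SubElement(action, "error", {"to": "fail"})

kill = ET.SubElement(app, "kill", {"name": "fail"})
ET.SubElement(kill, "message").text = "Workflow failed"
ET.SubElement(app, "end", {"name": "end"})

print(ET.tostring(app, encoding="unicode"))
```

In practice this file (workflow.xml) is placed in HDFS and referenced from a job properties file when submitting to the Oozie server; coordinators and bundles, covered later in the book, wrap workflows like this one with time and data triggers.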

Implementation Best Practices for IBM DB2 BLU Acceleration with SAP BW on IBM Power Systems

BLU Acceleration is a new technology that has been developed by IBM® and integrated directly into the IBM DB2® engine. BLU Acceleration is a new storage engine, along with an integrated runtime (directly in the core DB2 engine), to support the storage and analysis of column-organized tables. The BLU Acceleration processing is parallel to the regular, row-based table processing found in the DB2 engine. This is not a bolt-on technology, nor is it a separate analytic engine that sits outside of DB2. Much like when IBM added XML data as a first-class object within the database along with all the storage and processing enhancements that came with XML, now IBM has added column-organized tables directly into the storage and processing engine of DB2. This IBM Redbooks® publication shows examples on an IBM Power Systems™ entry server as a starter configuration for small organizations, and builds larger configurations with larger IBM Power Systems servers. It takes you through building a BLU Acceleration solution on IBM POWER® with an SAP landscape integrated into it. The scenario implements SAP NetWeaver Business Warehouse systems using another DB2 feature, Near-Line Storage (NLS), and uses IBM POWER virtualization features to develop and document recommended configurations. This publication is targeted toward technical professionals (DBAs, data architects, consultants, technical support staff, and IT specialists) responsible for delivering cost-effective data management solutions to provide the best system configuration for their clients' data analytics on Power Systems.

Current State of Big Data Use in Retail Supply Chains

Innovation, consisting of invention, adoption, and deployment of new technology and associated process improvements, is a key source of competitive advantage. Big Data is an innovation that has been gaining prominence in retailing and other industries. In fact, managers working in retail supply chain member firms (that is, retailers, manufacturers, distributors, wholesalers, logistics providers, and other service providers) have increasingly been trying to understand what Big Data entails, what it may be used for, and how to make it an integral part of their businesses. This report covers Big Data use, with a focus on applications for retail supply chains. The authors’ findings suggest that Big Data use in retail supply chains is still generally elusive. Although most managers report initial, and in some cases significant, efforts in analyzing large sets of data for decision making, various challenges still largely confine that use to traditional, transactional data.

Big Data

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built. About the Technology Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive. About the Book Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases. What's Inside Introduction to big data systems Real-time processing of web-scale data Tools like Hadoop, Cassandra, and Storm Extensions to traditional database skills About the Reader This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful. About the Authors Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing. Quotes Transcends individual tools or platforms.
Required reading for anyone working with big data systems. - Jonathan Esterhazy, Groupon A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide. - Mark Fisher, Pivotal Contains wisdom that can only be gathered after tackling many big data projects. A must-read. - Pere Ferrera Bertran, Datasalt The de facto guide to streamlining your data pipeline in batch and near-real time. - Alex Holmes, Author of "Hadoop in Practice"
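The core idea of the Lambda Architecture described above can be sketched in a few lines: a batch layer recomputes a view over the full master dataset, a speed layer maintains a view over only the events that arrived since the last batch run, and queries merge the two. The Python below is an illustrative, in-memory sketch of that split (names like `batch_view` and `speed_view` are our own shorthand, not code from the book), counting page views per user.

```python
# Illustrative sketch of the Lambda Architecture's query-time merge
# (not code from the book): batch view + real-time view = full answer.

from collections import Counter

def batch_view(master_dataset):
    # Batch layer: recompute a view from the full, immutable master dataset.
    return Counter(event["user"] for event in master_dataset)

def speed_view(recent_events):
    # Speed layer: a view over only the events since the last batch run.
    return Counter(event["user"] for event in recent_events)

def query_pageviews(batch, realtime, user):
    # Serving layer: answer a query by merging both views.
    return batch.get(user, 0) + realtime.get(user, 0)

master = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
recent = [{"user": "alice"}]

print(query_pageviews(batch_view(master), speed_view(recent), "alice"))  # 3
```

In a real deployment the batch view would be built by Hadoop over data in HDFS and the real-time view by Storm, but the merge-at-query-time principle is the same.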

DS8870 Data Migration Techniques

This IBM® Redbooks® publication describes data migrations between IBM DS8000® storage systems, where in most cases one or more older DS8000 models are being replaced by the newer DS8870 model. Most of the migration methods are based on the DS8000 Copy Services. The book includes considerations for solutions such as IBM Tivoli® Productivity Center for Replication and the IBM Geographically Dispersed Parallel Sysplex™ (GDPS®) used in IBM z/OS® environments. Both offerings are primarily designed to enable disaster recovery using DS8000 Copy Services. In most data migration cases, Tivoli Productivity Center for Replication or GDPS will not directly provide functions for the data migration itself. However, this book explains how to bring the newly migrated environment back under the control of GDPS or Tivoli Productivity Center for Replication. In addition to the Copy Services-based migrations, the book also covers host-based mirroring techniques using IBM Transparent Data Migration Facility (TDMF®) for z/OS and the z/OS Dataset Mobility Facility (zDMF).

PostgreSQL 9 Administration Cookbook - Second Edition

Master PostgreSQL 9.4 with this hands-on cookbook featuring over 150 practical and easy-to-follow recipes that will bring you up to speed with PostgreSQL's latest features. You'll learn how to create, manage, and optimize a PostgreSQL-based database, focusing on vital aspects like performance and reliability. What this Book will help me do Efficiently configure PostgreSQL databases for optimal performance. Deploy robust backup and recovery strategies to ensure data reliability. Utilize PostgreSQL's replication features for improved high availability. Implement advanced queries and analyze large datasets effectively. Optimize database structure and functionality for application needs. Author(s) Simon Riggs, Gianni Ciolli, and their co-authors are seasoned database professionals with extensive experience in PostgreSQL administration and development. They have a complementary blend of skills, comprising practical system knowledge, teaching, and authoritative writing. Their hands-on experience translates seamlessly into accessible yet informative content. Who is it for? This book is ideal for database administrators and developers who are looking to enhance their skills with PostgreSQL, especially version 9. If you have some prior experience with relational databases and want practical guidance on optimizing, managing, and mastering PostgreSQL, this resource is tailored for you.

Hadoop Essentials

In 'Hadoop Essentials,' you'll embark on an engaging journey to master the Hadoop ecosystem. This book covers fundamental to advanced topics, from HDFS and MapReduce to real-time analytics with Spark, empowering you to handle modern data challenges efficiently. What this Book will help me do Understand the core components of Hadoop, including HDFS, YARN, and MapReduce, for foundational knowledge. Learn to optimize Big Data architectures and improve application performance. Utilize tools like Hive and Pig for efficient data querying and processing. Master data ingestion technologies like Sqoop and Flume for seamless data management. Achieve fluency in real-time data analytics using modern tools like Apache Spark and Apache Storm. Author(s) Shiva Achari is a seasoned expert in Big Data and distributed systems with in-depth knowledge of the Hadoop ecosystem. With years of experience in both development and teaching, he crafts content that bridges practical know-how with theoretical insights in a highly accessible style. Who is it for? This book is perfect for system and application developers aiming to learn practical applications of Hadoop. It suits professionals seeking solutions to real-world Big Data challenges, as well as those familiar with distributed systems basics who are looking to deepen their expertise in advanced data analysis.
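The MapReduce model at the heart of the Hadoop ecosystem described above can be illustrated with the canonical word-count example. The sketch below runs the map, shuffle, and reduce phases in memory (in a real Hadoop job these phases run as distributed tasks over data in HDFS); the function names are our own, chosen to mirror the three phases.

```python
# In-memory sketch of the MapReduce programming model (illustrative only;
# Hadoop distributes these phases across a cluster with HDFS as storage).

from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the list of values for each key.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big deal", "data pipelines"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'deal': 1, 'pipelines': 1}
```

Tools the blurb mentions, such as Hive and Pig, compile higher-level queries down to jobs following exactly this map/shuffle/reduce pattern.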