O'Reilly Data Engineering Books

Cassandra 3.x High Availability - Second Edition

2016-08-29 O'Reilly Amazon

book

Robbie Strickland

data data-engineering nosql-databases Cassandra DevOps

Cassandra 3.x High Availability is an in-depth guide to mastering the high availability features of Apache Cassandra. This book takes you through its architecture, implementing solutions to achieve zero downtime, and configuring clusters for fault tolerance and scalability. With practical examples and tips, it is a go-to resource for designing robust Cassandra-powered systems. What this book will help me do: Understand the architecture of Apache Cassandra and its high availability mechanisms. Master replication and tunable consistency levels for optimal data distribution. Learn to scale out your Cassandra deployments with multiple data centers. Acquire skills in creating efficient and scalable data models for fault-tolerant systems. Prevent system failures by avoiding anti-patterns and managing graceful failover scenarios. Author(s): Robbie Strickland has extensive experience working as a developer and architect with distributed database systems. Specializing in Apache Cassandra, Strickland focuses on designing systems with high availability, scalability, and fault tolerance. Strickland's practical teaching style ensures readers gain actionable knowledge to build robust database solutions. Who is it for? This book is ideal for developers and DevOps engineers familiar with Cassandra basics who wish to deepen their expertise. If your goal is to build highly available and fault-tolerant systems, this book will guide you step by step. It suits professionals managing data-intensive applications and looking to optimize their database strategy using Cassandra.

IBM System z Personal Development Tool Messages and Codes

2016-08-29 O'Reilly Amazon

book

Bill Ogden

data data-engineering IBM

This IBM® Redbooks® publication provides all the messages that are associated with IBM System z® Personal Development Tool (IBM zPDT®) operation in a single reference source. This edition is intended for zPDT Version 1 Release 6 (commonly known as GA6), but should be useful for all zPDT releases.

Practical Hive: A Guide to Hadoop's Data Warehouse System

2016-08-27 O'Reilly Amazon

book

Scott Shaw , David Kjerrumgaard , Andreas François Vermeulen , Ankur Gupta

data data-engineering Hadoop Big Data DWH Hive

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas François Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data. What You Will Learn: Install and configure Hive for new and existing datasets. Perform DDL operations. Execute efficient DML operations. Use tables, partitions, buckets, and user-defined functions. Discover performance tuning tips and Hive best practices. Who This Book Is For: Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

Big Data War

2016-08-26 O'Reilly Amazon

book

Patrick H. Park

data data-engineering Analytics Big Data Data Analytics

This book focuses mainly on why data analytics fails in business. It provides an objective analysis of the phenomenon and its root causes, rather than abstract criticism of the utility of data analytics. The author then explains in detail how companies can survive and win the global big data competition, based on actual company cases. Having established an execution- and performance-oriented big data methodology from over 10 years of experience in the field as an authority on big data strategy, the author identifies core principles of data analytics through case analysis of the failures and successes of actual companies. Moreover, he endeavors to share with readers the principles by which innovative global companies became successful through the use of big data. This book is a quintessential guide to big data analytics, condensing the author’s know-how from direct and indirect experience. How do we survive this big data war, in which Facebook in SNS, Amazon in e-commerce, and Google in search expand their platforms into other areas from their respective distinct markets? The answer can be found in this book.

IBM Spectrum Archive Enterprise Edition V1.2.1: Installation and Configuration Guide

2016-08-25 O'Reilly Amazon

book

Illarion Borisevich , Larry Coyne , Khanh Ngo , Stefan Neff

data data-engineering IBM

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive (formerly IBM Linear Tape File System™ (LTFS)) Enterprise Edition (EE) V1.2.1.0 for the IBM TS3310, IBM TS3500, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale™ (formerly IBM General Parallel File System (IBM GPFS™)) based environment and helps encourage the use of tape as a critical tier in the storage environment. This is the second edition of IBM Spectrum Archive V1.2 (SG24-8333-00), although it is based on the prior editions of IBM Linear Tape File System Enterprise Edition V1.1.1.2: Installation and Configuration Guide, SG24-8143. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 7, 6, and 5 tape drives in IBM TS3310, TS3500, and TS4500 tape libraries. Also, IBM TS1140 and IBM TS1150 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in Tier 2 and Tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM Data Engine for Hadoop and Spark

2016-08-24 O'Reilly Amazon

book

Dino Quintero , Reinaldo Tetsuo Katahira , Aditya Gandakusuma Sutandyo , Nicolas Joly , Luis Bolinches

data data-engineering IBM Analytics Big Data Hadoop

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution for analytics, to access, manage, and analyze data sets to improve business outcomes. This book documents topics that demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help meet customers' data analytics workload requirements. This book describes how to plan, prepare, install, integrate, manage, and use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation to complement available IBM analytics solutions to help meet your data analytics needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering analytics solutions and support on IBM Power Systems.

Pro Oracle GoldenGate for the DBA

2016-08-24 O'Reilly Amazon

book

Bobby Curtis

data data-engineering oracle-database-solutions Oracle

Take a simple approach to learning the Oracle GoldenGate product. This approach provides an in-depth perspective of GoldenGate from an implementer's viewpoint, and also addresses why the management viewpoint is important. Your journey through this book includes an architecture discussion of GoldenGate and the benefits of purchasing GoldenGate from a management perspective. Then the book quickly moves into advanced implementation components associated with GoldenGate. You'll find many use cases and instructions throughout the book to help with everything from easy to complex GoldenGate implementations. An Oracle GoldenGate implementation generally consists of a group project, involving both business and technical resources. Pro Oracle GoldenGate for the DBA provides the viewpoint from the DBA's vantage point. This approach provides the components of who, what, why, when, and how in defining the implementation and support of a GoldenGate project. The success of most technical projects requires the support of multiple resource groups, and Pro Oracle GoldenGate for the DBA supplies the insight for the DBA member to understand the implementation and support process. Pro Oracle GoldenGate for the DBA: Takes you through justification, installation, and support. Provides the DBA perspective toward a successful result. Covers basic through increasingly advanced implementations. What You Will Learn: Understand the core architecture of data replication using Oracle GoldenGate. Implement a one-way setup of a classic capture and an integrated capture and replication. Design, architect, and implement a multi-master replication model. Replicate unsupported data types using tokens. Manage and troubleshoot multiple GoldenGate implementations. Learn the new features of GoldenGate supported in Oracle 12c. Who This Book Is For: Pro Oracle GoldenGate for the DBA is aimed squarely at Oracle database administrators who find themselves involved in GoldenGate integration projects. The book provides the DBA view into such projects, helping database administrators toward successful implementations and solid business results.

Real World SQL and PL/SQL: Advice from the Experts

2016-08-22 O'Reilly Amazon

book

Arup Nanda , Alex Nuijten , Heli Helskyaho , Martin Widlake , Brendan Tierney

data data-engineering SQL pl-sql pl/sql Analytics

Master the Underutilized Advanced Features of SQL and PL/SQL. This hands-on guide from Oracle Press shows how to fully exploit lesser-known but extremely useful SQL and PL/SQL features, and how to effectively use both languages together. Written by a team of Oracle ACE Directors, Real-World SQL and PL/SQL: Advice from the Experts features best practices, detailed examples, and insider tips that clearly demonstrate how to write, troubleshoot, and implement code for a wide variety of practical applications. The book thoroughly explains underutilized SQL and PL/SQL functions and lays out essential development strategies. Data modeling, advanced analytics, database security, secure coding, and administration are covered in complete detail. Learn how to: • Apply advanced SQL and PL/SQL tools and techniques • Understand SQL and PL/SQL functionality and determine when to use which language • Develop accurate data models and implement business logic • Run PL/SQL in SQL and integrate complex datasets • Handle PL/SQL instrumenting and profiling • Use Oracle Advanced Analytics and Oracle R Enterprise • Build and execute predictive queries • Secure your data using encryption, hashing, redaction, and masking • Defend against SQL injection and other code-based attacks • Work with Oracle Virtual Private Database. Code examples in the book are available for download at www.MHProfessional.com. For a complete list of Oracle Press titles, visit www.OraclePressBooks.com

IBM TS4500 R3 Tape Library Guide

2016-08-20 O'Reilly Amazon

book

Larry Coyne , Michael Engelbrecht , Simon Browne

data data-engineering IBM ELK Cyber Security

The IBM® TS4500 tape library is a next-generation tape solution that offers higher storage density and more integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires, with the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 5.5 petabytes (PB) of data in a single 10-square-foot library frame, which is up to 3.4 times more capacity than the IBM TS3500 tape library. The TS4500 offers these benefits: High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to an additional 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the existing TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for the IBM TS1150 tape drive: The TS1150 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. 
The TS1150 offers high-performance, flexible data storage with support for data encryption. Also, this fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. Support of the IBM Linear Tape-Open (LTO) Ultrium 7 tape drive: The LTO Ultrium 7 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 6, while still protecting your investment in the previous technology. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), and command-line interface (CLI). You learn how to accomplish several specific tasks: Improve storage density with increased expansion frame capacity up to 2.4 times and support 33% more tape drives per frame. Manage storage by using the ALMS feature. Improve business continuity and disaster recovery with dual active accessor, automatic control path failover, and data path failover. Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media. Support IBM LTO Ultrium 7, 6, and 5, IBM TS1150, and TS1140 tape drives. Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow. Reduce the storage footprint and simplify cabling with 10 U of rack space on top of the library. This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Sams Teach Yourself Apache Spark™ in 24 Hours

2016-08-17 O'Reilly Amazon

book

Jeffrey Aven

data data-engineering apache-spark AI/ML API Big Data

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations Instructions walk you through common questions, issues, and tasks; Q-and-As, 
Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Architecting for Access

2016-08-15 O'Reilly Amazon

book

Rich Morrow

data data-engineering Analytics Hadoop Looker NoSQL

Fragmented, disparate backend data systems have become the norm in today’s enterprise, where you’ll find a mix of relational databases, Hadoop stores, and NoSQL engines, with access and analytics tools bolted on every which way. This mishmash of options presents a real challenge when it comes to choosing frontend analytics and visualization tools. How did we get here? In this O’Reilly report, IT veteran Rich Morrow takes you through the rapid changes to both backend storage and frontend analytics over the past decade, and provides a pragmatic list of requirements for an analytics stack that will centralize access to all of these data systems. You’ll examine current analytics platforms, including Looker, a new breed of analytics and visualization tools built specifically to handle our fragmented data space. Understand why and how data became so fractured so quickly. Explore the tangled web of data and backend tools in today’s enterprises. Learn the tool requirements for accessing and analyzing the full spectrum of data. Examine the relative strengths of popular analytics and visualization tools, including Looker, Tableau, and MicroStrategy. Inspect Looker’s unique focus on both the frontend and backend.

In Search of Database Nirvana

2016-08-15 O'Reilly Amazon

book

Rohit Jain

data data-engineering search BI Big Data Hadoop

The database pendulum is in full swing. Ten years ago, web-scale companies began moving away from proprietary relational databases to handle big data use cases with NoSQL and Hadoop. Now, for a variety of reasons, the pendulum is swinging back toward SQL-based solutions. What many companies really want is a system that can handle all of their operational, OLTP, BI, and analytic workloads. Could such an all-in-one database exist? This O’Reilly report examines this quest for database nirvana, or what Gartner recently dubbed Hybrid Transaction/Analytical Processing (HTAP). Author Rohit Jain takes an in-depth look at the possibilities and the challenges for companies that long for a single query engine to rule them all. With this report, you’ll explore: the challenges of having one query engine support operational, BI, and analytical workloads; efforts to produce a query engine that supports multiple storage engines; attempts to support multiple data models with the same query engine; why an HTAP database engine needs to provide enterprise-caliber capabilities, including high availability, security, and manageability; and how to assess various options for meeting workload requirements with one database engine, or a combination of query and storage engines.

Interactive Spark using PySpark

2016-08-15 O'Reilly Amazon

book

Benjamin Bengfort , Jenny Kim

data data-engineering apache-spark PySpark AI/ML Analytics

Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark using an interactive shell called PySpark. Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java. This also allows for reuse of a wide variety of Python libraries for machine learning, data visualization, numerical analysis, etc. What you'll learn, and how you can apply it: Compare the different components provided by Spark, and what use cases they fit. Learn how to use RDDs (resilient distributed datasets) with PySpark. Write Spark applications in Python and submit them to the cluster as Spark jobs. Get an introduction to the Spark computing framework. Apply this approach to a worked example to determine the most frequent airline delays in a specific month and year. This lesson is for you because: You're a data scientist, familiar with Python coding, who needs to get up and running with PySpark. You're a Python developer who needs to leverage the distributed computing resources available on a Hadoop cluster, without learning Java or Scala first. Prerequisites: Familiarity with writing Python applications. Some familiarity with bash command-line operations. Basic understanding of how to use simple functional programming constructs in Python, such as closures, lambdas, maps, etc. Materials or downloads needed in advance: Apache Spark. This lesson is taken from Data Analytics with Hadoop by Jenny Kim and Benjamin Bengfort.

IBM Tape Library Guide for Open Systems

2016-08-11 O'Reilly Amazon

book

Larry Coyne , Michael Engelbrecht , Simon Browne

data data-engineering IBM ELK

This IBM® Redbooks® publication presents a general introduction to the latest IBM tape and tape library technologies. Featured tape technologies include the IBM LTO Ultrium and Enterprise 3592 tape drives, and their implementation in IBM tape libraries. This 13th edition includes information about the latest enhancements to the IBM TS4500 enterprise tape library. In particular, it includes details about the latest TS4500 High Availability feature and its elastic capacity option. This book also provides details about the new TS7650G IBM ProtecTIER® gateway model DD6, contains technical information about each IBM tape product for open systems, and includes generalized sections about Small Computer System Interface (SCSI) and Fibre Channel connections and multipath architecture configurations. This book also covers tools and techniques for library management. It is intended for anyone who wants to understand more about IBM tape products and their implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists. If you do not have a background in computer tape storage products, you might need to read other sources of information. In the interest of being concise, topics that are generally understood are not covered in detail.

Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL

2016-08-10 O'Reilly Amazon

book

Bhushan Lakhe

data data-engineering Hadoop AWS Lambda Big Data Data Lake

Re-architect relational applications to NoSQL, integrate relational database management systems with the Hadoop ecosystem, and transform and migrate relational data to and from Hadoop components. This book covers the best-practice design approaches to re-architecting your relational applications and transforming your relational data to optimize concurrency, security, denormalization, and performance. Winner of IBM's 2012 Gerstner Award for his implementation of big data and data warehouse initiatives and author of Practical Hadoop Security, author Bhushan Lakhe walks you through the entire transition process. First, he lays out the criteria for deciding what blend of re-architecting, migration, and integration between RDBMS and HDFS best meets your transition objectives. Then he demonstrates how to design your transition model. Lakhe proceeds to cover the selection criteria for ETL tools, the implementation steps for migration with SQOOP- and Flume-based data transfers, and transition optimization techniques for tuning partitions, scheduling aggregations, and redesigning ETL. Finally, he assesses the pros and cons of data lakes and Lambda architecture as integrative solutions and illustrates their implementation with real-world case studies. Hadoop/NoSQL solutions do not offer by default certain relational technology features such as role-based access control, locking for concurrent updates, and various tools for measuring and enhancing performance. Practical Hadoop Migration shows how to use open-source tools to emulate such relational functionalities in Hadoop ecosystem components. 
What You'll Learn Decide whether you should migrate your relational applications to big data technologies or integrate them Transition your relational applications to Hadoop/NoSQL platforms in terms of logical design and physical implementation Discover RDBMS-to-HDFS integration, data transformation, and optimization techniques Consider when to use Lambda architecture and data lake solutions Select and implement Hadoop-based components and applications to speed transition, optimize integrated performance, and emulate relational functionalities Who This Book Is For Database developers, database administrators, enterprise architects, Hadoop/NoSQL developers, and IT leaders. Its secondary readership is project and program managers and advanced students of database and management information systems.

Enabling Real-time Analytics on IBM z Systems Platform

2016-08-08 O'Reilly Amazon

book

Cedrine Madera , Ravi Kumar , Steven LaFalce , Sebastian Muszytowski , Oliver Benke , Lydia Parziale , Willie Favero

data data-engineering IBM AI/ML Analytics Data Modelling

For online transaction processing (OLTP) workloads, the IBM® z Systems™ platform, with IBM DB2®, data sharing, Workload Manager (WLM), geoplex, and other high-end features, is the widely acknowledged leader. Most customers now integrate business analytics with OLTP by running, for example, scoring functions from transactional context for real-time analytics or by applying machine-learning algorithms on enterprise data that is kept on the mainframe. As a result, IBM adds investment so clients can keep the complete lifecycle for data analysis, modeling, and scoring under z Systems control in a cost-efficient way, keeping the qualities of service in availability, security, and reliability that z Systems solutions offer. Because of the changed architecture and tighter integration, IBM has shown, in a customer proof-of-concept, that a particular client was able to achieve an orders-of-magnitude improvement in performance, allowing that client’s data scientist to investigate the data in a more interactive process. Open technologies, such as Predictive Model Markup Language (PMML), can help customers update single components instead of being forced to replace everything at once. As a result, you have the possibility to combine your preferred tool for model generation (such as SAS Enterprise Miner or IBM SPSS® Modeler) with a different technology for model scoring (such as Zementis, a company focused on PMML scoring). IBM SPSS Modeler is a leading data mining workbench that can apply various algorithms in data preparation, cleansing, statistics, visualization, machine learning, and predictive analytics. It reflects over 20 years of experience and continued development, and is integrated with z Systems. With IBM DB2 Analytics Accelerator 5.1 and SPSS Modeler 17.1, the possibility exists to do the complete predictive model creation including data transformation within DB2 Analytics Accelerator. 
So, instead of moving the data to a distributed environment, algorithms can be pushed to the data, using cost-efficient DB2 Accelerator for the required resource-intensive operations. This IBM Redbooks® publication explains the overall z Systems architecture, how the components can be installed and customized, how the new IBM DB2 Analytics Accelerator loader can help load z Systems data and external data efficiently, how in-database transformation, in-database modeling, and in-transactional real-time scoring can be used, and what other related technologies are available. This book is intended for technical specialists, architects, and data scientists who want to use the technology on the z Systems platform. Most of the technologies described in this book require IBM DB2 for z/OS®. For acceleration of the data investigation, data transformation, and data modeling process, DB2 Analytics Accelerator is required. Most value can be achieved if most of the data already resides on z Systems platforms, although adding external data (such as data from social sources) poses no problem at all.

Implementing or Migrating to an IBM Gen 5 b-type SAN

2016-08-05 O'Reilly Amazon

book

Paulo Neto , Megan Gilge , Gaston Rius , Mirza Baig , Silviano Gaona , Liam Dowds

data data-engineering IBM Cloud Computing Cloud Storage Fabric

The IBM® b-type Gen 5 Fibre Channel directors and switches provide reliable, scalable, and secure high-performance foundations for high-density server virtualization, cloud architectures, and next generation flash and SSD storage. They are designed to meet the demands of highly virtualized private cloud storage and data center environments. This IBM Redbooks® publication helps administrators learn how to implement or migrate to an IBM Gen 5 b-type SAN. It provides an overview of the key hardware and software products and explains how to install, monitor, tune, and troubleshoot your storage area network (SAN). Read this publication to learn about fabric design, managing and monitoring your network, key tools such as IBM Network Advisor and Fabric Vision, and troubleshooting.

T-SQL Fundamentals, Third Edition

2016-08-04 O'Reilly Amazon

book

Itzik Ben-Gan

data data-engineering relational-databases microsoft-sql-server transact-sql Azure

Effectively query and modify data using Transact-SQL. Master T-SQL fundamentals and write robust code for Microsoft SQL Server and Azure SQL Database. Itzik Ben-Gan explains key T-SQL concepts and helps you apply your knowledge with hands-on exercises. The book first introduces T-SQL’s roots and underlying logic. Next, it walks you through core topics such as single-table queries, joins, subqueries, table expressions, and set operators. Then the book covers more-advanced data-query topics such as window functions, pivoting, and grouping sets. The book also explains how to modify data, work with temporal tables, and handle transactions, and provides an overview of programmable objects. Related Content Book: T-SQL Fundamentals, 4th Edition. Microsoft Data Platform MVP Itzik Ben-Gan shows you how to: Review core SQL concepts and their mathematical roots; Create tables and enforce data integrity; Perform effective single-table queries by using the SELECT statement; Query multiple tables by using joins, subqueries, table expressions, and set operators; Use advanced query techniques such as window functions, pivoting, and grouping sets; Insert, update, delete, and merge data; Use transactions in a concurrent environment; Get started with programmable objects, from variables and batches to user-defined functions, stored procedures, triggers, and dynamic SQL.

IBM GDPS Family: An Introduction to Concepts and Capabilities

2016-08-01 O'Reilly Amazon

book

John Thompson, Sim Schindel, David Clitherow

data data-engineering IBM

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings, and the additional planning and implementation services available from IBM are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you do read all the chapters, be aware that some information is intentionally repeated.

Expert Scripting and Automation for SQL Server DBAs

2016-07-28 O'Reilly Amazon

book

Peter A Carter

data data-engineering SQL DWH PowerShell

Automate your workload and manage more databases and instances with greater ease and efficiency by combining metadata-driven automation with powerful tools like PowerShell and SQL Server Agent. Automate your new instance builds and use monitoring to drive ongoing automation, with the help of an inventory database and a management data warehouse. The market has seen a trend toward a much smaller ratio of DBAs to SQL Server instances. Automation is the key to responding to this challenge and continuing to run a reliable database platform service. Expert Scripting and Automation for SQL Server DBAs guides you through the process of automating the maintenance of your SQL Server enterprise. The book shows how to automate the SQL Server build processes, monitor multiple instances from a single location, and automate routine maintenance tasks throughout your environment. You will also learn how to create automated responses to common or time-consuming break/fix scenarios. The book helps you become faster and better at what you do for a living, and thus more valuable in the job market. Highlights: Extensive coverage of automation using PowerShell and T-SQL; Detailed discussion and examples on metadata-driven automation; Comprehensive coverage of automated responses to break/fix scenarios. What You Will Learn: Automate the SQL Server build process; Create intelligent, metadata-driven routines; Automate common maintenance tasks; Create automated responses to common break/fix scenarios; Monitor multiple instances from a central location; Utilize T-SQL and PowerShell for administrative purposes. Who This Book Is For: Expert Scripting and Automation for SQL Server DBAs is a book for SQL Server database administrators responsible for managing increasingly large numbers of databases across their business enterprise. The book is also useful for any database administrator looking to ease their workload through automation. The book addresses the needs of these audiences by showing how to get more done through less effort by implementing an intelligent, automated-processes service model using tools such as T-SQL, PowerShell, Server Agent, and the Management Data Warehouse.

Monitoring Elasticsearch

2016-07-27 O'Reilly Amazon

book

Dan Noble

data data-engineering search elasticsearch API ELK

"Monitoring Elasticsearch" focuses on teaching readers how to manage and monitor the health and performance of Elasticsearch clusters. Through practical steps and real-world examples, this book ensures that users can diagnose, resolve, and prevent common issues to optimize system reliability and performance. What this Book will help me do Obtain a clear understanding of Elasticsearch monitoring tools and their features. Learn how to diagnose and troubleshoot common Elasticsearch performance issues. Master the use of Elasticsearch APIs for monitoring and analysis. Explore the best practices for effectively maintaining cluster reliability. Understand the features of tools like Kibana, Marvel, and BigDesk for Elasticsearch monitoring. Author(s) The authors of "Monitoring Elasticsearch" are experts in distributed systems and database management, with extensive experience in Elasticsearch deployment and monitoring. They bring their practical knowledge, teaching readers clear and actionable techniques. Their approachable style makes complex systems accessible, helping professionals and aficionados alike. Who is it for? This book is ideal for developers and system administrators who work with Elasticsearch, regardless of their industry. Whether you're new to Elasticsearch or aiming to deepen your expertise, you will find practical solutions and helpful tools. The content suits a range of experiences, from beginners curious about cluster monitoring to experts needing solutions for specific issues. If you use Elasticsearch or plan to, this book is for you.

IBM Netcool Operations Insight Version 1.4: Deployment Guide

2016-07-26 O'Reilly Amazon

book

Steven Shuman, Vasfi Gucer, Fernando de Andrade Cavalcanti, Mario Schuerewegen, Shaker Al-Muaber

data data-engineering IBM Cyber Security

IBM® Netcool® Operations Insight integrates infrastructure and operations management into a single coherent structure across business applications, virtualized servers, network devices and protocols, internet protocols, and security and storage devices. This IBM Redbooks® publication will help you install, tailor, and configure Netcool Operations Insight Version 1.4. Netcool Operations Insight consists of several products and components that can be installed on many servers in many combinations. You must make many decisions, both critical and personal preference. The purpose of this document is to accelerate the initial deployment of Netcool Operations Insight by making preferred practice choices. The target audience of this book is Netcool Operations Insight deployment specialists.

Implementing an IBM High-Performance Computing Solution on IBM Power System S822LC

2016-07-25 O'Reilly Amazon

book

Dino Quintero, Wainer dos Santos Moschetta, Georgy E Pavlov, Mauricio Faria de Oliveira, Tsuyoshi Kamenoue, Luis Carlos Cruz Huertas, Alexander Pozdneev

data data-engineering IBM Analytics Data Analytics Linux

This IBM® Redbooks® publication demonstrates and documents that IBM Power Systems™ high-performance computing and technical computing solutions deliver faster time to value with powerful solutions. Configurable into highly scalable Linux clusters, Power Systems offer extreme performance for demanding workloads such as genomics, finance, computational chemistry, oil and gas exploration, and high-performance data analytics. This book delivers a high-performance computing solution implemented on the IBM Power System S822LC. The solution delivers high application performance and throughput based on its built-for-big-data architecture that incorporates IBM POWER8® processors, tightly coupled Field Programmable Gate Arrays (FPGAs) and accelerators, and faster I/O by using Coherent Accelerator Processor Interface (CAPI). This solution is ideal for clients that need more processing power while simultaneously increasing workload density and reducing datacenter floor space requirements. The Power S822LC offers a modular design to scale from a single rack to hundreds, simplicity of ordering, and a strong innovation roadmap for graphics processing units (GPUs). This publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective high-performance computing (HPC) solutions that help uncover insights from their data so they can optimize business results, product development, and scientific discoveries.

The Language of SQL, Second Edition

2016-07-23 O'Reilly Amazon

book

Larry Rockoff

data data-engineering SQL Microsoft MySQL Oracle

The Language of SQL, Second Edition Many SQL texts attempt to serve as an encyclopedic reference on SQL syntax -- an approach that is often counterproductive, because that information is readily available in online references published by the major database vendors. For SQL beginners, it’s more important for a book to focus on general concepts and to offer clear explanations and examples of what various SQL statements can accomplish. This is that book. A number of features make The Language of SQL unique among introductory SQL books. First, you will not be required to download software or sit with a computer as you read the text. The intent of this book is to provide examples of SQL usage that can be understood simply by reading. Second, topics are organized in an intuitive and logical sequence. SQL keywords are introduced one at a time, allowing you to grow your understanding as you encounter new terms and concepts. Finally, this book covers the syntax of three widely used databases: Microsoft SQL Server, MySQL, and Oracle. Special “Database Differences” sidebars clearly show you any differences in syntax among these three databases, and instructions are included on how to obtain and install free versions of the databases. This is the only book you need to gain a quick working knowledge of SQL and relational databases. Learn How To... 
Use SQL to retrieve data from relational databases Apply functions and calculations to data Group and summarize data in a variety of useful ways Use complex logic to retrieve only the data you need Update data and create new tables Design relational databases so that data retrieval is easy and intuitive Use spreadsheets to transform your data into meaningful displays Retrieve data from multiple tables via joins, subqueries, views, and set logic Create, modify, and execute stored procedures Install Microsoft SQL Server, MySQL, or Oracle Contents at a Glance 1 Relational Databases and SQL 2 Basic Data Retrieval 3 Calculated Fields and Aliases 4 Using Functions 5 Sorting Data 6 Selection Criteria 7 Boolean Logic 8 Conditional Logic 9 Summarizing Data 10 Subtotals and Crosstabs 11 Inner Joins 12 Outer Joins 13 Self Joins and Views 14 Subqueries 15 Set Logic 16 Stored Procedures and Parameters 17 Modifying Data 18 Maintaining Tables 19 Principles of Database Design 20 Strategies for Displaying Data A Getting Started with Microsoft SQL Server B Getting Started with MySQL C Getting Started with Oracle

IBM Netcool Operations Insight: A Scenarios Guide

2016-07-20 O'Reilly Amazon

book

Lanny Short, Manzoor Farid, Maciej Olejniczak, Vasfi Gucer, Ahmed A Saleh, Zane Bray, Steve Shuman, Jeff Ditto, Rob Clark

data data-engineering IBM Analytics Cloud Computing

IBM® Netcool® Operations Insight empowers your IT operations to use real-time and historical analytics to identify, isolate, and resolve problems before they affect your business. Powered by IBM Tivoli® Netcool/OMNIbus and the transformative capabilities of cognitive analytics, Netcool Operations Insight consolidates millions of alerts from across local, cloud, and hybrid environments into a few actionable problems. This IBM Redbooks® publication gives a broad understanding of Netcool Operations Insight and describes several scenarios that show the capabilities of this solution in a real-life environment. Each scenario features a different capability of Netcool Operations Insight. The scenarios are documented by using step-by-step figures with explanations to make them easier to implement in your own environment. The scenarios in this book are broken into the following categories: Network Management-related scenarios, Network Event and cognitive-related scenarios, and Network Event-related scenarios. The target audience of this book is network specialists, network administrators, and network operators.
