O'Reilly Data Engineering Books

IBM PowerHA SystemMirror V7.2.1 for IBM AIX Updates

2017-05-23 O'Reilly Amazon

book

Dino Quintero , Bunphot Chuprasertsuk , Fabio Martins , Bernhard Buehler , Matthew W Radford , Shawn Bodily , Maria-Katharina Esser , Anthony Steel , Bing He

data data-engineering IBM

Abstract This IBM® Redbooks® publication helps strengthen the position of the IBM PowerHA® SystemMirror® solution with well-defined and documented deployment models within an IBM Power Systems™ virtualized environment, which provides customers with a planned foundation for business resilience and disaster recovery for their IBM Power Systems infrastructure solutions. This publication addresses topics to help meet customers' complex high availability and disaster recovery requirements on IBM Power Systems servers to help maximize their systems' availability and resources, and provide technical documentation to transfer the how-to-skills to users and support teams. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability and disaster recovery solutions and support with IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM Power Systems servers.

Oracle on IBM z Systems

2017-05-22 O'Reilly Amazon

book

Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

data data-engineering oracle-database-solutions Cloud Computing IBM Linux

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® z Systems®. The enterprise-grade Linux on IBM z Systems solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from IBM z Systems®. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V7.8

2017-05-16 O'Reilly Amazon

book

Jon Tate , Frank Enders , Catarina Castro , Giulio Fiscella , Dharmesh Kamdar , Paulo Tomiyoshi Takeda

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller

Abstract This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller, which is powered by IBM Spectrum Virtualize™ Version 7.8. IBM SAN Volume Controller is a virtualization appliance solution, which maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the "block" level in a network, which enables applications and servers to share storage devices on a network.

POWER8 High-performance Computing Guide IBM Power System S822LC (8335-GTB) Edition

2017-05-14 O'Reilly Amazon

book

Dino Quintero , Wainer dos Santos Moschetta , Joseph Apuzzo , John Dunham , Mauricio Faria de Oliveira , Desnes Augusto Nunes Rosario , Markus Hilger , Alexander Pozdneev

data data-engineering IBM

Abstract This IBM® Redbooks® publication documents and addresses topics to provide step-by-step customizable application and programming solutions to tune applications and workloads to use IBM Power Systems™ hardware architecture. This publication explores, tests, and documents the solution to use the architectural technologies and the software solutions that are available from IBM to help solve challenging technical and business problems. This publication also demonstrates and documents that the combination of IBM high-performance computing (HPC) solutions (hardware and software) delivers significant value to technical computing clients who are in need of cost-effective, highly scalable, and robust solutions. First, the book provides a high-level overview of the HPC solution, including all of the components that make up the HPC cluster: IBM Power System S822LC (8335-GTB), software components, interconnect switches, and the IBM Spectrum™ Scale parallel file system. Then, the publication is divided into three parts: Part 1 focuses on the developers, Part 2 focuses on the administrators, and Part 3 focuses on the evaluators and planners of the solution. This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights from vast amounts of clients' data so they can optimize business results, product development, and scientific discoveries.

IBM DB2 Web Query for i: The Nuts and Bolts

2017-05-11 O'Reilly Amazon

book

Rob Bestgen , Doug Mack , Lin Su , Simona Pacchiarini , Kathryn Steinbrink , Hernando Bedoya , Kevin Trisko , Jim Bainbridge , Mike Cain

data data-engineering relational-databases ibm-db2 BI HTML

Abstract Business Intelligence (BI) is a broad term that relates to applications that analyze data to understand and act on the key metrics that drive profitability in an enterprise. Key to analyzing that data is providing fast, easy access to it while delivering it in formats or tools that best fit the needs of the user. At the core of any BI solution are user query and reporting tools that provide intuitive access to data supporting a spectrum of users from executives to “power users,” from spreadsheet aficionados to the external Internet consumer. IBM® DB2® Web Query for i offers a set of modernized tools for a more robust, extensible, and productive reporting solution than the popular IBM Query for System i® tool (also known as IBM Query/400). IBM DB2 Web Query for i preserves investments in the reports that are developed with Query/400 by offering a choice of importing definitions into the new technology or continuing to run existing Query/400 reports as is. But, it also offers significant productivity and performance enhancements by leveraging the latest in DB2 for i query optimization technology. The DB2 Web Query for i product is a web-based query and report writing product that offers enhanced capabilities over the IBM Query for iSeries product (also commonly known as Query/400). IBM DB2 Web Query for i includes Query for iSeries technology to assist customers in their transition to DB2 Web Query. It offers a more modernized, Java based solution for a more robust, extensible, and productive reporting solution. DB2 Web Query provides the ability to query or build reports against data that is stored in DB2 for i (or Microsoft SQL Server) databases through browser-based user interface technologies:
• Build reports with ease through the web-based, ribbon-like InfoAssist tool that leverages a common look and feel that can extend the number of personnel that can generate their own reports.
• Simplify the management of reports by significantly reducing the number of report definitions that are required through the use of parameter driven reports.
• Deliver data to users in many different formats, including directly into spreadsheets, or in boardroom-quality PDF format, or viewed from the browser in HTML.
• Leverage advanced reporting functions, such as matrix reporting, ranking, color coding, drill-down, and font customization to enhance the visualization of DB2 data.
DB2 Web Query offers features to import Query/400 definitions and enhance their look and functions. By using it, you can add OLAP-like slicing and dicing to the reports or view reports in disconnected mode for users on the go. This IBM Redbooks® publication provides a broad understanding of what can be done with the DB2 Web Query product. This publication is a companion of DB2 Web Query Tutorials, SG24-8378, which has a group of self-explanatory tutorials to help you get up to speed quickly.

Oracle on LinuxONE

2017-05-11 O'Reilly Amazon

book

Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

data data-engineering oracle-database-solutions Cloud Computing IBM Linux

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® LinuxONE. The enterprise-grade Linux on LinuxONE solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from LinuxONE. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Geographic Information Systems in Action

2017-05-08 O'Reilly Amazon

book

Michael N. DeMers

data data-engineering location-data geographic-information-system-gis geographic information system (gis) GIS

Geographic Information Systems in Action, 1st Edition offers content that not only teaches GIS techniques, the ideas behind them, and how they work, but also, through a series of graded, hands-on, content-oriented activities, challenges students to think through what they are doing and why before going on to practical ArcGIS exercises. This deeper understanding, and the superior problem-solving skills students gain from using the text, will also make them highly valuable employees, in addition to well-informed students.

Exam Ref 70-768 Developing SQL Data Models, First Edition

2017-05-05 O'Reilly Amazon

book

Stacia Varga

data data-engineering data-models BI DAX Microsoft

Prepare for Microsoft Exam 70-768, and help demonstrate your real-world mastery of Business Intelligence (BI) solutions development with SQL Server 2016 Analysis Services (SSAS), including modeling and queries. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the MCSA level.
Focus on the expertise measured by these objectives:
• Design a multidimensional BI semantic model
• Design a tabular BI semantic model
• Develop queries using Multidimensional Expressions (MDX) and Data Analysis Expressions (DAX)
• Configure and maintain SSAS
This Microsoft Exam Ref:
• Organizes its coverage by exam objectives
• Features strategic, what-if scenarios to challenge you
• Assumes you are a database or BI professional with experience creating models, writing MDX or DAX queries, and using SSAS

Oracle Application Express: Build Powerful Data-Centric Web Apps with APEX

2017-05-05 O'Reilly Amazon

book

Brian Spendolini , Arie Geller

data data-engineering oracle-database-solutions JavaScript Oracle SQL

This Oracle Press guide shows how to build and deploy powerful Web applications with Oracle Application Express and features full coverage of the latest version, APEX 5.0.
This comprehensive volume from Oracle Press offers up-to-date coverage of Oracle Application Express (APEX), Oracle’s rapid development tool for the Web developer. APEX is an entirely Web-based framework that comes built into every edition of Oracle Database. Its backbone is Oracle’s powerful PL/SQL language, alongside the most advanced Web development technologies like HTML5, mobile development, and full support of CSS and JavaScript. APEX enables anyone, from novice user to seasoned developer, to easily create Web applications that are powerful, reliable, and highly scalable. Oracle Application Express: Build Powerful Data-Centric Web Apps lays out basic information about APEX concepts before delving into the unparalleled power of the platform and describing the new features in version 5.0. You will discover how to install and configure APEX, work with the Application Builder and Page Designer, use built-in wizards, and design custom Web apps.
• Teaches the cleanest and fastest builds for high-performance, secure web applications
• Shows how to effectively migrate legacy applications into a modern Web-based environment
• Authored by early adopters of APEX 5.0 who have been active in the APEX community for years

IBM z Systems Qualified DWDM Ciena 6500 Packet-Optical Platform Platform Release 10.21

2017-05-03 O'Reilly Amazon

book

Andrew Crimmins , Octavian Lascu , Pasquale PJ Catalano

data data-engineering IBM

This IBM® Redpaper™ publication is one in a series that describes IBM z Systems® qualified dense wavelength division multiplexing (DWDM) vendor products for IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) solutions with Server Time Protocol (STP). The protocols that are described in this paper are used for IBM supported solutions that require cross-site connectivity of a multisite Parallel Sysplex or remote copy technologies, which can include GDPS and non-GDPS applications. GDPS qualification testing is conducted at the IBM Vendor Solutions Connectivity (VSC) Lab in Poughkeepsie, NY. IBM and Ciena completed qualification testing of the Ciena 6500 Packet-Optical platform. This paper describes the applicable environments, protocols, and topologies that are qualified for and supported by z Systems for connecting through the Ciena 6500 Packet-Optical platform hardware and software, release level 10.21. This paper is intended for anyone who wants to learn more about Ciena 6500 Packet-Optical release level 10.21. This document is not meant to determine qualified products. To ensure that the planned products to be implemented are qualified, registered users can see the IBM Resource Link® for current information about qualified DWDM vendor products. For more information about IBM Redbooks® publications for z Systems qualified DWDM vendor products, see the IBM Redbooks website.

IBM Geographically Dispersed Resiliency for IBM Power Systems

2017-05-02 O'Reilly Amazon

book

Dino Quintero , Bunphot Chuprasertsuk , Fabio Martins , Bernhard Buehler , Matthew W Radford , Shawn Bodily , Maria-Katharina Esser , Anthony Steel , Bing He

data data-engineering IBM

Abstract This IBM® Redbooks® publication introduces and provides a broad understanding of the new IBM Geographically Dispersed Resiliency for IBM Power Systems™ solution. The IBM Geographically Dispersed Resiliency for Power Systems solution is a set of software components that together provide a disaster recovery (DR) mechanism for virtual machines (VMs) running on an IBM POWER7® processor-based server or later. This document describes various components, subsystems, and tasks that are associated with the IBM Geographically Dispersed Resiliency for Power Systems solution. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability (HA) and DR solutions and support on IBM Power Systems servers.

IBM zPDT Guide and Reference: System z Personal Development Tool

2017-05-02 O'Reilly Amazon

book

Bill Ogden

data data-engineering IBM Linux

Abstract This IBM® Redbooks® publication provides both introductory information and technical details about the IBM System z® Personal Development Tool (IBM zPDT®), which produces a small System z environment suitable for application development. zPDT is a PC Linux application. When zPDT is installed (on Linux), normal System z operating systems (such as IBM z/OS®) can be run on it. zPDT provides the basic System z architecture and emulated IBM 3390 disk drives, 3270 interfaces, OSA interfaces, and so on. The systems that are discussed in this document are complex. They have elements of Linux (for the underlying PC machine), IBM z/Architecture® (for the core zPDT elements), System z I/O functions (for emulated I/O devices), z/OS (the most common System z operating system), and various applications and subsystems under z/OS. The reader is assumed to be familiar with general concepts and terminology of System z hardware and software elements, and with basic PC Linux characteristics. This book provides the primary documentation for zPDT.

Machine Learning with Spark - Second Edition

2017-04-28 O'Reilly Amazon

book

Rajdeep Dua , Brian O'Neill , Manpreet Singh Ghotra , Stephen Boesch , Nick Pentreath

data data-engineering apache-spark AI/ML Big Data Python

Dive into the world of distributed machine learning with Apache Spark, a powerful framework for handling, processing, and analyzing big data. This book will take you through implementing popular machine learning algorithms using Spark ML, covering end-to-end workflows such as data preparation, model building, predictive analysis, and text processing.
What this Book will help me do
• Learn to implement scalable machine learning solutions using Spark ML.
• Develop the skills to set up and configure Apache Spark environments.
• Master the application of machine learning techniques like clustering, classification, and regression with Spark.
• Efficiently handle and process large-scale datasets using Spark tools.
• Put Spark's capabilities to work in building real-world distributed data processing solutions.
Author(s)
Rajdeep Dua and Manpreet Singh Ghotra bring a wealth of experience in big data and machine learning to this book. They have been involved in building scalable data systems and implementing machine learning solutions in various industry scenarios. Their approach is hands-on and focused on teaching practical, actionable knowledge.
Who is it for?
This book is perfect for data enthusiasts, data engineers, and machine learning practitioners who are familiar with Python and Scala and eager to apply machine learning concepts in distributed environments. It's aimed at professionals looking to develop their skills in building scalable data systems and implementing advanced machine learning workflows in Spark.

PostgreSQL Administration Cookbook, 9.5/9.6 Edition - Third Edition

2017-04-27 O'Reilly Amazon

book

Simon Riggs , Gabriele Bartolini , Gianni Ciolli

data data-engineering relational-databases postgresql Cyber Security

Dive into the world of PostgreSQL database management with this hands-on guide. This book takes you through essential administration tasks and advanced features of PostgreSQL 9.5 and 9.6, equipping you with the tools to efficiently manage and optimize your databases.
What this Book will help me do
• Set up and configure PostgreSQL servers for optimal performance and reliability.
• Implement robust backup and disaster recovery strategies tailored to your needs.
• Master replication techniques including high availability and logical replication.
• Analyze and troubleshoot performance issues with advanced diagnostics tools.
• Secure and protect your databases using best practices and security features.
Author(s)
Simon Riggs, Gianni Ciolli, and Gabriele Bartolini are leading figures in the PostgreSQL community. With extensive experience in database architecture and system administration, they have guided numerous professionals in mastering PostgreSQL. Their practical insights and clear instructions make this book an invaluable resource.
Who is it for?
This book is ideal for system administrators, database administrators, and developers who are responsible for database management. Whether you're aspiring to deepen your expertise in PostgreSQL or are already working with databases and seeking advanced knowledge, this guide caters to intermediate to advanced skill levels.

Learning Apache Cassandra - Second Edition

2017-04-25 O'Reilly Amazon

book

Sandeep Yarabarla , Graham Doman

data data-engineering nosql-databases Cassandra Java NoSQL

Learning Apache Cassandra is an engaging and in-depth guide to understanding the concepts and practical applications of Apache Cassandra, one of the most robust distributed NoSQL databases available. By the end of this book, you will have the necessary skills to design and manage scalable, high-performance database solutions tailored for modern applications.
What this Book will help me do
• Set up Apache Cassandra and its multi-node clusters confidently and efficiently.
• Master schema design principles, including the use of composite keys, collections, and user-defined types.
• Implement efficient query strategies with secondary indexes and materialized views.
• Understand data distribution strategies and tune consistency levels for different application requirements.
• Dive into advanced topics like user-defined functions, batch operations, and Java client optimizations for scalable database architecture.
Author(s)
Sandeep Yarabarla brings practical expertise and deep knowledge to the subject of Apache Cassandra. With hands-on industry experience designing scalable database solutions, the author ensures complex topics are presented through clear and actionable insights. This is coupled with real-world scenarios to help you apply your learning effectively.
Who is it for?
This book is ideal for developers and IT professionals interested in learning Apache Cassandra from scratch or enhancing their NoSQL database expertise. It is particularly suited for those transitioning from relational databases to NoSQL systems. Even without prior coding experience, readers can expect to follow along and achieve practical results.

Oracle Database 12c Release 2 New Features

2017-04-24 O'Reilly Amazon

book

Robert G. Freeman , Bob Bryla

data data-engineering oracle-database-solutions BI DWH Oracle

Leverage the New and Improved Features of Oracle Database 12c
Written by Oracle experts Bob Bryla and Robert G. Freeman, this Oracle Press guide describes the myriad new and enhanced capabilities available in the latest Oracle Database release. Inside, you’ll find everything you need to know to get up and running quickly on Oracle Database 12c Release 2. Supported by contributions from Oracle expert Eric Yen, Oracle Database 12c Release 2 New Features offers detailed coverage of:
• Installing Oracle Database 12c and Grid Infrastructure
• Architectural changes, such as Oracle Multitenant
• The most current information on upgrading and migrating to Oracle Database 12c
• The pre-upgrade information tool and parallel processing for database upgrades
• Oracle Real Application Clusters new features, such as Oracle Flex Cluster, Oracle Flex Automatic Storage Management, and Oracle Automatic Storage Management Cluster File System
• Enhanced and new online operations: tables, indexes, and PDBs
• Oracle RMAN enhancements, including cross-platform backup and recovery
• Oracle Data Guard improvements, such as Fast Sync, and Oracle Active Data Guard new features, such as Far Sync
• SQL, PL/SQL, DML, and DDL new features
• Improvements to partitioning manageability, performance, and availability
• Advanced business intelligence and data warehousing capabilities
• Security enhancements, including privileges analysis, data redaction, and new administrative-level privileges
• Manageability, performance, and optimization improvements

IBM GDPS Family: An introduction to Concepts and Capabilities

2017-04-17 O'Reilly Amazon

book

John Thompson , Sim Schindel , David Clitherow , Marie-France Narbey

data data-engineering IBM

Abstract This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings, and the additional planning and implementation services available from IBM are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you do read all the chapters, be aware that some information is intentionally repeated.

MQTT Essentials - A Lightweight IoT Protocol

2017-04-14 O'Reilly Amazon

book

Gastón C. Hillar

data data-engineering streaming-messaging rabbitmq IoT Java

Dive into the world of MQTT, the preferred protocol for IoT and M2M communication. This book provides a comprehensive guide to understanding, implementing, and securing MQTT-based systems, enabling readers to create efficient and lightweight communication networks for their connected devices.
What this Book will help me do
• Understand the underlying principles and protocol structure of MQTT.
• Securely configure and deploy an MQTT broker for communication.
• Develop Python, Java, and JavaScript-based MQTT client applications.
• Utilize MQTT for real-world IoT use cases such as sensor data interchange.
• Optimize MQTT usage for low-latency and lightweight communication scenarios.
Author(s)
Gastón C. Hillar is an experienced IoT developer and author with a deep understanding of IoT protocols and technologies. With years of practical experience in designing and deploying secure IoT systems, Gastón specializes in breaking down complex topics into digestible and actionable insights. Through his books, he aims to empower developers to effectively integrate IoT technologies into their work.
Who is it for?
The book is tailored for software developers and engineers who are looking to integrate MQTT into their IoT solutions. It's ideal for individuals with pre-existing knowledge in IoT concepts who want to deepen their understanding of MQTT. Readers seeking to secure, optimize, and utilize MQTT for communication and automation tasks will find it especially useful. It's a perfect fit for those working with Python, Java, and web technologies in IoT contexts.

Tabular Modeling in Microsoft SQL Server Analysis Services, Second Edition

2017-04-12 O'Reilly Amazon

book

Alberto Ferrari , Marco Russo

data data-engineering relational-databases microsoft-sql-server Agile/Scrum Analytics

Build agile and responsive business intelligence solutions
Create a semantic model and analyze data using the tabular model in SQL Server 2016 Analysis Services to create corporate-level business intelligence (BI) solutions. Led by two BI experts, you will learn how to build, deploy, and query a tabular model by following detailed examples and best practices. This hands-on book shows you how to use the tabular model’s in-memory database to perform rapid analytics, whether you are new to Analysis Services or already familiar with its multidimensional model.
Discover how to:
• Determine when a tabular or multidimensional model is right for your project
• Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2015
• Integrate data from multiple sources into a single, coherent view of company information
• Choose a data-modeling technique that meets your organization’s performance and usability requirements
• Implement security by establishing administrative and data user roles
• Define and implement partitioning strategies to reduce processing time
• Use Tabular Model Scripting Language (TMSL) to execute and automate administrative tasks
• Optimize your data model to reduce the memory footprint for VertiPaq
• Choose between in-memory (VertiPaq) and pass-through (DirectQuery) engines for tabular models
• Select the proper hardware and virtualization configurations
• Deploy and manipulate tabular models from C# and PowerShell using AMO and TOM libraries
Get code samples, including complete apps, at: https://aka.ms/tabular/downloads
About This Book
• For BI professionals who are new to SQL Server 2016 Analysis Services or already familiar with previous versions of the product, and who want the best reference for creating and maintaining tabular models.
• Assumes basic familiarity with database design and business analytics concepts.

Implementing the IBM Storwize V7000 and IBM Spectrum Virtualize V7.8

2017-04-10 O'Reilly Amazon

book

Jon Tate , Frank Enders , Catarina Castro , Giulio Fiscella , Dharmesh Kamdar , Paulo Tomiyoshi Takeda

data data-engineering IBM Marketing

Abstract Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces the IBM Storwize® V7000 solution powered by IBM Spectrum Virtualize™, which is an innovative storage offering that delivers essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a compact, modular design that is offered at a competitive, midrange price. The IBM Storwize V7000 solution incorporates some of the top IBM technologies that are typically found only in enterprise-class storage systems, raising the standard for storage efficiency in midrange disk systems. This cutting-edge storage system extends the comprehensive storage portfolio from IBM and can help change the way organizations address the ongoing information explosion. This IBM Redbooks® publication introduces the features and functions of the IBM Storwize V7000 and IBM Spectrum Virtualize V7.8 system through several examples. This book is aimed at pre-sales and post-sales technical support and marketing and storage administrators. It helps you understand the architecture of the Storwize V7000, how to implement it, and how to take advantage of its industry-leading functions and features.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 O'Reilly Amazon

book

Jeffrey Aven

data data-engineering Hadoop API Big Data Cloud Computing

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more:
• Understanding Hadoop and the Hadoop Distributed File System (HDFS)
• Importing data into Hadoop, and processing it there
• Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts
• Making the most of Apache Pig and Apache Hive
• Implementing and administering YARN
• Taking advantage of the full Hadoop ecosystem
• Managing Hadoop clusters with Apache Ambari
• Working with the Hadoop User Environment (HUE)
• Scaling, securing, and troubleshooting Hadoop environments
• Integrating Hadoop into the enterprise
• Deploying Hadoop in the cloud
• Getting started with Apache Spark
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

2017-04-07 O'Reilly Amazon

book

George Tillmann

data data-engineering data-models Big Data Cassandra Data Modelling

Design great databases, from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models.

The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience.

Presents an end-to-end framework from logical data modeling through physical schema definition.
Includes lessons learned, techniques, and tricks that can turn a database disaster into a success.
Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server.

What You'll Learn

Create logical data models that accurately reflect the real world of the user
Create usage scenarios reflecting how applications will use a new database
Merge static data models with dynamic process models to create resilient yet flexible database designs
Support application requirements by creating responsive database schemas in any database architecture
Cope with big data and unstructured data for transaction processing and decision support systems
Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop

Who This Book Is For

System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

2017-04-06 O'Reilly Amazon

book

Itzik Ben-Gan

data data-engineering relational-databases microsoft-sql-server transact-sql Azure

Prepare for Microsoft Exam 70-761, and help demonstrate your real-world mastery of SQL Server 2016 Transact-SQL data management, queries, and database programming. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level.

Focus on the expertise measured by these objectives:

Filter, sort, join, aggregate, and modify data
Use subqueries, table expressions, grouping sets, and pivoting
Query temporal and non-relational data, and output XML or JSON
Create views, user-defined functions, and stored procedures
Implement error handling, transactions, data types, and nulls

This Microsoft Exam Ref:

Organizes its coverage by exam objectives
Features strategic, what-if scenarios to challenge you
Assumes you have experience working with SQL Server as a database administrator, system engineer, or developer
Includes downloadable sample database and code for SQL Server 2016 SP1 (or later) and Azure SQL Database

Querying Data with Transact-SQL

About the Exam

Exam 70-761 focuses on the skills and knowledge necessary to manage and query data and to program databases with Transact-SQL in SQL Server 2016.

About Microsoft Certification

Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of essential skills for building and implementing on-premises and cloud-based databases across organizations. Exam 70-762 (Developing SQL Databases) is also required for MCSA: SQL 2016 Database Development certification.

See full details at: microsoft.com/learning

Oracle SQL Tuning with Oracle SQLTXPLAIN: Oracle Database 12c Edition, Second Edition

2017-04-06 O'Reilly Amazon

book

Stelios Charalambides

data data-engineering SQL Oracle

Learn through this practical guide to SQL tuning how Oracle's own experts do it, using a freely downloadable tool called SQLTXPLAIN. This new edition has been expanded to include AWR, Oracle 12c Statistics, interpretation of SQL Monitor reports, Parallel execution, and Exadata-related features. Reading this book and using SQLTXPLAIN helps you learn to tune even the most complex SQL, and you'll learn to do it quickly, without the huge learning curve usually associated with tuning as a whole.

Firmly based in real-world problems, this book helps you reclaim system resources and avoid the most common bottleneck in overall performance, badly tuned SQL. You'll learn how the optimizer works, how to take advantage of its latest features, and when it's better to turn them off. Best of all, the book is updated to cover the very latest feature set in Oracle Database 12c.

Covers AWR report integration
Helps with SQL Monitor Report Interpretation
Provides a reliable method that is repeatable
Shows the very latest tuning features in Oracle Database 12c
Enables the building of test cases without affecting production

What You Will Learn

Identify how and why complex SQL has gone wrong
Correctly interpret AWR reports generated via SQLTXPLAIN
Collect the best statistics for your environment
Know when to invoke built-in tuning facilities
Recognize when tuning is not the solution
Spot the steps in a SQL statement's execution plan that are critical to performance of that statement
Modify your SQL to solve performance problems and increase the speed and throughput of production database systems

Who This Book Is For

Anyone who deals with SQL and SQL tuning. Both developers and DBAs will benefit from learning how to use the SQLTXPLAIN tool, and from the problem solving methodology in this book.

Mastering Spark for Data Science

2017-03-29 O'Reilly Amazon

book

Matthew Hallett , David George , Antoine Amend , Andrew Morgan

data data-engineering apache-spark AI/ML Analytics API

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products

About This Book

Develop and apply advanced analytical techniques with Spark
Learn how to tell a compelling story with data science using Spark’s ecosystem
Explore data at scale and work with cutting edge data science methods

Who This Book Is For

This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting edge techniques. This book assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof of concept studies and built prototypes.

What You Will Learn

Learn the design patterns that integrate Spark into industrialized data science pipelines
See how commercial data scientists design scalable code and reusable code for data science services
Explore cutting edge data science methods so that you can study trends and causality
Discover advanced programming techniques using RDD and the DataFrame and Dataset APIs
Find out how Spark can be used as a universal ingestion engine tool and as a web scraper
Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
Study advanced Spark concepts, solution design patterns, and integration architectures
Demonstrate powerful data science pipelines

In Detail

Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level you need to build data science solutions of substance: solutions that solve real problems.

Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book deep dives into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly.

Style and approach

This is an advanced guide for those with beginner-level familiarity with the Spark architecture and working with Data Science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including: Spark SQL, visual streaming, and MLlib. This book expands on titles like: Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.
