O'Reilly Data Engineering Books

PostgreSQL High Performance Cookbook

2017-03-29 O'Reilly Amazon

book

Dinesh Kumar , Chitij Chauhan

data data-engineering relational-databases postgresql

This book is your definitive guide to understanding and improving PostgreSQL database performance. You'll learn about query optimization, database monitoring, and advanced memory and configuration techniques. With examples and clear explanations, you'll gain the skills to identify performance bottlenecks and make your database system highly efficient. What this Book will help me do Effectively optimize PostgreSQL queries to enhance response times. Utilize robust server monitoring techniques to identify and address inefficiencies. Implement memory optimization strategies to maximize server performance. Master replication and failover methods for high availability. Build strategies for secure and efficient database migrations. Author(s) Dinesh Kumar and Chitij Chauhan are experienced database professionals with years of expertise in working with PostgreSQL. They have been involved in database design, optimization, and innovations in open-source database technologies. Their teaching approach is clear and actionable, making the topics accessible to both beginners and seasoned professionals. Who is it for? This book is ideally suited for developers and database administrators with a basic understanding of PostgreSQL. If you're seeking practical guidance to enhance your PostgreSQL performance tuning and maintenance skills, this book is designed for you. It covers concepts for professionals looking to advance their database expertise. Beginners in database management who are motivated to learn advanced techniques will also find it approachable.

Learning Apache Spark 2

2017-03-28 O'Reilly Amazon

book

Muhammad Asif Abbasi

data data-engineering apache-spark AI/ML Analytics Big Data

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use-cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs. What this Book will help me do Master the fundamentals of Apache Spark 2 and its new features. Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges. Gain skills in data processing, transformation, and analysis with Spark. Deploy and operate your Spark applications in clustered environments. Develop your own recommendation engines and predictive analytics models with Spark. Author(s) Muhammad Asif Abbasi brings a wealth of expertise in Big Data technologies with a keen focus on simplifying complex concepts for learners. With substantial experience working in data processing frameworks, their approach to teaching creates an engaging and practical learning experience. With "Learning Apache Spark 2", Abbasi empowers readers to confidently tackle challenges in Big Data processing and analytics. Who is it for? This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

2017-03-22 O'Reilly Amazon

book

Richard Niemiec

data data-engineering oracle-database-solutions Cloud Computing Oracle Cyber Security

Proven Database Optimization Solutions―Fully Updated for Oracle Database 12c Release 2 Systematically identify and eliminate database performance problems with help from Oracle Certified Master Richard Niemiec. Filled with real-world case studies and best practices, Oracle Database 12c Release 2 Performance Tuning Tips and Techniques details the latest monitoring, troubleshooting, and optimization methods. Find out how to identify and fix bottlenecks on premises and in the cloud, configure storage devices, execute effective queries, and develop bug-free SQL and PL/SQL code. Testing, reporting, and security enhancements are also covered in this Oracle Press guide. • Properly index and partition Oracle Database 12c Release 2 • Work effectively with Oracle Cloud, Oracle Exadata, and Oracle Enterprise Manager • Efficiently manage disk drives, ASM, RAID arrays, and memory • Tune queries with Oracle SQL hints and the Trace utility • Troubleshoot databases using V$ views and X$ tables • Create your first cloud database service and prepare for hybrid cloud • Generate reports using Oracle’s Statspack and Automatic Workload Repository tools • Use sar, vmstat, and iostat to monitor operating system statistics

SQL Server 2016 Developer's Guide

2017-03-22 O'Reilly Amazon

book

Miloš Radivojević , William Durkin , Dejan Sarka

data data-engineering relational-databases microsoft-sql-server Analytics JSON

SQL Server 2016 Developer's Guide provides an in-depth overview of the new features and enhancements introduced in SQL Server 2016 that can significantly improve your development process. This book covers robust techniques for building high-performance, secure database applications while leveraging cutting-edge functionalities such as Stretch Database, temporal tables, and enhanced In-Memory OLTP capabilities. What this Book will help me do Master the new development features introduced in SQL Server 2016 and understand their applications. Use In-Memory OLTP enhancements to significantly boost application performance. Efficiently manage and analyze data using temporal tables and JSON integration. Explore SQL Server security enhancements to ensure data safety and access control. Gain insights into integrating R with SQL Server 2016 for advanced analytics. Author(s) Miloš Radivojević, Dejan Sarka, and William Durkin are experienced database developers and architects with a strong focus on SQL Server technologies. They bring years of practical experience and a clear, insightful approach to teaching complex concepts. Their expertise shines in this comprehensive guide, providing readers with both foundational knowledge and advanced techniques. Who is it for? This guide is perfect for database developers and solution architects looking to harness the full potential of SQL Server 2016's new features. It's intended for professionals with prior experience in SQL Server or similar platforms who aim to develop efficient, high-performance applications. You'll benefit from this book if you are keen to master SQL Server 2016 and elevate your development skills.

Introduction to Bayesian Estimation and Copula Models of Dependence

2017-03-20 O'Reilly Amazon

book

Arkady Shemyakin , Alexander Kniazev

data data-engineering data-models Microsoft Monte Carlo

Presents an introduction to Bayesian statistics, presents an emphasis on Bayesian methods (prior and posterior), Bayes estimation, prediction, MCMC, Bayesian regression, and Bayesian analysis of statistical models of dependence, and features a focus on copulas for risk management. Introduction to Bayesian Estimation and Copula Models of Dependence emphasizes the applications of Bayesian analysis to copula modeling and equips readers with the tools needed to implement the procedures of Bayesian estimation in copula models of dependence. This book is structured in two parts: the first four chapters serve as a general introduction to Bayesian statistics with a clear emphasis on parametric estimation and the following four chapters stress statistical models of dependence with a focus on copulas. A review of the main concepts is discussed along with the basics of Bayesian statistics including prior information and experimental data, prior and posterior distributions, with an emphasis on Bayesian parametric estimation. The basic mathematical background of both Markov chains and Monte Carlo integration and simulation is also provided. The authors discuss statistical models of dependence with a focus on copulas and present a brief survey of pre-copula dependence models. The main definitions and notations of copula models are summarized followed by discussions of real-world cases that address particular risk management problems. 
In addition, this book includes: • Practical examples of copulas in use including within the Basel Accord II documents that regulate the world banking system as well as examples of Bayesian methods within current FDA recommendations • Step-by-step procedures of multivariate data analysis and copula modeling, allowing readers to gain insight for their own applied research and studies • Separate reference lists within each chapter and end-of-the-chapter exercises within Chapters 2 through 8 • A companion website containing appendices: data files and demo files in Microsoft® Office Excel®, basic code in R, and selected exercise solutions Introduction to Bayesian Estimation and Copula Models of Dependence is a reference and resource for statisticians who need to learn formal Bayesian analysis as well as professionals within analytical and risk management departments of banks and insurance companies who are involved in quantitative analysis and forecasting. This book can also be used as a textbook for upper-undergraduate and graduate-level courses in Bayesian statistics and analysis. ARKADY SHEMYAKIN, PhD, is Professor in the Department of Mathematics and Director of the Statistics Program at the University of St. Thomas. A member of the American Statistical Association and the International Society for Bayesian Analysis, Dr. Shemyakin's research interests include information theory, Bayesian methods of parametric estimation, and copula models in actuarial mathematics, finance, and engineering. ALEXANDER KNIAZEV, PhD, is Associate Professor and Head of the Department of Mathematics at Astrakhan State University in Russia. Dr. Kniazev's research interests include representation theory of Lie algebras and finite groups, mathematical statistics, econometrics, and financial mathematics.

Designing Data-Intensive Applications

2017-03-16 O'Reilly Amazon

book

Martin Kleppmann

data data-engineering NoSQL RDBMS

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

DS8000 Copy Services

2017-03-15 O'Reilly Amazon

book

Lukasz Drózda , Warren Stanley , Roland Wolf , Lisa Gundy , Alcides Bertazi , Axel Westphal , Michael Frankenberg , Bert Dufrasne , Cay-Uwe Kulzer

data data-engineering IBM

Abstract This IBM® Redbooks® publication helps you plan, install, tailor, configure, and manage Copy Services on the IBM DS8000® operating in an IBM z Systems® or Open Systems environment. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8000 Copy Services.

Understanding Metadata

2017-03-15 O'Reilly Amazon

book

Scott Gidley , Federico Castanedo

data data-engineering metadata Big Data Data Governance Data Lake

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data (where it comes from, where it’s located, and how it’s being used) so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include: traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab; tooling from open source projects, including Teradata Kylo and Informatica; and startups such as Trifacta and Zaloni that provide best of breed technology.

Oracle Database Upgrade and Migration Methods: Including Oracle 12c Release 2

2017-03-01 O'Reilly Amazon

book

Nassyam Basha , K M Krishnakumar , Y V Ravikumar

data data-engineering oracle-database-solutions Oracle

Learn all of the available upgrade and migration methods in detail to move to Oracle Database version 12c. You will become familiar with database upgrade best practices to complete the upgrade in an effective manner and understand the Oracle Database 12c patching process. So it’s time to upgrade Oracle Database to version 12c and you need to choose the appropriate method while considering issues such as downtime. This book explains all of the available upgrade and migration methods so you can choose the one that suits your environment. You will be aware of the practical issues and proactive measures to take to upgrade successfully and reduce unexpected issues. With every release of Oracle Database there are new features and fixes to bugs identified in previous versions. As each release becomes obsolete, existing databases need to be upgraded. Oracle Database Upgrade and Migration Methods explains each method along with its strategy, requirements, steps, and known issues that have been seen so far. This book also compares the methods to help you choose the proper method according to your constraints. 
Also included in this book: Pre-requisite patches and pre-upgrade steps Patching to perform changes at the binary and database level to apply bug fixes What You Will Learn: Understand the need and importance of database upgrading and migration Be aware of the challenges associated with database upgrade decision making Compare all upgrade/migration methods Become familiar with database upgrade best practices and recommendations Understand database upgrade concepts in high availability and multi-tenant environments Know the database downgrade steps in case the upgraded database isn’t compatible with the environment Discover the features and benefits to the organization when it moves from the old database version to the latest database version Understand Oracle 12c patching concepts Who This Book Is For: Core database administrators, solution architects, business consultants, and database architects

Mastering Elastic Stack

2017-02-28 O'Reilly Amazon

book

Ravi Kumar Gupta , Yuvraj Gupta

data data-engineering search elasticsearch elastic-stack-elk-stack elastic stack (elk stack)

Mastering Elastic Stack is your complete guide to advancing your data analytics expertise using the ELK Stack. With detailed coverage of Elasticsearch, Logstash, Kibana, Beats, and X-Pack, this book equips you with the skills to process and analyze any type of data efficiently. Through practical examples and real-world scenarios, you'll gain the ability to build end-to-end pipelines and create insightful dashboards. What this Book will help me do Build and manage log pipelines using Logstash, Beats, and Elasticsearch for real-time analytics. Develop advanced Kibana dashboards to visualize and interpret complex datasets. Efficiently utilize X-Pack features for alerting, monitoring, and security in the Elastic Stack. Master plugin customization and deployment for a tailored Elastic Stack environment. Apply Elastic Stack solutions to real-world cases for centralized logging and actionable insights. Author(s) The authors, Ravi Kumar Gupta and Yuvraj Gupta, are experienced technologists who have spent years working at the forefront of data processing and analytics. They are well-versed in Elasticsearch, Logstash, Kibana, and the Elastic ecosystem, having worked extensively in enterprise environments where these tools have transformed operations. Their passion for teaching and thorough understanding of the tools culminate in this comprehensive resource. Who is it for? The ideal reader is a developer already familiar with Elasticsearch, Logstash, and Kibana who wants to deepen their understanding of the stack. If you're involved in creating scalable data pipelines, analyzing complex datasets, or looking to implement centralized logging solutions in your work, this book is an excellent resource. It bridges the gap from intermediate to expert knowledge, allowing you to use the Elastic Stack effectively in various scenarios. Whether you are transitioning from a beginner or enhancing your skill set, this book meets your needs.

QGIS: Becoming a GIS Power User

2017-02-28 O'Reilly Amazon

book

Alex Mandel , Víctor Olaya Ferrero , Ben Mearns , Alexander Bruy , Anita Graser

data data-engineering location-data geographic-information-system-gis geographic information system (gis) Data Management

Master data management, visualization, and spatial analysis techniques in QGIS and become a GIS power user About This Book Learn how to work with various types of data and create beautiful maps using this easy-to-follow guide Give a touch of professionalism to your maps, both for functionality and look and feel, with the help of this practical guide This progressive, hands-on guide builds on geospatial data and adds more reactive maps using geometry tools. Who This Book Is For If you are a user, developer, or consultant and want to know how to use QGIS to achieve the results you are used to from other types of GIS, then this learning path is for you. You are expected to be comfortable with core GIS concepts. This Learning Path will make you an expert with QGIS by showing you how to develop more complex, layered map applications. It will launch you to the next level of GIS users. What You Will Learn Create your first map by styling both vector and raster layers from different data sources Use parameters such as precipitation, relative humidity, and temperature to predict the vulnerability of fields and crops to mildew Re-project vector and raster data and see how to convert between different style formats Use a mix of web services to provide a collaborative data system Use raster analysis and a model automation tool to model the physical conditions for hydrological analysis Get the most out of the cartographic tools in QGIS to reveal the advanced tips and tricks of cartography In Detail The first module Learning QGIS, Third edition covers the installation and configuration of QGIS. You'll become a master in data creation and editing, and creating great maps. By the end of this module, you'll be able to extend QGIS with Python, getting in-depth with developing custom tools for the Processing Toolbox. The second module QGIS Blueprints gives you an overview of the application types and the technical aspects along with a few examples from the digital humanities. 
After estimating unknown values using interpolation methods and demonstrating visualization and analytical techniques, the module ends by creating an editable and data-rich map for the discovery of community information. The third module QGIS 2 Cookbook covers data input and output with special instructions for trickier formats. Later, we dive into exploring data, data management, and preprocessing steps to cut your data to just the important areas. At the end of this module, you will dive into the methods for analyzing routes and networks, and learn how to take QGIS beyond the out-of-the-box features with plug-ins, customization, and add-on tools. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Learning QGIS, Third Edition by Anita Graser QGIS Blueprints by Ben Mearns QGIS 2 Cookbook by Alex Mandel, Víctor Olaya Ferrero, Anita Graser, Alexander Bruy Style and approach This Learning Path will get you up and running with QGIS. We start off with an introduction to QGIS and create maps and plugins. Then, we will guide you through Blueprints for geographic web applications, each of which will teach you a different feature by boiling down a complex workflow into steps you can follow. Finally, you'll turn your attention to becoming a QGIS power user and master data management, visualization, and spatial analysis techniques of QGIS. Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Learning PySpark

2017-02-27 O'Reilly Amazon

book

Denny Lee , Tomasz Drabas

data data-engineering apache-spark PySpark AI/ML Big Data

"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications. What this Book will help me do Master the Spark 2.0 architecture and its Python integration with PySpark. Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis. Develop scalable machine learning models using PySpark's ML and MLlib libraries. Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models. Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions. Author(s) Authors Tomasz Drabas and Denny Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python. Who is it for? This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.

Oracle Database 12c Release 2 In-Memory: Tips and Techniques for Maximum Performance

2017-02-22 O'Reilly Amazon

book

Joyjeet Banerjee

data data-engineering oracle-database-solutions Oracle

Master Oracle Database 12c Release 2’s powerful In-Memory option This Oracle Press guide shows, step-by-step, how to optimize database performance and cut transaction processing time using Oracle Database 12c Release 2 In-Memory. Oracle Database 12c Release 2 In-Memory: Tips and Techniques for Maximum Performance features hands-on instructions, best practices, and expert tips from an Oracle enterprise architect. You will learn how to deploy the software, use In-Memory Advisor, build queries, and interoperate with Oracle RAC and Multitenant. A complete chapter of case studies illustrates real-world applications. • Configure Oracle Database 12c and construct In-Memory enabled databases • Edit and control In-Memory options from the graphical interface • Implement In-Memory with Oracle Real Application Clusters • Use the In-Memory Advisor to determine what objects to keep In-Memory • Optimize In-Memory queries using groups, expressions, and aggregations • Maximize performance using Oracle Exadata Database Machine and In-Memory option • Use Swingbench to create data and simulate real-life system workloads

Mastering Elasticsearch 5.x - Third Edition

2017-02-21 O'Reilly Amazon

book

Bharvi Dixit

data data-engineering search elasticsearch Analytics Big Data

This comprehensive guide dives deep into the functionalities of Elasticsearch 5, the widely-used search and analytics engine. Leveraging the power of Apache Lucene, this book will help you understand advanced concepts like querying, indexing, and cluster management to build efficient and scalable search solutions. What this Book will help me do Master advanced features of Elasticsearch such as text scoring, sharding, and aggregation. Understand how to handle big data efficiently using Elasticsearch's architecture. Learn practical implementation techniques for Elasticsearch features through hands-on examples. Develop custom plugins for Elasticsearch to tailor its functionalities to specific needs. Scale and optimize Elasticsearch clusters for high performance in production environments. Author(s) Bharvi Dixit is an experienced software engineer and a recognized expert in implementing Elasticsearch solutions. With a strong background in distributed systems and database management, Bharvi's writing is informed by real-world experience and a focus on practical applications. Who is it for? This book is ideal for developers and data engineers with existing experience in Elasticsearch who wish to deepen their knowledge. It serves as a valuable resource for professionals tasked with creating scalable search applications. A working understanding of Elasticsearch basics and query DSL is recommended to fully benefit from this guide.

IBM Power Systems L and LC Server Positioning Guide

2017-02-16 O'Reilly Amazon

book

Scott Vetter , Andrew Laidlaw , Tonny Bastiaans

data data-engineering IBM ibm-power-systems

This IBM® Redpaper™ publication is written to assist you in locating the optimal server/workload fit within the IBM Power Systems™ L and IBM OpenPOWER LC product lines. IBM has announced several scale-out servers, and as a partner in the OpenPOWER organization, unique design characteristics that are engineered into the LC line have broadened the suite of available workloads beyond typical client OS hosting. This paper looks at the benefits of the Power Systems L servers and OpenPOWER LC servers, and how they are different, providing unique benefits for Enterprise workloads and use cases.

Big Data Now: 2016 Edition

2017-02-15 O'Reilly Amazon

book

O'Reilly Media, Inc.

data data-engineering AI/ML Big Data Cloud Computing

Now in its sixth edition, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve examined throughout 2016. This collection of blog posts, authored by leading thinkers and experts in the field, reflects a unique set of themes we’ve identified as gaining significant attention and traction. Our list of topics for 2016 includes: Careers in data Tools and architecture for big data Intelligent real-time applications Cloud infrastructure Machine learning: models and training Deep learning and artificial intelligence

Geospatial Data and Analysis

2017-02-15 O'Reilly Amazon

book

Jon Bruner , Bill Day , Aurelia Moser

data data-engineering location-data geographic-information-system-gis geographic information system (gis) Analytics

Geospatial data, or data with location information, is generated in huge volumes every day by billions of mobile phones, IoT sensors, drones, nanosatellites, and many other sources in an unending stream. This practical ebook introduces you to the landscape of tools and methods for making sense of all that data, and shows you how to apply geospatial analytics to a variety of issues, large and small. Authors Aurelia Moser, Jon Bruner, and Bill Day provide a complete picture of the geospatial analysis options available, including low-scale commercial desktop GIS tools, medium-scale options such as PostGIS and Lucene-based searching, and true big data solutions built on technologies such as Hadoop. You’ll learn when it makes sense to move from one type of solution to the next, taking increased costs and complexity into account. Explore the structure of basic webmaps, and the challenges and constraints involved when working with geo data Dive into low- to medium-scale mapping tools for use in backend and frontend web development Focus on tools for robust medium-scale geospatial projects that don’t quite justify a big data solution Learn about innovative platforms and software packages for solving issues of processing and storage of large-scale data Examine geodata analysis use cases, including disaster relief, urban planning, and agriculture and environmental monitoring

Cloud Data Sharing with IBM Spectrum Scale

2017-02-14 O'Reilly Amazon

book

Rob Basham , Amey Gokhale , Alexander Safonov , Ranjith Rajagopalan Nair , Ryan Marchese , Nikhil Khandelwal , Larry Coyne , Rishika Kedia , Arend Dittmer , Stan Li

data data-engineering IBM Cloud Computing

This IBM® Redpaper™ publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the Cloud data sharing feature of IBM Spectrum Scale™. IBM Spectrum Scale, formerly IBM General Parallel File System (IBM GPFS™), is a scalable data and file management solution that provides a global namespace for large data sets along with several enterprise features. Cloud data sharing allows for the sharing and use of data between various cloud object storage types and IBM Spectrum Scale. Cloud data sharing can help with the movement of data in both directions, between file systems and cloud object storage, so that data is where it needs to be, when it needs to be there. This paper is intended for IT architects, IT administrators, storage administrators, and those who want to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and Cloud data sharing.

IBM DS8880 Integrated Copy Services Manager and LDAP Client on the HMC

2017-02-09 O'Reilly Amazon

book

Bert Dufrasne , Jean Iyabi

data data-engineering IBM

IBM® Copy Services Manager (CSM) is a replication management solution that is based on the IBM Tivoli® Productivity Center for Replication technology. CSM inherits all the Tivoli Productivity Center for Replication capabilities and continues to provide Copy Services solutions for most IBM storage offerings. The IBM DS8880, starting with firmware Release 8.1, Licensed Machine Code (LMC) 8.8.10.xx.xx, includes CSM for the IBM System Storage® DS8000®, which is pre-installed on the Hardware Management Console (HMC). If you ordered the CSM feature code as part of your IBM DS8000 system configuration, you only need to activate CSM. CSM as installed on the HMC, or acquired separately, includes a lightweight build of the IBM WebSphere® Liberty server code, to use to authenticate CSM users through a Lightweight Directory Access Protocol (LDAP). The same integrated LDAP support can be used for remote authentication of DS8000 users. Furthermore, if you simply want to take advantage of the CSM LDAP client for DS8000 LDAP authentication, the CSM license and CSM activation are not required. This IBM Redpaper™ publication describes the requirements for setup and usage of CSM on the DS8000 HMC, for both Copy Services management and LDAP authentication.

PostgreSQL High Availability Cookbook - Second Edition

2017-02-08 O'Reilly Amazon

book

Shaun Thomas

data data-engineering relational-databases postgresql Linux

Master the essential strategies for ensuring high availability in PostgreSQL with this practical cookbook. You'll learn how to build resilient PostgreSQL database clusters that can withstand failures, safely replicate data, and scale to meet increasing demands, ensuring your application's reliability.
What this Book will help me do: Understand and apply replication techniques in PostgreSQL to protect your data and ensure consistency. Set up a robust database cluster using tools like Patroni or Pacemaker to automate failover and maintain availability. Learn hardware configuration best practices for building a strong database platform. Optimize resource usage in your PostgreSQL clusters with connection pooling techniques using pgpool and PgBouncer. Implement advanced monitoring and alerting solutions to effectively track and respond to potential issues in real-time.
Author(s): Shaun Thomas is a seasoned database administrator and consultant specializing in PostgreSQL high availability and clustering solutions. With years of hands-on experience in building resilient and scalable database systems, Shaun shares actionable insights and methodologies in a clear and accessible manner. His real-world knowledge and passion for database reliability shine through in his practical and effective writing style, making this book an invaluable resource.
Who is it for? This book is perfect for Linux system administrators and PostgreSQL DBAs seeking to enhance the reliability and resilience of their database systems. If you're responsible for reducing downtime, improving failover processes, or managing databases in high-demand scenarios, this book provides the tools and techniques you need. It's especially helpful for professionals looking to deepen their understanding of PostgreSQL-specific solutions to high availability challenges.

IBM Hyper-Scale Manager for IBM Spectrum Accelerate Family: IBM XIV, IBM FlashSystem A9000 and A9000R, and IBM Spectrum Accelerate

2017-02-07 O'Reilly Amazon

book

Lisa Martinez , Markus Oscheka , Bertrand Dufrasne , Roger Eriksson

data data-engineering IBM Cyber Security

This IBM® Redbooks® publication describes storage management functions and their configuration and usage with the IBM Hyper-Scale Manager management graphical user interface (GUI) for IBM FlashSystem® A9000 and A9000R, IBM XIV® Gen3, and IBM Spectrum™ Accelerate software. The web-based GUI provides a revolutionary object-centered interface design that is aimed toward ease of use together with enhanced efficiency for storage administrators. The first chapter describes general features of the GUI and installation of the IBM Hyper-Scale Manager server. Subsequent chapters illustrate some typical GUI actions, among many other possibilities, to manage and configure the storage systems, to define security roles, and to set up multitenancy. For most of the GUI-based actions that are illustrated in this book, the corresponding XIV Storage System command-line interface (XCLI) commands are also shown. IBM Hyper-Scale Manager based GUI information regarding host attachment and replication is covered in IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Storage System: Host Attachment and Interoperability, SG24-8368 and IBM FlashSystem A9000 and A9000R Replication Solutions, REDP-5401.

Elasticsearch 5.x Cookbook - Third Edition

2017-02-06 O'Reilly Amazon

book

Alberto Paro

data data-engineering search elasticsearch Analytics Big Data

Elasticsearch 5.x Cookbook is a comprehensive guide that teaches you how to leverage the full power of Elasticsearch for high-performance search and analytics. Through step-by-step recipes, you'll explore deployment, query building, plugin integration, and advanced analytics, ensuring you can manage and scale Elasticsearch like a pro.
What this Book will help me do: Understand and deploy complex Elasticsearch cluster topologies for optimal performance. Create tailored mappings to gain finer control over data indexing and retrieval. Design and execute advanced queries and analytics using Elasticsearch capabilities. Integrate Elasticsearch with popular programming languages and big data platforms. Monitor and improve Elasticsearch cluster health using the best practices and tools.
Author(s): Alberto Paro is a seasoned software engineer and data scientist with extensive experience in distributed systems and search technologies. Having worked on numerous search-related projects, he brings practical, real-world insights to his writing. Alberto is passionate about teaching and simplifying complex concepts, making this book both approachable and expertly detailed.
Who is it for? This book is ideal for developers or data engineers seeking to utilize Elasticsearch for advanced search and analytics tasks. If you have some prior knowledge of JSON and programming concepts, particularly Java, you will benefit most from this material. Whether you're looking to integrate Elasticsearch into your systems or to optimize its usage, this book caters to your needs.

Professional Microsoft SQL Server 2016 Reporting Services and Mobile Reports

2017-02-06 O'Reilly Amazon

book

Riccardo Muti , Paul Turley , Christopher Finlan

data data-engineering relational-databases microsoft-sql-server BI Dashboard

Optimize reporting and BI with Microsoft SQL Server 2016.
Professional Microsoft SQL Server 2016 Reporting Services and Mobile Reports provides a comprehensive lesson in business intelligence (BI), operational reporting, and Reporting Services architecture using a clear, concise tutorial approach. You'll learn effective report solution design based upon many years of experience with successful report solutions. Improve your own reports with advanced, best-practice design, usability, query design, and filtering techniques. Expert guidance provides insight into common report types and explains where each could be made more efficient, while providing step-by-step instruction on Microsoft SQL Server 2016. All changes to the 2016 release are covered in detail, including improvements to the Visual Studio Report Designer (SQL Server Data Tools) and Report Builder, Mobile Dashboard Designer, the new Report Portal Interface, HTML-5 Rendering, Power BI integration, Custom Parameters Pane, and more. The Microsoft SQL Server 2016 release will include significant changes. New functionality, new capabilities, re-tooled processes, and changing support require a considerable update to existing knowledge. Whether you're starting from scratch or simply upgrading, this book is an essential guide to report design and business intelligence solutions.
Understand BI fundamentals and Reporting Services architecture
Learn the ingredients to a successful report design
Get up to speed on Microsoft SQL Server 2016
Grasp the purpose behind common designs to optimize your reporting
Microsoft SQL Server Reporting Services makes reporting faster, easier, and more powerful than ever in web, desktop, and portal solutions. Compatibility with an extensive variety of data sources makes it a go-to solution for organizations across the globe. The 2016 release brings some of the biggest changes in years, and the full depth and breadth of these changes can create a serious snag in your workflow. For a clear tutorial geared toward the working professional, Professional Microsoft SQL Server 2016 Reporting Services and Mobile Reports is the ideal guide for getting up to speed and producing successful reports.

HBase High Performance Cookbook

2017-01-31 O'Reilly Amazon

book

Ruchir Choudhry

data data-engineering nosql-databases Apache HBase Big Data Cloud Computing

"HBase High Performance Cookbook" is your guide to mastering the optimization, scaling, and tuning of HBase systems. Covering everything from configuring HBase clusters to designing scalable table structures and performance tuning, this comprehensive book provides practical advice and strategies for leveraging HBase's full potential. By following this book's recipes, you'll supercharge your HBase expertise.
What this Book will help me do: Understand how to configure HBase for optimal performance, improving your data system's efficiency. Learn to design table structures to maximize scalability and functionality in HBase. Gain skills in performing CRUD operations and using advanced features like MapReduce within HBase. Discover practices for integrating HBase with other technologies such as Elasticsearch. Master the steps involved in setting up and optimizing HBase in cloud environments for enhanced performance.
Author(s): Ruchir Choudhry is a seasoned data management professional with extensive experience in distributed database systems. He possesses deep expertise in HBase, Hadoop, and other big data technologies. His practical and engaging writing style aims to demystify complex technical topics, making them accessible to developers and architects alike.
Who is it for? This book is tailored for developers and system architects looking to deepen their understanding of HBase. Whether you are experienced with other NoSQL databases or are new to HBase, this book provides extensive practical knowledge. It is ideal for professionals working in big data applications or those eager to optimize and scale their database systems effectively.

IBM FlashSystem A9000 and IBM FlashSystem A9000R Architecture and Implementation

2017-01-30 O'Reilly Amazon

book

Lisa Martinez , Markus Oscheka , Roman Fridli , Detlef Helmbrecht , Stephen Solewin , Andrew Greenfield , Bert Dufrasne , Roger Eriksson , Jana Jamsek , Bruce Spell

data data-engineering IBM

This IBM® Redbooks® publication presents the architecture, design, concepts, and technology that are used in IBM FlashSystem® A9000 and IBM FlashSystem A9000R. FlashSystem A9000 and FlashSystem A9000R deliver the microsecond latency and high availability of IBM FlashCore® technology with grid architecture, simple scalability, and industry-leading IBM software that is designed to drive your business into the cognitive era. Comprehensive data reduction capabilities, including inline deduplication and a new compression engine, help lower total cost of ownership, and a highly intuitive user interface simplifies management. FlashSystem A9000 and FlashSystem A9000R transform technology infrastructure into business innovation. From a functional standpoint, FlashSystem A9000 and FlashSystem A9000R take advantage of most of the software-defined storage features that are offered by the IBM Spectrum™ Accelerate software, including multi-tenancy and business continuity functions. This publication is intended for those individuals who need to plan, install, tailor, and configure FlashSystem A9000 and FlashSystem A9000R. For detailed information about configuration, management, and replication functions and their usage, see the following publications:
IBM Spectrum Accelerate Family Storage Configuration and Usage for IBM FlashSystem A9000, IBM FlashSystem A9000R and IBM XIV Gen3, SG24-8376
IBM FlashSystem A9000 and A9000R Replication Solutions, REDP-5401
IBM FlashSystem A9000, IBM FlashSystem A9000R and IBM XIV Storage System Host Attachment and Interoperability, SG24-8368
