O'Reilly Data Engineering Books

Mastering The Faster Web with PHP, MySQL, and JavaScript

2018-06-11 O'Reilly Amazon

book

Andrew Caya

data data-engineering relational-databases MySQL JavaScript SQL

Explore cutting-edge web optimization techniques in 'Mastering The Faster Web with PHP, MySQL, and JavaScript'. This comprehensive guide equips developers with the tools and knowledge to create lightning-fast web applications using modern technologies, including PHP 7, asynchronous programming, advanced SQL, and efficient JavaScript. What this Book will help me do Efficiently use profiling and benchmarking tools to identify performance bottlenecks. Optimize PHP 7 applications through efficient data structures and logical improvements. Enhance database performance by identifying and solving inefficient SQL queries. Incorporate modern asynchronous programming and functional programming techniques into your workflow. Integrate seamless UI designs that prioritize application responsiveness and user experience. Author(s) None Caya is a seasoned web developer with extensive experience in PHP, MySQL, and JavaScript. Through their career, they have delved deep into profiling, optimization techniques, and modern web technologies to deliver high-performance web solutions. This book reflects their commitment to providing actionable insights and practical advice to fellow developers. Who is it for? Ideal readers of this book are PHP developers with foundational knowledge in programming and web technologies who aspire to build and optimize modern web applications. Experience in JavaScript is not required, as the book covers essential aspects needed for performance enhancements. If you're aiming to hone your skills in creating faster web solutions, this book suits your goals perfectly.

Microsoft SQL Server 2017 on Linux

2018-06-08 O'Reilly Amazon

book

Benjamin Nevarez

data data-engineering relational-databases microsoft-sql-server Docker Linux

Essential Microsoft® SQL Server® 2017 installation, configuration, and management techniques for Linux Foreword by Kalen Delaney, Microsoft SQL Server MVP This comprehensive guide shows, step-by-step, how to set up, configure, and administer SQL Server 2017 on Linux for high performance and high availability. Written by a SQL Server expert and respected author, Microsoft SQL Server 2017 on Linux teaches valuable Linux skills to Windows-based SQL Server professionals. You will get clear coverage of both Linux and SQL Server and complete explanations of the latest features, tools, and techniques. The book offers clear instruction on adaptive query processing, automatic tuning, disaster recovery, security, and much more. •Understand how SQL Server 2017 on Linux works •Install and configure SQL Server on Linux •Run SQL Server on Docker containers •Learn Linux Administration •Troubleshoot and tune query performance in SQL Server •Learn what is new in SQL Server 2017 •Work with adaptive query processing and automatic tuning techniques •Implement high availability and disaster recovery for SQL Server on Linux •Learn the security features available in SQL Server

MySQL and JSON: A Practical Programming Guide

2018-06-08 O'Reilly Amazon

book

David Stokes

data data-engineering storage-formats JSON JavaScript MySQL

Practical instruction on using JavaScript Object Notation (JSON) with MySQL This hands-on guide teaches, step by step, how to use JavaScript Object Notation (JSON) with MySQL. Written by a MySQL Community Manager for Oracle , MySQL and JSON: A Practical Programming Guide shows how to quickly get started using JSON with MySQL and clearly explains the latest tools and functions. All content is based on the author’s years of interaction with MySQL professionals. Throughout, real-world examples and sample code guide you through the syntax and application of each method. You will get in-depth coverage of programming with the MySQL Document Store. •See how JavaScript Object Notation (JSON) works with MySQL •Use JSON as string data and JSON as a data type •Find the path, load data, and handle searches with REGEX •Work with JSON and non-JSON output •Build virtual generated columns and stored generated columns •Generate complex geometries using GeoJSON •Convert and manage data with JSON functions •Access JSON data, collections, and tables through MySQL Document Store

IBM z14 Model ZR1 Technical Guide

2018-06-06 O'Reilly Amazon

book

Hervey Kamga Octavian Lascu Frank Packheiser, Martijn Raave, John Troy, Bill White

data data-engineering IBM Analytics Cloud Computing Cyber Security

Abstract This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry standard footprint. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, z14 ZR1 is powered by up to 30 client characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second and up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20. This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers functions and plan for their usage. It is intended as an introduction to mainframes. Readers are expected to be generally familiar with IBM Z technology and terminology.

Data Analytics with Spark Using Python, First edition

2018-06-04 O'Reilly Amazon

book

Jeffrey Aven

data data-engineering apache-spark AI/ML Analytics Cloud Computing

Spark for Data Professionals introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, including basic programming to SparkSQL, SparkR, Spark Streaming, Messaging, NoSQL and Hadoop integration. Each chapter presents practical exercises deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, Spark for Data Professionals explains crucial concepts step-by-step, assuming no extensive background as an open source developer. It provides a complete foundation for quickly progressing to more advanced data science and machine learning topics. This guide will help you: Understand Spark basics that will make you a better programmer and cluster “citizen” Master Spark programming techniques that maximize your productivity Choose the right approach for each problem Make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing Leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data

Decarbonizing Logistics

2018-06-03 O'Reilly Amazon

book

Prof. Alan McKinnon

data data-engineering log-data

Learn how to cut logistics-related carbon emissions with this essential guide based on cutting edge research

Big Data Analytics with Hadoop 3

2018-05-31 O'Reilly Amazon

book

Sridhar Alla

data data-engineering Hadoop Analytics Flink AWS

Big Data Analytics with Hadoop 3 is your comprehensive guide to understanding and leveraging the power of Apache Hadoop for large-scale data processing and analytics. Through practical examples, it introduces the tools and techniques necessary to integrate Hadoop with other popular frameworks, enabling efficient data handling, processing, and visualization. What this Book will help me do Understand the foundational components and features of Apache Hadoop 3 such as HDFS, YARN, and MapReduce. Gain the ability to integrate Hadoop with programming languages like Python and R for data analysis. Learn the skills to utilize tools such as Apache Spark and Apache Flink for real-time data analytics within the Hadoop ecosystem. Develop expertise in setting up a Hadoop cluster and performing analytics in cloud environments such as AWS. Master the process of building practical big data analytics pipelines for end-to-end data processing. Author(s) Sridhar Alla is a seasoned big data professional with extensive industry experience in building and deploying scalable big data analytics solutions. Known for his expertise in Hadoop and related ecosystems, Sridhar combines technical depth with clear communication in his writing, providing practical insights and hands-on knowledge. Who is it for? This book is tailored for data professionals, software engineers, and data scientists looking to expand their expertise in big data analytics using Hadoop 3. Whether you're an experienced developer or new to the big data ecosystem, this book provides the step-by-step guidance and practical examples needed to advance your skills and achieve your analytical goals.

Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering

2018-05-31 O'Reilly Amazon

book

Rob Basham , Amey Gokhale , Jinesh Shah , Anbazhagan Mani , Kedar Karmarkar , Nikhil Khandelwal , Larry Coyne , Sandeep R Patil , Donald Mathisen , Arend Dittmer

data data-engineering storage-repositories cloud-storage Cloud Computing Cloud Storage

This IBM® Redbooks® publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the transparent cloud tiering (TCT) functionality of IBM Spectrum™ Scale. IBM Spectrum Scale™ is a scalable data, file, and object management solution that provides a global namespace for large data sets and several enterprise features. The IBM Spectrum Scale feature called transparent cloud tiering allows cloud object storage providers, such as IBM Cloud™ Object Storage, IBM Cloud, and Amazon S3, to be used as a storage tier for IBM Spectrum Scale. Transparent cloud tiering can help cut storage capital and operating costs by moving data that does not require local performance to an on-premise or off-premise cloud object storage provider. Transparent cloud tiering reduces the complexity of cloud object storage by making data transfers transparent to the user or application. This capability can help you adapt to a hybrid cloud deployment model where active data remains directly accessible to your applications and inactive data is placed in the correct cloud (private or public) automatically through IBM Spectrum Scale policies. This publication is intended for IT architects, IT administrators, storage administrators, and those wanting to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and transparent cloud tiering.

Hands-On Data Warehousing with Azure Data Factory

2018-05-31 O'Reilly Amazon

book

Christian Cote , Giuseppe Ciaburro , Michelle Gutzait

data data-engineering storage-repositories data-warehouse AI/ML Analytics

Dive into the world of ETL (Extract, Transform, Load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments. What this Book will help me do Understand and utilize Azure Data Factory and SQL Server Integration Services to build ETL solutions. Design scalable and high-performance ETL architectures tailored to modern data problems. Integrate various Azure services, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, into your workflows. Troubleshoot and optimize ETL pipelines and address common challenges in data processing. Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows. Author(s) Authors None Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in Azure ecosystem and hands-on Data Warehousing, they deliver actionable insights for working professionals. Who is it for? This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with a foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.

Learning PHP, MySQL & JavaScript, 5th Edition

2018-05-29 O'Reilly Amazon

book

Robin Nixon

data data-engineering relational-databases MySQL HTML JavaScript

Build interactive, data-driven websites with the potent combination of open source technologies and web standards, even if you have only basic HTML knowledge. In this update to this popular hands-on guide, you’ll tackle dynamic web programming with the latest versions of today’s core technologies: PHP, MySQL, JavaScript, CSS, HTML5, and key jQuery libraries. Web designers will learn how to use these technologies together and pick up valuable web programming practices along the way—including how to optimize websites for mobile devices. At the end of the book, you’ll put everything together to build a fully functional social networking site suitable for both desktop and mobile browsers. Explore MySQL, from database structure to complex queries Use the MySQLi extension, PHP’s improved MySQL interface Create dynamic PHP web pages that tailor themselves to the user Manage cookies and sessions and maintain a high level of security Enhance the JavaScript language with jQuery and jQuery mobile libraries Use Ajax calls for background browser-server communication Style your web pages by acquiring CSS2 and CSS3 skills Implement HTML5 features, including geolocation, audio, video, and the canvas element Reformat your websites into mobile web apps

IBM Storage Networking SAN24B-6 Switch

2018-05-23 O'Reilly Amazon

book

Ernest A. Keenan

data data-engineering IBM

This IBM® Redbooks® product guide describes the IBM Storage Networking SAN24B-6 switch. Explosive data growth, coupled with user expectations of unlimited access from anywhere, at any time, is pushing storage environments to the limit. To meet these dynamic business demands, the network must evolve to improve speed, increase efficiency, and reduce costs. Legacy infrastructures were not designed to support the performance requirements of flash-based storage technology. A new approach to storage networking is required to unlock the full capabilities of all-flash arrays. By treating the network as a strategic part of a storage environment, organizations can maximize their productivity and efficiency, even as they rapidly grow their environments. The IBM Storage Networking SAN24B-6 switch provides exceptional value in an entry-level switch, combining high-performance capabilities of 4, 8, 16, and 32 Gbps, point-and-click simplicity, and enterprise-class functionality. The port speed capability is dependent on the transceiver installed. SAN24B-6 provides small to midsized data centers with low-cost access to industry-leading Gen 5 and Gen 6 Fibre Channel technology and the ability to start small and grow on demand from 8 to 24 ports to support an evolving storage environment. In addition, SAN24B-6 is easy to use and install, with a point-and-click user interface that simplifies deployment and saves time.

Storwize HyperSwap with IBM i

2018-05-23 O'Reilly Amazon

book

Jon Tate , Falk Schneider , Jana Jamsek

data data-engineering IBM

IBM® Storwize® HyperSwap® is a response to increasing demand for continuous application availability, minimizing downtime in the event of an outage, and non disruptive migrations. IT centers with IBM i can take full advantage of the HyperSwap solution. In this IBM Redpaper™ publication, we provide instructions to implement Storwize HyperSwap with IBM i. We also describe some business continuity scenarios in this area, including solutions with HyperSwap and IBM i Live Partition Mobility, and a solution with HyperSwap and IBM PowerHA® for IBM i.

PostgreSQL 10 Administration Cookbook - Fourth Edition

2018-05-18 O'Reilly Amazon

book

Simon Riggs , Gianni Ciolli

data data-engineering relational-databases postgresql

This book offers an extensive collection of practical recipes for administering PostgreSQL 10, covering everything from configuring servers to optimizing performance. By working through these structured solutions, you will develop the skills necessary to manage PostgreSQL databases effectively, making your systems reliable and responsive. What this Book will help me do Implement and leverage the latest PostgreSQL 10 features for better databases. Master techniques for performance tuning and optimization in PostgreSQL. Develop strategies for comprehensive backup and recovery processes. Learn best practices for ensuring replication and high availability. Understand how to diagnose and resolve common PostgreSQL challenges effectively. Author(s) The authors of this book are experienced database professionals with deep knowledge of PostgreSQL. They bring their practical insights and expertise to help administrators and developers achieve the most out of PostgreSQL. They are dedicated to making complex topics approachable and relevant. Who is it for? This book is for current or aspiring database administrators and developers who work with PostgreSQL. It suits those who are familiar with databases and want to gain practical skills in PostgreSQL administration. It is ideal for individuals aiming to improve performance and reliability of their PostgreSQL systems.

IBM Real-time Compression in IBM SAN Volume Controller and IBM Storwize V7000

2018-05-16 O'Reilly Amazon

book

Jon Tate , Christian Burns , Jorge Quintal , Bosmat Tuv-El

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller

IBM® Real-time Compression™ software that is embedded in IBM SAN Volume Controller (SVC) and IBM Storwize® V7000 solution addresses all the requirements of primary storage data reduction, including performance, by using a purpose-built technology called . This IBM Redpaper™ publication addresses the key requirements for primary storage data reduction and gives real world examples of savings that can be made by using compression. SVC and Storwize V7000 is designed to improve storage efficiency by compressing data by as much as 80% through supported real-time compression for block storage. This process enables up to five times as much data to be stored in the same physical disk space. Unlike other approaches to compression, IBM Real-time Compression is used with active primary data, such as production databases and email systems. This configuration dramatically expands the range of candidate data that can benefit from compression. As its name implies, IBM Real-time Compression operates as data is written to disk, avoiding the need to store data that is awaiting compression.

Designing Event-Driven Systems

2018-05-15 O'Reilly Amazon

book

Ben Stopford

data data-engineering streaming-messaging Kafka API DevOps

Many forces affect software today: larger datasets, geographical disparities, complex company structures, and the growing need to be fast and nimble in the face of change. Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but as this practical ebook demonstrates, they provide a more holistic and compelling approach when applied together. Author Ben Stopford explains how service-based architectures and stream processing tools such as Apache Kafka can help you build business-critical systems. You’ll learn how to apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as "inside out databases" and "event streams as a source of truth." These approaches provide a unique foundation for how these large, autonomous service ecosystems can communicate and share data. Learn why streaming beats request-response based architectures in complex, contemporary use cases Understand why replayable logs such as Kafka provide a backbone for both service communication and shared datasets Explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches Build service ecosystems that blend event-driven and request-driven interfaces using a replayable log and Kafka’s Streams API Scale beyond individual teams into larger, department- and company-sized architectures, using event streams as a source of truth

Data Science Fundamentals for Python and MongoDB

2018-05-10 O'Reilly Amazon

book

David Paper

data data-engineering nosql-databases MongoDB AI/ML Data Science

Build the foundational data science skills necessary to work with and better understand complex data science algorithms. This example-driven book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience. Coding examples include visualizations whenever appropriate. The book is a necessary precursor to applying and implementing machine learning algorithms. The book is self-contained. All of the math, statistics, stochastic, and programming skills required to master the content are covered. In-depth knowledge of object-oriented programming isn’t required because complete examples are provided and explained. Data Science Fundamentals with Python and MongoDB is an excellent starting point for those interested in pursuing a career in data science. Like any science, the fundamentals of data science are a prerequisite to competency. Without proficiency in mathematics, statistics, data manipulation, and coding, the path to success is “rocky” at best. The coding examples in this book are concise, accurate, and complete, and perfectly complement the data science concepts introduced. What You'll Learn Prepare for a career in data science Work with complex data structures in Python Simulate with Monte Carlo and Stochastic algorithms Apply linear algebra using vectors and matrices Utilize complex algorithms such as gradient descent and principal component analysis Wrangle, cleanse, visualize, and problem solve with data Use MongoDB and JSON to work with data Who This Book Is For The novice yearning to break into the data science world, and the enthusiast looking to enrich, deepen, and develop data science skills through mastering the underlying fundamentalsthat are sometimes skipped over in the rush to be productive. Some knowledge of object-oriented programming will make learning easier.

IBM Geographically Dispersed Resilience for SAP HANA and SAP NetWeaver

2018-05-09 O'Reilly Amazon

book

Jes Kiran , Ravi A Shankar , Katharina Probst , Srikanth Thanneeru , Dishant J Doriwala , Marc-Stephan Tauchert

data data-engineering SAP IBM

This IBM® Redpaper™chapter publication explains the configuration, relocation, and verification of the IBM Geographically Dispersed Resiliency on IBM Power Systems™ solution to protect SAP HANA and SAP NetWeaver applications. This is a supplemental guide to IBM Geographically Dispersed Resiliency for IBM Power Systems, SG24-8382, which outlines the specifics when using Geographically Dispersed Resilience for SAP applications, including SAP HANA. Business continuity is a part of business operations. Downtime and disruptions can cause financial losses and impact public relations and trust in your business. Also, governments in many countries require businesses to have disaster recovery (DR) plans demonstrate regularly that the recovery plan tests successfully. IBM Geographically Dispersed Resiliency for IBM Power Systems is a DR solution that covers servers but can include business applications. In particular, this solution provides features to support the high availability (HA) of logical partitions (LPARs) running SAP HANA and SAP NetWeaver applications. IBM Geographically Dispersed Resiliency enables simplified DR management for IBM Power Systems servers. In fewer than 10 steps, administrators can deploy and configure the solution. This is the only solution on IBM Power Systems that offers nondisruptive DR testing.

IBM TS7700 Release 4.1 and 4.1.2 Guide

2018-05-04 O'Reilly Amazon

book

Aderson Pacini , Alberto Barajas Ortiz , Chen Zhu , Larry Coyne , Michael Scott , Derek Erdmann , Joe Hew , Katja Denefleh , Takahiro Tsuda

data data-engineering IBM

Abstract This IBM® Redbooks® publication covers IBM TS7700 R4.1 through R4.1.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. This publication explains the all-new hardware that is introduced with IBM TS7700 release R4.1 and the concepts associated with it. TS7700 R4.1 can be installed only on the IBM TS7720, TS7740, and the all-new, hardware-refreshed TS7760 Models. The IBM TS7720T and TS7760T (tape attach) partition mimics the behavior of the previous TS7740, but with higher performance and capacity. The IBM TS7700 offers a modular, scalable, and high-performance architecture for mainframe tape virtualization for the IBM Z® environment. It is a fully integrated, tiered storage hierarchy of disk and tape. This storage hierarchy is managed by robust storage management microcode with extensive self-management capability. It includes the following advanced functions: Policy management to control physical volume pooling Cache management Redundant copies, including across a grid network Copy mode control TS7700 delivers the following new capabilities: 7 and 8 way Grid support through approved request for price quotation 16 Gb FICON adapter support for TS7760 (R4.1.2) Optimized host data compression which is based on software (not FICON adapter hardware) compression algorithm (R4.1.2) Control-unit initiated reconfiguration (CUIR) for code load improvement (R4.1.2)[ Grid Resiliency Improvements (R4.1.2) System Events Redesign (R4.1.2) Remote System Log Processing Support in (R4.1.2) Improvements to reliability, availability, and serviceability The TS7760T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150 and IBM TS1140 tape drives installed in an IBM TS4500 or TS3500 tape library. The TS7760 models are based on high-performance and redundant IBM POWER8® technology. They provide improved performance for most IBM Z tape workloads when compared to the previous generations of IBM TS7700.

IBM System Storage SAN Volume Controller and Storwize V7000 Best Practices and Performance Guidelines

2018-05-03 O'Reilly Amazon

book

An Chen Jon Tate Tiago Moreira Candelaria Bastos, Jana Jamsek, Danilo Morelli Miyasiro, Antonio Rainero

data data-engineering IBM

Abstract This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM System Storage® SAN Volume Controller and IBM Storwize® V7000 powered by IBM Spectrum Virtualize™ V8.1. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. Then it provides performance guidelines for SAN Volume Controller, back-end storage, and applications. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting SAN Volume Controller and Storwize V7000. This book is intended for experienced storage, SAN, and SAN Volume Controller administrators and technicians. Understanding his book requires advanced knowledge of the SAN Volume Controller and Storwize V7000 and SAN environments.

Viewing and Managing FlashSystem Performance with IBM Spectrum Control

2018-05-03 O'Reilly Amazon

book

Marion Hejny , Falk Schneider , Bert Dufrasne

data data-engineering IBM ibm-spectrum-control

This IBM® Redpaper™ publication discusses on performance monitoring for IBM FlashSystem® storage products. The products reviewed are the IBM FlashSystem FS900, the IBM FlashSystem V9000, and the IBM FlashSystem A9000 and A9000R. For each of the FlashSystem devices, the paper reviews performance monitoring options. The first option is to use features available with the storage management software specific to the respective devices. The other option, which is the focus of this paper, is to use the IBM Spectrum™ Control solution. Using IBM Spectrum Control™ offers the advantage of having a common tool and unique interface to monitor most devices in your storage infrastructure. This paper explains how to take advantage of the many monitoring features and reporting options offered by IBM Spectrum Control. The paper also gives some guidance on how to set appropriate monitoring thresholds and alerts according to your environment.

PostgreSQL 10 High Performance - Third Edition

2018-04-30 O'Reilly Amazon

book

Enrico Pirozzi

data data-engineering relational-databases postgresql SQL

PostgreSQL 10 High Performance provides you with all the tools to maximize the efficiency and reliability of your PostgreSQL 10 database. Written for database admins and architects, this book offers deep insights into optimizing queries, configuring hardware, and managing complex setups. By integrating these best practices, you'll ensure scalability and stability in your systems. What this Book will help me do Optimize PostgreSQL 10 queries for improved performance and efficiency. Implement database monitoring systems to identify and resolve issues proactively. Scale your database by implementing partitioning, replication, and caching strategies. Understand PostgreSQL hardware compatibility and configuration for maximum throughput. Learn how to design high-performance solutions tailored for large and demanding applications. Author(s) Enrico Pirozzi is a seasoned database professional with extensive experience in PostgreSQL management and optimization. Having worked on large-scale database infrastructures, Enrico shares his hands-on knowledge and practical advice for achieving high performance with PostgreSQL. His approachable style makes complex topics accessible to every reader. Who is it for? This book is intended for database administrators and system architects who are working with or planning to adopt PostgreSQL 10. Readers should have a foundational knowledge of SQL and some prior exposure to PostgreSQL. If you're aiming to design efficient, scalable database solutions while ensuring high availability, this book is for you.

Networking Design for HPC and AI on IBM Power Systems

2018-04-26 O'Reilly Amazon

book

Scott Vetter , Rico Franke , Yanil Zeledón Miranda , Tobias Elpelt

data data-engineering IBM AI/ML

This publication provides information about networking design for IBM® High Performance Computing (HPC) and AI for Power Systems™. This paper will help you understand the basic requirements when designing a solution, the components in an infrastructure for HPC and AI Systems, the designing of interconnect and data networks with use cases based in real life scenarios, the administration and the Out-Of-Band management networks. We cover all the necessary requirements, provide a good understanding of the technology and include examples for small, medium and large cluster environments. This paper is intended for IT architects, system designers, data center planners, and system administrators who must design or provide a solution for the infrastructure of a HPC cluster.

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

2018-04-25 O'Reilly Amazon

book

Monica Lemay , Kumaran Rajaram , Kevin Gildea , Piyush Chaudhary , Sandeep R Patil , Ulf Troppens , Joanna Wong , Luis Bolinches

data data-engineering IBM ibm-tivoli Analytics Data Management

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable building block based solution for genomics addresses the most complex data management aspect and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM Spectrum™ Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum Scale™ is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic Storage™ Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM Redpaper™ publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments, to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running GATK Best Practices work flow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), that can enable simpler deployment and that can enlarge the level of assurance over the performance for genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility. The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain and who are working on genomics-based workloads.

IBM Spectrum Scale Functionality to Support GDPR Requirements

2018-04-24 O'Reilly Amazon

book

Clodoaldo Barrera , Nils Haustein , Carl Zetie , Sandeep R Patil , Felipe Knop

data data-engineering IBM GDPR/CCPA

The role of the IT solutions is to enforce the correct handling of personal data using processes developed by the establishment. Each element of the solution stack must address the objectives as appropriate to the data that it handles. Typically, personal data exists either in the form of structured data (like databases) or unstructured data (like files, text, documents, and so on.). This IBM Redbooks publication specifically deals with unstructured data and storage systems used to host unstructured data. For unstructured data storage in particular, some key attributes enable the overall solution to support compliance with the EU General Data Protection Regulation (GDPR). Because personal data subject to GDPR is commonly stored in an unstructured data format, a scale out file system like IBM Spectrum Scale provides essential functions to support GDPR requirements. This paper highlights some of the key compliance requirements and explains how IBM Spectrum Scale helps to address them.

JavaScript and JSON Essentials - Second Edition

2018-04-23 O'Reilly Amazon

book

Sai S Sriparasa , Bruno Joseph D'mello

data data-engineering storage-formats JSON JavaScript Kafka

Dive into "JavaScript and JSON Essentials" to discover how JSON works as a cornerstone in modern web development. Through hands-on examples and practical guidance, this book equips you with the knowledge to effectively use JSON with JavaScript for creating responsive, scalable, and capable web applications. What this Book will help me do Master JSON structures and utilize them in web development workflows. Integrate JSON data within Angular, Node.js, and other popular frameworks. Implement real-time JSON features using tools like Kafka and Socket.io. Understand BSON, GeoJSON, and JSON-LD formats for specialized applications. Develop efficient JSON handling for distributed and scalable systems. Author(s) None Joseph D'mello and Sai S Sriparasa are seasoned software developers and educators with extensive experience in JavaScript. Their expertise in web application development and JSON usage shines through in this book. They take a clear and engaging approach, ensuring that complex concepts are demystified and actionable. Who is it for? This book is best suited for web developers familiar with JavaScript who want to enhance their abilities to use JSON for building fast, data-driven web applications. Whether you're looking to strengthen your backend skills or learn tools like Angular and Kafka in conjunction with JSON, this book is made for you.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

Mastering The Faster Web with PHP, MySQL, and JavaScript

Microsoft SQL Server 2017 on Linux

MySQL and JSON: A Practical Programming Guide

IBM z14 Model ZR1 Technical Guide

Data Analytics with Spark Using Python, First edition

Decarbonizing Logistics

Big Data Analytics with Hadoop 3

Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering

Hands-On Data Warehousing with Azure Data Factory

Learning PHP, MySQL & JavaScript, 5th Edition

IBM Storage Networking SAN24B-6 Switch

Storwize HyperSwap with IBM i

PostgreSQL 10 Administration Cookbook - Fourth Edition

IBM Real-time Compression in IBM SAN Volume Controller and IBM Storwize V7000

Designing Event-Driven Systems

Data Science Fundamentals for Python and MongoDB

IBM Geographically Dispersed Resilience for SAP HANA and SAP NetWeaver

IBM TS7700 Release 4.1 and 4.1.2 Guide

IBM System Storage SAN Volume Controller and Storwize V7000 Best Practices and Performance Guidelines

Viewing and Managing FlashSystem Performance with IBM Spectrum Control

PostgreSQL 10 High Performance - Third Edition

Networking Design for HPC and AI on IBM Power Systems

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

IBM Spectrum Scale Functionality to Support GDPR Requirements

JavaScript and JSON Essentials - Second Edition