Piyush Chaudhary

Activities

3

talks

author

Frequent Collaborators

Sandeep R Patil 2

Filter by Event / Source

O'Reilly Data Engineering Books 3

Talks & appearances

3 activities · Newest first

Search activities →

Highly Efficient Data Access with RoCE on IBM Elastic Storage Systems and IBM Spectrum Scale

2022-02-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Piyush Chaudhary , Gero Schmidt , Olaf Weiser

data data-engineering IBM ELK

With Remote Direct Memory Access (RDMA), you can make a subset of a host's memory directly available to a remote host. RDMA is available on standard Ethernet-based networks by using the RDMA over Converged Ethernet (RoCE) interface. The RoCE network protocol is an industry-standard initiative by the InfiniBand Trade Association. This IBM® Redpaper publication describes how to set up RoCE to use within an IBM Spectrum® Scale cluster and IBM Elastic Storage® Systems (ESSs). This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with IBM Spectrum Scale and IBM ESSs.

Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution

2018-06-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Muthu Muthiah , Piyush Chaudhary , Larry Coyne , Yong ZY Zheng , Sandeep R Patil , Wei Gong , Pallavi Galgali

data data-engineering IBM Analytics Data Lake Hadoop

This IBM® Redpaper™ publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum™ Scale and Hortonworks Data Platform for performing in-place Hadoop or Spark-based analytics. It covers the benefits of the integrated solution, and gives guidance about the types of deployment models and considerations during the implementation of these models. Hortonworks Data Platform (HDP) is a leading Hadoop and Spark distribution. HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation. IBM Spectrum Scale™ is flexible and scalable software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale to form large data lakes and content repositories to perform high-performance computing (HPC) and analytics workloads. It can scale performance and capacity both without bottlenecks.

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

2018-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Monica Lemay , Kumaran Rajaram , Kevin Gildea , Piyush Chaudhary , Sandeep R Patil , Ulf Troppens , Joanna Wong , Luis Bolinches

data data-engineering IBM ibm-tivoli Analytics Data Management

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable building block based solution for genomics addresses the most complex data management aspect and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM Spectrum™ Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum Scale™ is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic Storage™ Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM Redpaper™ publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments, to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running GATK Best Practices work flow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), that can enable simpler deployment and that can enlarge the level of assurance over the performance for genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility. The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain and who are working on genomics-based workloads.