Cloud Computing

PySpark Cookbook

2018-06-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denny Lee (Databricks) , Tomasz Drabas

AI/ML Analytics Big Data PySpark Python Spark Data Streaming apache-spark data data-engineering

Dive into the world of big data processing and analytics with the "PySpark Cookbook". This book provides over 60 hands-on recipes for implementing efficient data-intensive solutions using Apache Spark and Python. By mastering these recipes, you'll be equipped to tackle challenges in large-scale data processing, machine learning, and stream analytics. What this Book will help me do Set up and configure PySpark environments effectively, including working with Jupyter for enhanced interactivity. Understand and utilize DataFrames for data manipulation, analysis, and transformation tasks. Develop end-to-end machine learning solutions using the ML and MLlib modules in PySpark. Implement structured streaming and graph-processing solutions to analyze and visualize data streams and relationships. Deploy PySpark applications to the cloud infrastructure efficiently using best practices. Author(s) This book is co-authored by None Lee and None Drabas, who are experienced professionals in data processing and analytics leveraging Python and Apache Spark. With their deep technical expertise and a passion for teaching through practical examples, they aim to make the complex concepts of PySpark accessible to developers of varied experience levels. Who is it for? This book is ideal for Python developers who are keen to delve into the Apache Spark ecosystem. Whether you're just starting with big data or have some experience with Spark, this book provides practical recipes to enhance your skills. Readers looking to solve real-world data-intensive challenges using PySpark will find this resource invaluable.

Streaming Change Data Capture

2018-06-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kevin Petrie (Eckerson Group) , Dan Potter , Itamar Ankorion

Analytics Data Lake DWH Hadoop IoT Marketing Data Streaming data data-engineering data-lake storage-repositories

There are many benefits to becoming a data-driven organization, including the ability to accelerate and improve business decision accuracy through the real-time processing of transactions, social media streams, and IoT data. But those benefits require significant changes to your infrastructure. You need flexible architectures that can copy data to analytics platforms at near-zero latency while maintaining 100% production uptime. Fortunately, a solution already exists. This ebook demonstrates how change data capture (CDC) can meet the scalability, efficiency, real-time, and zero-impact requirements of modern data architectures. Kevin Petrie, Itamar Ankorion, and Dan Potter—technology marketing leaders at Attunity—explain how CDC enables faster and more accurate decisions based on current data and reduces or eliminates full reloads that disrupt production and efficiency. The book examines: How CDC evolved from a niche feature of database replication software to a critical data architecture building block Architectures where data workflow and analysis take place, and their integration points with CDC How CDC identifies and captures source data updates to assist high-speed replication to one or more targets Case studies on cloud-based streaming and streaming to a data lake and related architectures Guiding principles for effectively implementing CDC in cloud, data lake, and streaming environments The Attunity Replicate platform for efficiently loading data across all major database, data warehouse, cloud, streaming, and Hadoop platforms

Big Data Architect???s Handbook

2018-06-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Syed Muhammad Fahad Akhtar

AI/ML Big Data Hadoop data data-engineering

Big Data Architect's Handbook is your comprehensive guide to mastering the art of building sophisticated big data solutions. As you delve into this book, you'll learn to design end-to-end big data pipelines and integrate data from various sources for insightful analysis. What this Book will help me do Understand the Hadoop ecosystem and familiarize yourself with major Apache projects. Make informed decisions when designing cloud infrastructures for big data needs. Gain expertise in analyzing structured and unstructured data using machine learning. Develop skills to implement scalable and efficient big data pipelines. Enhance your ability to visualize and monitor data insights effectively. Author(s) None Akhtar has amassed a wealth of experience in big data architecture and related technologies. With years of hands-on involvement in development, analysis, and implementation of big data systems, None brings a pragmatic and insightful perspective. This passion for educating others about data-driven technologies shines through in a user-first approach to making complex topics accessible. Who is it for? This book caters to aspiring data professionals, software developers, and tech enthusiasts aiming to enhance their expertise in big data. Readers with basic programming and data analysis skills will find the content approachable yet challenging enough to deepen their understanding. If your career goal involves managing, analyzing, and making decisions based on large datasets, this book will help bridge the gap between skill and application.

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

2018-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Butch Quinto

Alteryx Analytics BI Big Data Data Governance DataViz DWH Apache HBase HDFS Kafka MySQL Oracle +7 more

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book has an extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard. What You’ll Learn Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples, and practical advice Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing Turbocharge Spark with Alluxio, a distributed in-memory storage platform Deploy big data in the cloud using Cloudera Director Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard Who This Book Is For BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics

Implementing IBM FlashSystem 900 Model AE3

2018-06-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jim Cioffi Detlef Helmbrecht Jon Herd, Jeffrey Irving, Christian Karpp, Volker Kiemes, Carsten Larsen, Adrian Orban

Analytics IBM data data-engineering

Abstract Today’s global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3, powered by IBM FlashCore® technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900 Model AE3. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read, and also examples of the benefits gained by integrating the FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology.

IBM z14 Model ZR1 Technical Guide

2018-06-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hervey Kamga Octavian Lascu Frank Packheiser, Martijn Raave, John Troy, Bill White

Analytics IBM Cyber Security data data-engineering

Abstract This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry standard footprint. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, z14 ZR1 is powered by up to 30 client characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second and up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20. This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers functions and plan for their usage. It is intended as an introduction to mainframes. Readers are expected to be generally familiar with IBM Z technology and terminology.

Data Analytics with Spark Using Python, First edition

2018-06-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

AI/ML Analytics Data Analytics Data Science Hadoop NoSQL Python Spark Data Streaming apache-spark data data-engineering

Spark for Data Professionals introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, including basic programming to SparkSQL, SparkR, Spark Streaming, Messaging, NoSQL and Hadoop integration. Each chapter presents practical exercises deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, Spark for Data Professionals explains crucial concepts step-by-step, assuming no extensive background as an open source developer. It provides a complete foundation for quickly progressing to more advanced data science and machine learning topics. This guide will help you: Understand Spark basics that will make you a better programmer and cluster “citizen” Master Spark programming techniques that maximize your productivity Choose the right approach for each problem Make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing Leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data

Big Data Analytics with Hadoop 3

2018-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sridhar Alla

Analytics Flink AWS Big Data Data Analytics Hadoop HDFS Python Spark data data-engineering

Big Data Analytics with Hadoop 3 is your comprehensive guide to understanding and leveraging the power of Apache Hadoop for large-scale data processing and analytics. Through practical examples, it introduces the tools and techniques necessary to integrate Hadoop with other popular frameworks, enabling efficient data handling, processing, and visualization. What this Book will help me do Understand the foundational components and features of Apache Hadoop 3 such as HDFS, YARN, and MapReduce. Gain the ability to integrate Hadoop with programming languages like Python and R for data analysis. Learn the skills to utilize tools such as Apache Spark and Apache Flink for real-time data analytics within the Hadoop ecosystem. Develop expertise in setting up a Hadoop cluster and performing analytics in cloud environments such as AWS. Master the process of building practical big data analytics pipelines for end-to-end data processing. Author(s) Sridhar Alla is a seasoned big data professional with extensive industry experience in building and deploying scalable big data analytics solutions. Known for his expertise in Hadoop and related ecosystems, Sridhar combines technical depth with clear communication in his writing, providing practical insights and hands-on knowledge. Who is it for? This book is tailored for data professionals, software engineers, and data scientists looking to expand their expertise in big data analytics using Hadoop 3. Whether you're an experienced developer or new to the big data ecosystem, this book provides the step-by-step guidance and practical examples needed to advance your skills and achieve your analytical goals.

Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering

2018-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rob Basham , Amey Gokhale , Jinesh Shah , Anbazhagan Mani , Kedar Karmarkar , Nikhil Khandelwal , Larry Coyne , Sandeep R Patil , Donald Mathisen , Arend Dittmer

Cloud Storage IBM S3 cloud-storage data data-engineering storage-repositories

This IBM® Redbooks® publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the transparent cloud tiering (TCT) functionality of IBM Spectrum™ Scale. IBM Spectrum Scale™ is a scalable data, file, and object management solution that provides a global namespace for large data sets and several enterprise features. The IBM Spectrum Scale feature called transparent cloud tiering allows cloud object storage providers, such as IBM Cloud™ Object Storage, IBM Cloud, and Amazon S3, to be used as a storage tier for IBM Spectrum Scale. Transparent cloud tiering can help cut storage capital and operating costs by moving data that does not require local performance to an on-premise or off-premise cloud object storage provider. Transparent cloud tiering reduces the complexity of cloud object storage by making data transfers transparent to the user or application. This capability can help you adapt to a hybrid cloud deployment model where active data remains directly accessible to your applications and inactive data is placed in the correct cloud (private or public) automatically through IBM Spectrum Scale policies. This publication is intended for IT architects, IT administrators, storage administrators, and those wanting to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and transparent cloud tiering.

Hands-On Data Warehousing with Azure Data Factory

2018-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote , Giuseppe Ciaburro , Michelle Gutzait

AI/ML Analytics Azure ADF BI Data Engineering Data Lake Databricks DWH ETL/ELT Power BI Spark +6 more

Dive into the world of ETL (Extract, Transform, Load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments. What this Book will help me do Understand and utilize Azure Data Factory and SQL Server Integration Services to build ETL solutions. Design scalable and high-performance ETL architectures tailored to modern data problems. Integrate various Azure services, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, into your workflows. Troubleshoot and optimize ETL pipelines and address common challenges in data processing. Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows. Author(s) Authors None Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in Azure ecosystem and hands-on Data Warehousing, they deliver actionable insights for working professionals. Who is it for? This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with a foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.

Implementing IBM FlashSystem V9000 AE3

2018-04-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Karpp , Jon Herd , James Cioffi , Detlef Helmbrecht , Carsten Larsen , Jeffrey Irving , Adrian Orban , Volker Kiemes

Analytics Data Management IBM SAS Cyber Security data data-engineering

Abstract The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with several areas that affect your business: Highly virtualized environments Cloud computing Mobile and social systems of engagement In-depth, real-time analytics Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate when they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V8.1. It describes the core product architecture, software, hardware, and implementation, and provides hints and tips. The underlying basic hardware and software architecture and features of the IBM FlashSystem V9000 AC3 control enclosure and on IBM Spectrum Virtualize 8.1 software are described in these publications: Implementing IBM FlashSystem 900 Model AE3, SG24-8414 Implementing the IBM System Storage SAN Volume Controller V7.4, SG24-7933 Using IBM FlashSystem V9000 software functions, management tools, and interoperability combines the performance of IBM FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

IBM z14 Model ZR1 Technical Introduction

2018-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Frank Packheiser , John Troy , Bill White , Octavian Lascu , Hervey Kamga , Martijn Raave

Agile/Scrum Analytics IBM data data-engineering

Abstract This IBM® Redbooks® publication introduces the latest member of the IBM Z platform, the IBM z14 Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and provides insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, which is in an industry standard footprint. This system excels at the following tasks: Securing data with pervasive encryption Transforming a transactional platform into a data powerhouse Getting more out of the platform with IT Operational Analytics Providing resilience towards zero downtime Accelerating digital transformation with agile service delivery Revolutionizing business processes Mixing open source and Z technologies This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z14 ZR1 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

Modern Big Data Processing with Hadoop

2018-03-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Prashant Shindgikar , Manoj R Patil , V Naresh Kumar

Analytics Big Data ELK Hadoop Spark Superset data data-engineering

Delve into the world of big data with 'Modern Big Data Processing with Hadoop.' This comprehensive guide introduces you to the powerful capabilities of Apache Hadoop and its ecosystem to solve data processing and analytics challenges. By the end, you will have mastered the techniques necessary to architect innovative, scalable, and efficient big data solutions. What this Book will help me do Master the principles of building an enterprise-level big data strategy with Apache Hadoop. Learn to integrate Hadoop with tools such as Apache Spark, Elasticsearch, and more for comprehensive solutions. Set up and manage your big data architecture, including deployment on cloud platforms with Apache Ambari. Develop real-time data pipelines and enterprise search solutions. Leverage advanced visualization tools like Apache Superset to make sense of data insights. Author(s) None R. Patil, None Kumar, and None Shindgikar are experienced big data professionals and accomplished authors. With years of hands-on experience in implementing and managing Apache Hadoop systems, they bring a depth of expertise to their writing. Their dedication lies in making complex technical concepts accessible while demonstrating real-world best practices. Who is it for? This book is designed for data professionals aiming to advance their expertise in big data solutions using Apache Hadoop. Ideal readers include engineers and project managers involved in data architecture and those aspiring to become big data architects. Some prior exposure to big data systems is beneficial to fully benefit from this book's insights and tutorials.

IBM Open Platform for DBaaS on IBM Power Systems

2018-03-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dino Quintero , Fabio Martins , Rafael Camarda Silva Folco , Eduardo Luis Cerdas Moya

API IBM Linux data data-engineering

Abstract This IBM Redbooks publication describes how to implement an Open Platform for Database as a Service (DBaaS) on IBM Power Systems environment for Linux, and demonstrate the open source tools, optimization and best practices guidelines for it. Open Platform for DBaaS on Power Systems is an on-demand, secure, and scalable self-service database platform that automates provisioning and administration of databases to support new business applications and information insights. This publication addresses topics to help sellers, architects, brand specialists, distributors, resellers and anyone offering secure and scalable Open Platform for DBaaS on Power Systems solution with APIs that are consistent across heterogeneous open database types. An Open Platform for DBaaS on Power Systems solution has the capability to accelerate business success by providing an infrastructure, and tools leveraging Open Source and OpenStack software engineered to optimize hardware and software between workloads and resources so you have a responsive, and an adaptive environment. Moreover, this publication provides documentation to transfer the how-to-skills for cloud oriented operational management of Open Platform for DBaaS on Power Systems service and underlying infrastructure to the technical teams. Open Platform for DBaaS on Power Systems mission is to provide scalable and reliable cloud database as a service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework. For example, Trove is a database as a service for OpenStack. It is designed to run entirely on OpenStack, with the goal of allowing users to quickly and easily utilize the features of a relational or non-relational database without the burden of handling complex administrative tasks. Cloud users and database administrators can provision and manage multiple database instances as needed. Initially, the service focuses on providing resource isolation at high performance while automating complex administrative tasks including deployment, configuration, patching, backups, restores, and monitoring. In the context of this publication, the monitoring tool implemented is Nagios Core which is an open source monitoring tool. Hence, when you see a reference of Nagios in this book, Nagios Core is the open source monitoring solution implemented. Also note that the implementation of Open Platform for DBaaS on IBM Power Systems is based on open source solutions. This book is targeted toward sellers, architects, brand specialists, distributors, resellers and anyone developing and implementing Open Platform for DBaaS on Power Systems solutions.

IBM TS4500 R4 Tape Library Guide

2018-03-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Larry Coyne , Michael Engelbrecht , Simon Browne

ELK IBM Cyber Security data data-engineering

Abstract The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot, because the TS4500 can store up to 8.25 petabytes (PB) of uncompressed data in a single frame library or scale up at 1.5 PB per square foot to over 263 PB, which is more than 4 times the capacity of the IBM TS3500 tape library. The TS4500 offers these benefits: High availability dual active accessors with integrated service bays to reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to an additional 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the existing TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for the IBM TS1155 while also supporting TS1150 and TS1140 tape drive: The TS1155 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1155 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The new TS1155 Tape Drive Model 55E delivers a 10 Gb Ethernet host attachment interface optimized for cloud-based and hyperscale environments. The TS1155 Tape Drive Model 55F delivers a native data rate of 360 MBps, the same load/ready, locate speeds, and access times as the TS1150, and includes dual-port 8 Gb Fibre Channel support. Support of the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while they still protect your investment in the previous technology. Support of LTO 8 Type M cartridge (M8): The LTO Program is introducing a new capability with LTO-8 drives. The ability of the LTO-8 drive to write 9 TB on a brand new LTO-7 cartridge instead of 6 TB as specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), and command-line interface (CLI). You learn how to accomplish several specific tasks: Improve storage density with increased expansion frame capacity up to 2.4 times and support 33% more tape drives per frame. Manage storage by using the ALMS feature. Improve business continuity and disaster recovery with dual active accessor, automatic control path failover, and data path failover. Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media. Support IBM LTO Ultrium 8, 7, 6, and 5, IBM TS1155, TS1150, and TS1140 tape drives. Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow. Reduce the storage footprint and simplify cabling with 10 U of rack space on top of the library. This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Beginning PostgreSQL on the Cloud: Simplifying Database as a Service on Cloud Platforms

2018-03-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Avinash Vallarapu , Baji Shaik (Amazon RDS)

Azure data data-engineering postgresql relational-databases

Get started with PostgreSQL on the cloud and discover the advantages, disadvantages, and limitations of the cloud services from Amazon, Rackspace, Google, and Azure. Once you have chosen your cloud service, you will focus on securing it and developing a back-up strategy for your PostgreSQL instance as part of your long-term plan. Beginning PostgreSQL on the Cloud covers other essential topics such as setting up replication and high availability; encrypting your saved cloud data; creating a connection pooler for your database; and monitoring PostgreSQL on the cloud. The book concludes by showing you how to install and configure some of the tools that will help you get started with PostgreSQL on the cloud. This book shows you how database as a service enables you to spread your data across multiple data centers, ensuring that it is always accessible. You’ll discover that this model does not expect you to install and maintain databases yourself because the database service provider does it for you. You no longer have to worry about the scalability and high availability of your database. What You Will Learn Migrate PostgreSQL to the cloud Choose the best configuration and specifications of cloud instances Set up a backup strategy that enables point-in-time recovery Use connection pooling and load balancing on cloud environments Monitor database environments on the cloud Who This Book Is For Those who are looking to migrate to PostgreSQL on the Cloud. It will also help database administrators in setting up a cloud environment in an optimized way and help them with their day-to-day tasks.

Camel in Action, Second Edition

2018-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jonathan Anstey , Claus Ibsen

Docker Java Kubernetes Cyber Security XML camel data data-engineering streaming-messaging

Camel in Action, Second Edition is the most complete Camel book on the market. Written by core developers of Camel and the authors of the highly acclaimed first edition, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. About the Technology Apache Camel is a Java framework that implements enterprise integration patterns (EIPs) and comes with over 200 adapters to third-party systems. A concise DSL lets you build integration logic into your app with just a few lines of Java or XML. By using Camel, you benefit from the testing and experience of a large and vibrant open source community. About the Book Camel in Action, Second Edition is the definitive guide to the Camel framework. It starts with core concepts like sending, receiving, routing, and transforming data. It then goes in depth on many topics such as how to develop, debug, test, deal with errors, secure, scale, cluster, deploy, and monitor your Camel applications. The book also discusses how to run Camel with microservices, reactive systems, containers, and in the cloud. What's Inside Coverage of all relevant EIPs Camel microservices with Spring Boot Camel on Docker and Kubernetes Error handling, testing, security, clustering, monitoring, and deployment Hundreds of examples in Java and XML About the Reader Readers should be familiar with Java. This book is accessible to beginners and invaluable to experts. About the Authors Claus Ibsen is a senior principal engineer working for Red Hat specializing in cloud and integration. He has worked on Apache Camel for the last nine years where he heads the project. Claus lives in Denmark. Jonathan Anstey is an engineering manager at Red Hat and a core Camel contributor. He lives in Newfoundland, Canada. Quotes I highly recommend this book to anyone with even a passing interest in Apache Camel. Do take Camel for a ride...and don't get the hump! - From the Foreword by James Strachan, Creator of Apache Camel Claus and Jon are great writers, relying on figures and diagrams where needed and presenting lots of code snippets and worked examples. - From the Foreword by Dr. Mark Little, Technical Director of JBoss The second edition of this all-time classic is an indispensable companion for your Apache Camel rides. - Gregor Zurowski, Apache Camel Committer The absolute best way to learn and use Camel - top to bottom, front to back, and all the way through. Camel is a fantastic tool - every Java coder should have a copy of this book. - Rick Wagner, Red Hat An excellent book and the definite reference for experienced engineers. - Yan Guo, EventBrite

SQL Server 2017 Administration Inside Out, First Edition

2018-02-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by William Assaf , Sven Aelterman , Randolph West

Azure PowerShell Cyber Security SQL data data-engineering microsoft-sql-server relational-databases

Conquer SQL Server 2017 administration—from the inside out Dive into SQL Server 2017 administration—and really put your SQL Server DBA expertise to work. This supremely organized reference packs hundreds of timesaving solutions, tips, and workarounds—all you need to plan, implement, manage, and secure SQL Server 2017 in any production environment: on-premises, cloud, or hybrid. Four SQL Server experts offer a complete tour of DBA capabilities available in SQL Server 2017 Database Engine, SQL Server Data Tools, SQL Server Management Studio, and via PowerShell. Discover how experts tackle today’s essential tasks—and challenge yourself to new levels of mastery. • Install, customize, and use SQL Server 2017’s key administration and development tools • Manage memory, storage, clustering, virtualization, and other components • Architect and implement database infrastructure, including IaaS, Azure SQL, and hybrid cloud configurations • Provision SQL Server and Azure SQL databases • Secure SQL Server via encryption, row-level security, and data masking • Safeguard Azure SQL databases using platform threat protection, firewalling, and auditing • Establish SQL Server IaaS network security groups and user-defined routes • Administer SQL Server user security and permissions • Efficiently design tables using keys, data types, columns, partitioning, and views • Utilize BLOBs and external, temporal, and memory-optimized tables • Master powerful optimization techniques involving concurrency, indexing, parallelism, and execution plans • Plan, deploy, and perform disaster recovery in traditional, cloud, and hybrid environments For Experienced SQL Server Administrators and Other Database Professionals • Your role: Intermediate-to-advanced level SQL Server database administrator, architect, developer, or performance tuning expert • Prerequisites: Basic understanding of database administration procedures

IBM z14 Technical Guide

2018-02-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Esra Ufacik , Frank Packheiser , John Troy , Octavian Lascu , Michal Kordyzon , Hervey Kamga , William G White , Bo XU

Analytics IBM Cyber Security data data-engineering

Abstract This IBM® Redbooks® publication describes the new member of the IBM Z family, IBM z14®. IBM z14 is the trusted enterprise platform for pervasive encryption, integrating data, transactions, and insights into the data. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 servers to deliver a record level of capacity over the prior IBM Z platforms. In its maximum configuration, z14 is powered by up to 170 client characterizable microprocessors (cores) running at 5.2 GHz. This configuration can run more than 146,000 million instructions per second (MIPS) and up to 32 TB of client memory. The IBM z14 Model M05 is estimated to provide up to 35% more total system capacity than the IBM z13® Model NE1. This Redbooks publication provides information about IBM z14 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the IBM Z servers functions and plan for their usage. It is intended as an introduction to mainframes. Readers are expected to be generally familiar with existing IBM Z technology and terminology.

Complete Guide to Open Source Big Data Stack

2018-01-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Frampton

Big Data Cassandra Cloud Storage DataViz Hadoop Kafka Spark apache-spark data data-engineering

See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examines Apache Brooklyn. After that, he uses each chapter to introduce one piece of the big data stack—sharing how to source the software and how to install it. You learn by simple example, step by step and chapter by chapter, as a real big data stack is created. The book concentrates on Apache-based systems and shares detailed examples of cloud storage, release management, resource management, processing, queuing, frameworks, data visualization, and more. What You’ll Learn Install a private cloud onto the local cluster using Apache cloud stack Source, install, and configure Apache: Brooklyn, Mesos, Kafka, and Zeppelin See how Brooklyn can be used to install Mule ESB on a cluster and Cassandra in the cloud Install and use DCOS for big data processing Use Apache Spark for big data stack data processing Who This Book Is For Developers, architects, IT project managers, database administrators, and others charged with developing or supporting a big data system. It is also for anyone interested in Hadoop or big data, and those experiencing problems with data size.

talk-data.com

Activity Trend

Top Events

Top Speakers

PySpark Cookbook

Streaming Change Data Capture

Big Data Architect???s Handbook

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Implementing IBM FlashSystem 900 Model AE3

IBM z14 Model ZR1 Technical Guide

Data Analytics with Spark Using Python, First edition

Big Data Analytics with Hadoop 3

Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering

Hands-On Data Warehousing with Azure Data Factory

Implementing IBM FlashSystem V9000 AE3

IBM z14 Model ZR1 Technical Introduction

Modern Big Data Processing with Hadoop

IBM Open Platform for DBaaS on IBM Power Systems

IBM TS4500 R4 Tape Library Guide

Beginning PostgreSQL on the Cloud: Simplifying Database as a Service on Cloud Platforms

Camel in Action, Second Edition

SQL Server 2017 Administration Inside Out, First Edition

IBM z14 Technical Guide

Complete Guide to Open Source Big Data Stack