talk-data.com talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

499

tagged

Activity Trend

471 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books ×
IBM Spectrum Accelerate Deployment, Usage, and Maintenance

Abstract This edition applies to IBM® Spectrum Accelerate V11.5.4. IBM Spectrum Accelerate™, a member of IBM Spectrum Storage™, is an agile, software-defined storage solution for enterprise and cloud that builds on the customer-proven and mature IBM XIV® storage software. The key characteristic of Spectrum Accelerate is that it can be easily deployed and run on purpose-built or existing hardware that is chosen by the customer. IBM Spectrum Accelerate enables rapid deployment of high-performance and scalable block data storage infrastructure over commodity hardware on-premises or off-premises. This IBM Redbooks® publication provides a broad understanding of IBM Spectrum Accelerate. The book introduces Spectrum Accelerate and describes planning and preparation that are essential for a successful deployment of the solution. The deployment is described through a step-by-step approach, by using a graphical user interface (GUI) based method or a simple command-line interface (CLI) based procedure. Chapters in this book describe the logical configuration of the system, host support and business continuity functions, and migration. Although it makes many references to the XIV storage software, the book also emphasizes where IBM Spectrum Accelerate differs from XIV. Finally, a substantial portion of the book is dedicated to maintenance and troubleshooting to provide detailed guidance for the customer support personnel.

Moving Hadoop to the Cloud

Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there’s a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems administrators familiar with Hadoop how to install, use, and manage cloud-born clusters efficiently. You’ll learn how to architect clusters that work with cloud-provider features—not just to avoid pitfalls, but also to take full advantage of these services. You’ll also compare the Amazon, Google, and Microsoft clouds, and learn how to set up clusters in each of them. Learn how Hadoop clusters run in the cloud, the problems they can help you solve, and their potential drawbacks Examine the common concepts of cloud providers, including compute capabilities, networking and security, and storage Build a functional Hadoop cluster on cloud infrastructure, and learn what the major providers require Explore use cases for high availability, relational data with Hive, and complex analytics with Spark Get patterns and practices for running cloud clusters, from designing for price and security to dealing with maintenance

Learning SAP Analytics Cloud

Discover the power of SAP Analytics Cloud in solving business intelligence challenges through concise and clear instruction. This book is the essential guide for beginners, providing you a comprehensive understanding of the platform's features and capabilities. By the end, you'll master creating reports, models, and dashboards, making data-driven decisions with confidence. What this Book will help me do Learn how to navigate and utilize the SAP Analytics Cloud interface effectively. Create data models using various sources like Excel or text files for comprehensive insights. Design and compile visually engaging stories, reports, and dashboards effortlessly. Master collaborative and presentation tools inside SAP Digital Boardroom. Understand how to plan, predict, and analyze seamlessly within a single platform. Author(s) None Ahmed is an experienced SAP consultant and analytics professional, bringing years of practical experience in BI tools and enterprise analytics. As an expert in SAP Analytics Cloud, None has guided numerous teams in deploying effective analytics solutions. Their writing aims to demystify complex tools for learners. Who is it for? This book is ideal for IT professionals, business analysts, and newcomers eager to understand SAP Analytics Cloud. Beginner-level BI developers and managers seeking guided steps for mastering this platform will find it invaluable. If you aim to enhance your career in cloud-based analytics, this book is tailored for you.

Frank Kane's Taming Big Data with Apache Spark and Python

This book introduces you to the world of Big Data processing using Apache Spark and Python. You will learn to set up and run Spark on different systems, process massive datasets, and create solutions to real-world Big Data challenges with over 15 hands-on examples included. What this Book will help me do Understand the basics of Apache Spark and its ecosystem. Learn how to process large datasets with Spark RDDs using Python. Implement machine learning models with Spark's MLlib library. Master real-time data processing with Spark Streaming modules. Deploy and run Spark jobs on cloud clusters using AWS EMR. Author(s) Frank Kane spent 9 years working at Amazon and IMDb, handling and solving real-world machine learning and Big Data problems. Today, as an instructional designer and educator, he brings his wealth of experience to learners around the globe by creating accessible, practical learning resources. His teaching is clear, engaging, and designed to prepare students for real-world applications. Who is it for? This book is ideal for data scientists or data analysts seeking to delve into Big Data processing with Apache Spark. Readers who have foundational knowledge of Python, as well as some understanding of data processing principles, will find this book useful to sharpen their skills further. It is designed for those eager to learn the practical applications of Big Data tools in today's industry environments. By the end of this book, you should feel confident tackling Big Data challenges using Spark and Python.

Learning Elasticsearch

This comprehensive guide to Elasticsearch will teach you how to build robust and scalable search and analytics applications using Elasticsearch 5.x. You will learn the fundamentals of Elasticsearch, including its APIs and tools, and how to apply them to real-world problems. By the end of the book, you will have a solid grasp of Elasticsearch and be ready to implement your own solutions. What this Book will help me do Master the setup and configuration of Elasticsearch and Kibana. Learn to efficiently query and analyze both structured and unstructured data. Understand how to use Elasticsearch aggregations to perform advanced analytics. Gain knowledge of advanced search features including geospatial queries and autocomplete. Explore the Elastic Stack and learn deployment best practices and cloud hosting options. Author(s) None Andhavarapu is an expert in database technology and distributed systems, with years of experience in Elasticsearch. Their passion for search technologies is reflected in their clear and practical teaching style. They've written this guide to help developers of all levels get up to speed with Elasticsearch quickly and comprehensively. Who is it for? This book is perfect for software developers looking to implement effective search and analytics solutions. It's ideal for those who are new to Elasticsearch as well as for professionals familiar with other search tools like Lucene or Solr. The book assumes basic programming knowledge but no prior experience with Elasticsearch.

Apache Spark 2.x Cookbook

Discover how to harness the power of Apache Spark 2.x for your Big Data processing projects. In this book, you will explore over 70 cloud-ready recipes that will guide you to perform distributed data analytics, structured streaming, machine learning, and much more. What this Book will help me do Effectively install and configure Apache Spark with various cluster managers and platforms. Set up and utilize development environments tailored for Spark applications. Operate on schema-aware data using RDDs, DataFrames, and Datasets. Perform real-time streaming analytics with sources such as Apache Kafka. Leverage MLlib for supervised learning, unsupervised learning, and recommendation systems. Author(s) None Yadav is a seasoned data engineer with a deep understanding of Big Data tools and technologies, particularly Apache Spark. With years of experience in the field of distributed computing and data analysis, Yadav brings practical insights and techniques to enrich the learning experience of readers. Who is it for? This book is ideal for data engineers, data scientists, and Big Data professionals who are keen to enhance their Apache Spark 2.x skills. If you're working with distributed processing and want to solve complex data challenges, this book addresses practical problems. Note that a basic understanding of Scala is recommended to get the most out of this resource.

Mastering Ceph

Mastering Ceph offers a comprehensive guide to mastering the Ceph distributed storage system, empowering you to implement and manage scalable storage solutions effectively. As you delve into the chapters, you'll gain the practical experience needed to handle Ceph with confidence, achieve resource optimization, and ensure high availability for critical applications. What this Book will help me do Understand and utilize Ceph's advanced capabilities such as erasure coding and tiering for storage efficiency. Implement and manage scalable and resilient Ceph clusters effectively, easing resource allocation. Use tools like Ansible and Vagrant to deploy Ceph clusters quickly and reproducibly. Enhance your troubleshooting skills to resolve complex storage issues and ensure cluster stability. Develop applications to integrate with Ceph using Librados and distributed computation classes. Author(s) This book was authored by None Fisk, an experienced professional in cloud and distributed storage systems. Known for their expertise in Ceph, None Fisk shares practical insights developed over years of working as an administrator and developer. Through their accessible and systematic writing, they guide readers to overcome real-world storage challenges. Who is it for? This detailed guide is ideal for developers and system administrators familiar with deploying Ceph, who want to deepen their understanding of its advanced features. If you're aiming to optimize performance and design robust storage solutions, this is the book for you. Prior experience with Ceph is recommended to fully benefit from the book's insights.

Oracle on IBM z Systems

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® z Systems®. The enterprise-grade Linux on IBM z Systems solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from IBM z Systems®. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Oracle on LinuxONE

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® LinuxONE. The enterprise-grade Linux on LinuxONE solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from LinuxONE. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Sams Teach Yourself Hadoop in 24 Hours

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

Prepare for Microsoft Exam 70-761–and help demonstrate your real-world mastery of SQL Server 2016 Transact-SQL data management, queries, and database programming. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level. Focus on the expertise measured by these objectives: Filter, sort, join, aggregate, and modify data Use subqueries, table expressions, grouping sets, and pivoting Query temporal and non-relational data, and output XML or JSON Create views, user-defined functions, and stored procedures Implement error handling, transactions, data types, and nulls This Microsoft Exam Ref: Organizes its coverage by exam objectives Features strategic, what-if scenarios to challenge you Assumes you have experience working with SQL Server as a database administrator, system engineer, or developer Includes downloadable sample database and code for SQL Server 2016 SP1 (or later) and Azure SQL Database Querying Data with Transact-SQL About the Exam Exam 70-761 focuses on the skills and knowledge necessary to manage and query data and to program databases with Transact-SQL in SQL Server 2016. About Microsoft Certification Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of essential skills for building and implementing on-premises and cloud-based databases across organizations. Exam 70-762 (Developing SQL Databases) is also required for MCSA: SQL 2016 Database Development certification. See full details at: microsoft.com/learning

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

Proven Database Optimization Solutions―Fully Updated for Oracle Database 12c Release 2 Systematically identify and eliminate database performance problems with help from Oracle Certified Master Richard Niemiec. Filled with real-world case studies and best practices, Oracle Database 12c Release 2 Performance Tuning Tips and Techniques details the latest monitoring, troubleshooting, and optimization methods. Find out how to identify and fix bottlenecks on premises and in the cloud, configure storage devices, execute effective queries, and develop bug-free SQL and PL/SQL code. Testing, reporting, and security enhancements are also covered in this Oracle Press guide. • Properly index and partition Oracle Database 12c Release 2 • Work effectively with Oracle Cloud, Oracle Exadata, and Oracle Enterprise Manager • Efficiently manage disk drives, ASM, RAID arrays, and memory • Tune queries with Oracle SQL hints and the Trace utility • Troubleshoot databases using V$ views and X$ tables • Create your first cloud database service and prepare for hybrid cloud • Generate reports using Oracle’s Statspack and Automatic Workload Repository tools • Use sar, vmstat, and iostat to monitor operating system statistics

Learning PySpark

"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications. What this Book will help me do Master the Spark 2.0 architecture and its Python integration with PySpark. Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis. Develop scalable machine learning models using PySpark's ML and MLlib libraries. Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models. Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions. Author(s) Authors None Drabas and None Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python. Who is it for? This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.

Big Data Now: 2016 Edition

Now in its sixth edition, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve examined throughout 2016. This collection of blog posts, authored by leading thinkers and experts in the field, reflects a unique set of themes we’ve identified as gaining significant attention and traction. Our list of topics for 2016 includes: Careers in data Tools and architecture for big data Intelligent real-time applications Cloud infrastructure Machine learning: models and training Deep learning and artificial intelligence

Cloud Data Sharing with IBM Spectrum Scale

This IBM® Redpaper™ publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the Cloud data sharing feature of IBM Spectrum Scale™. IBM Spectrum Scale, formerly IBM General Parallel File System (IBM GPFS™), is a scalable data and file management solution that provides a global namespace for large data sets along with several enterprise features. Cloud data sharing allows for the sharing and use of data between various cloud object storage types and IBM Spectrum Scale. Cloud data sharing can help with the movement of data in both directions, between file systems and cloud object storage, so that data is where it needs to be, when it needs to be there. This paper is intended for IT architects, IT administrators, storage administrators, and those who want to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and Cloud data sharing.

HBase High Performance Cookbook

"HBase High Performance Cookbook" is your guide to mastering the optimization, scaling, and tuning of HBase systems. Covering everything from configuring HBase clusters to designing scalable table structures and performance tuning, this comprehensive book provides practical advice and strategies for leveraging HBase's full potential. By following this book's recipes, you'll supercharge your HBase expertise. What this Book will help me do Understand how to configure HBase for optimal performance, improving your data system's efficiency. Learn to design table structures to maximize scalability and functionality in HBase. Gain skills in performing CRUD operations and using advanced features like MapReduce within HBase. Discover practices for integrating HBase with other technologies such as ElasticSearch. Master the steps involved in setting up and optimizing HBase in cloud environments for enhanced performance. Author(s) Ruchir Choudhry is a seasoned data management professional with extensive experience in distributed database systems. He possesses deep expertise in HBase, Hadoop, and other big data technologies. His practical and engaging writing style aims to demystify complex technical topics, making them accessible to developers and architects alike. Who is it for? This book is tailored for developers and system architects looking to deepen their understanding of HBase. Whether you are experienced with other NoSQL databases or are new to HBase, this book provides extensive practical knowledge. Ideal for professionals working in big data applications or those eager to optimize and scale their database systems effectively.

EU GDPR & EU-US Privacy Shield: A Pocket Guide
A concise introduction to EU GDPR and EU-US Privacy Shield

The EU General Data Protection Regulation will unify data protection and simplify the use of personal data across the EU when it comes into force in May 2018.

It will also apply to every organization in the world that processes personal information of EU residents.

US organizations that process EU residents' personal data will be able to comply with the GDPR via the EU-US Privacy Shield (the successor to the Safe Harbor framework), which permits international data transfers of EU data to US organizations that self-certify that they have met a number of requirements.

EU GDPR & EU-US Privacy Shield – A Pocket Guide provides an essential introduction to this new data protection law, explaining the Regulation and setting out the compliance obligations for US organizations in handling data of EU citizens, including guidance on the EU-US Privacy Shield.

Product overview

EU GDPR & EU-US Privacy Shield – A Pocket Guide sets out:

A brief history of data protection and national data protection laws in the EU (such as the UK DPA, German BDSG and French LIL). The terms and definitions used in the GDPR, including explanations. The key requirements of the GDPR, including: Which fines apply to which Articles; The six principles that should be applied to any collection and processing of personal data; The Regulation’s applicability; Data subjects’ rights; Data protection impact assessments (DPIAs); The role of the data protection officer (DPO) and whether you need one; Data breaches, and the notification of supervisory authorities and data subjects; Obligations for international data transfers. How to comply with the Regulation, including: Understanding your data, and where and how it is used (e.g. Cloud suppliers, physical records); The documentation you need to maintain (such as statements of the information you collect and process, records of data subject consent, processes for protecting personal data); The “appropriate technical and organizational measures” you need to take to ensure your compliance with the Regulation. The history and principles of the EU-US Privacy Shield, and an overview of what organizations must do to comply. A full index of the Regulation, enabling you to find relevant Articles quickly and easily.

IBM PowerVC Version 1.3.2 Introduction and Configuration

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced, enterprise virtualization management offering for IBM Power Systems™. This IBM Redbooks® publication introduces IBM PowerVC and helps you understand its functions, planning, installation, and setup. IBM PowerVC Version 1.3.2 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC) by IBM PowerVM NovaLink, or by managing PowerKVM directly. With this capability, IBM PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware. IBM PowerVC is available as a Standard Edition, or as a Cloud PowerVC Manager edition. IBM PowerVC includes the following features and benefits: Virtual image capture, deployment, and management Policy-based virtual machine (VM) placement to improve use Management of real-time optimization and VM resilience to increase productivity VM Mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI) Role-based security policies to ensure a secure environment for common tasks The ability to enable an administrator to enable Dynamic Resource Optimization on a schedule IBM Cloud PowerVC Manager includes all of the IBM PowerVC Standard Edition features and adds: A Self-service portal that allows the provisioning of new VMs without direct system administrator intervention. There is an option for policy approvals for the requests that are received from the self-service portal. Pre-built deploy templates that are set up by the cloud administrator that simplify the deployment of VMs by the cloud user. Cloud management policies that simplify management of cloud deployments. Metering data that can be used for chargeback. This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 1.3.2.

Introducing and Implementing IBM FlashSystem V9000

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics. Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate as they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today’s data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.7 and introduces the recently announced V7.8. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the IBM FlashSystem storage into business environments. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

IBM Business Process Manager Operations Guide

This IBM® Redbooks® publication provides operations teams with architectural design patterns and guidelines for the day-to-day challenges that they face when managing their IBM IBM Business Process Manager (BPM) infrastructure. Today, IBM BPM L2 and L3 Support and SWAT teams are constantly advising customers how to deal with the following common challenges: Deployment options (on-premises, patterns, cloud, and so on) Administration DevOps Automation Performance monitoring and tuning Infrastructure management Scalability High Availability and Data Recovery Federation This publication enables customers to become self-sufficient, promote consistency and accelerate IBM BPM Support engagements. This IBM Redbooks publication is targeted toward technical professionals (technical support staff, IT Architects, and IT Specialists) who are responsible for meeting day-to-day challenges that they face when they are managing an IBM BPM infrastructure.