O'Reilly Data Engineering Books

Apache Spark 2.x Cookbook

2017-05-31 O'Reilly Amazon

book

Rishi Yadav

data data-engineering apache-spark AI/ML Analytics Big Data

Discover how to harness the power of Apache Spark 2.x for your Big Data processing projects. In this book, you will explore over 70 cloud-ready recipes that will guide you to perform distributed data analytics, structured streaming, machine learning, and much more.

What this Book will help me do

Effectively install and configure Apache Spark with various cluster managers and platforms. Set up and utilize development environments tailored for Spark applications. Operate on schema-aware data using RDDs, DataFrames, and Datasets. Perform real-time streaming analytics with sources such as Apache Kafka. Leverage MLlib for supervised learning, unsupervised learning, and recommendation systems.

Author(s)

Rishi Yadav is a seasoned data engineer with a deep understanding of Big Data tools and technologies, particularly Apache Spark. With years of experience in the field of distributed computing and data analysis, Yadav brings practical insights and techniques to enrich the learning experience of readers.

Who is it for?

This book is ideal for data engineers, data scientists, and Big Data professionals who are keen to enhance their Apache Spark 2.x skills. If you're working with distributed processing and want to solve complex data challenges, this book addresses practical problems. Note that a basic understanding of Scala is recommended to get the most out of this resource.

Mastering Ceph

2017-05-30 O'Reilly Amazon

book

Nick Fisk

data data-engineering ceph Ansible Cloud Computing

Mastering Ceph offers a comprehensive guide to the Ceph distributed storage system, empowering you to implement and manage scalable storage solutions effectively. As you delve into the chapters, you'll gain the practical experience needed to handle Ceph with confidence, achieve resource optimization, and ensure high availability for critical applications.

What this Book will help me do

Understand and utilize Ceph's advanced capabilities such as erasure coding and tiering for storage efficiency. Implement and manage scalable and resilient Ceph clusters effectively, easing resource allocation. Use tools like Ansible and Vagrant to deploy Ceph clusters quickly and reproducibly. Enhance your troubleshooting skills to resolve complex storage issues and ensure cluster stability. Develop applications to integrate with Ceph using Librados and distributed computation classes.

Author(s)

This book was authored by Nick Fisk, an experienced professional in cloud and distributed storage systems. Known for their expertise in Ceph, Nick Fisk shares practical insights developed over years of working as an administrator and developer. Through their accessible and systematic writing, they guide readers to overcome real-world storage challenges.

Who is it for?

This detailed guide is ideal for developers and system administrators familiar with deploying Ceph, who want to deepen their understanding of its advanced features. If you're aiming to optimize performance and design robust storage solutions, this is the book for you. Prior experience with Ceph is recommended to fully benefit from the book's insights.

Oracle on IBM z Systems

2017-05-22 O'Reilly Amazon

book

Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

data data-engineering oracle-database-solutions Cloud Computing IBM Linux

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® z Systems®. The enterprise-grade Linux on IBM z Systems solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from IBM z Systems®. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Oracle on LinuxONE

2017-05-11 O'Reilly Amazon

book

Helene Grosch , David J Simpson , Armelle Chevé , Moshe Reder , Narjisse Zaki , Lydia Parziale , Sam Amsavelu

data data-engineering oracle-database-solutions Cloud Computing IBM Linux

Abstract Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® LinuxONE. The enterprise-grade Linux on LinuxONE solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from LinuxONE. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1. We also describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we end with a chapter about surrounding Oracle with Open Source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 O'Reilly Amazon

book

Jeffrey Aven

data data-engineering Hadoop API Big Data Cloud Computing

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges.

Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS); Importing data into Hadoop, and processing it there; Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts; Making the most of Apache Pig and Apache Hive; Implementing and administering YARN; Taking advantage of the full Hadoop ecosystem; Managing Hadoop clusters with Apache Ambari; Working with the Hadoop User Environment (HUE); Scaling, securing, and troubleshooting Hadoop environments; Integrating Hadoop into the enterprise; Deploying Hadoop in the cloud; Getting started with Apache Spark.

Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

2017-04-06 O'Reilly Amazon

book

Itzik Ben-Gan

data data-engineering relational-databases microsoft-sql-server transact-sql Azure

Prepare for Microsoft Exam 70-761–and help demonstrate your real-world mastery of SQL Server 2016 Transact-SQL data management, queries, and database programming. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level. Focus on the expertise measured by these objectives: Filter, sort, join, aggregate, and modify data Use subqueries, table expressions, grouping sets, and pivoting Query temporal and non-relational data, and output XML or JSON Create views, user-defined functions, and stored procedures Implement error handling, transactions, data types, and nulls This Microsoft Exam Ref: Organizes its coverage by exam objectives Features strategic, what-if scenarios to challenge you Assumes you have experience working with SQL Server as a database administrator, system engineer, or developer Includes downloadable sample database and code for SQL Server 2016 SP1 (or later) and Azure SQL Database Querying Data with Transact-SQL About the Exam Exam 70-761 focuses on the skills and knowledge necessary to manage and query data and to program databases with Transact-SQL in SQL Server 2016. About Microsoft Certification Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of essential skills for building and implementing on-premises and cloud-based databases across organizations. Exam 70-762 (Developing SQL Databases) is also required for MCSA: SQL 2016 Database Development certification. See full details at: microsoft.com/learning

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

2017-03-22 O'Reilly Amazon

book

Richard Niemiec

data data-engineering oracle-database-solutions Cloud Computing Oracle Cyber Security

Proven Database Optimization Solutions: Fully Updated for Oracle Database 12c Release 2

Systematically identify and eliminate database performance problems with help from Oracle Certified Master Richard Niemiec. Filled with real-world case studies and best practices, Oracle Database 12c Release 2 Performance Tuning Tips and Techniques details the latest monitoring, troubleshooting, and optimization methods. Find out how to identify and fix bottlenecks on premises and in the cloud, configure storage devices, execute effective queries, and develop bug-free SQL and PL/SQL code. Testing, reporting, and security enhancements are also covered in this Oracle Press guide.

• Properly index and partition Oracle Database 12c Release 2
• Work effectively with Oracle Cloud, Oracle Exadata, and Oracle Enterprise Manager
• Efficiently manage disk drives, ASM, RAID arrays, and memory
• Tune queries with Oracle SQL hints and the Trace utility
• Troubleshoot databases using V$ views and X$ tables
• Create your first cloud database service and prepare for hybrid cloud
• Generate reports using Oracle’s Statspack and Automatic Workload Repository tools
• Use sar, vmstat, and iostat to monitor operating system statistics

Learning PySpark

2017-02-27 O'Reilly Amazon

book

Denny Lee , Tomasz Drabas

data data-engineering apache-spark PySpark AI/ML Big Data

"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications.

What this Book will help me do

Master the Spark 2.0 architecture and its Python integration with PySpark. Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis. Develop scalable machine learning models using PySpark's ML and MLlib libraries. Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models. Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions.

Author(s)

Authors Tomasz Drabas and Denny Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python.

Who is it for?

This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.

Big Data Now: 2016 Edition

2017-02-15 O'Reilly Amazon

book

O'Reilly Media, Inc.

data data-engineering AI/ML Big Data Cloud Computing

Now in its sixth edition, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve examined throughout 2016. This collection of blog posts, authored by leading thinkers and experts in the field, reflects a unique set of themes we’ve identified as gaining significant attention and traction. Our list of topics for 2016 includes: Careers in data Tools and architecture for big data Intelligent real-time applications Cloud infrastructure Machine learning: models and training Deep learning and artificial intelligence

Cloud Data Sharing with IBM Spectrum Scale

2017-02-14 O'Reilly Amazon

book

Rob Basham , Amey Gokhale , Alexander Safonov , Ranjith Rajagopalan Nair , Ryan Marchese , Nikhil Khandelwal , Larry Coyne , Rishika Kedia , Arend Dittmer , Stan Li

data data-engineering IBM Cloud Computing

This IBM® Redpaper™ publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the Cloud data sharing feature of IBM Spectrum Scale™. IBM Spectrum Scale, formerly IBM General Parallel File System (IBM GPFS™), is a scalable data and file management solution that provides a global namespace for large data sets along with several enterprise features. Cloud data sharing allows for the sharing and use of data between various cloud object storage types and IBM Spectrum Scale. Cloud data sharing can help with the movement of data in both directions, between file systems and cloud object storage, so that data is where it needs to be, when it needs to be there. This paper is intended for IT architects, IT administrators, storage administrators, and those who want to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and Cloud data sharing.

HBase High Performance Cookbook

2017-01-31 O'Reilly Amazon

book

Ruchir Choudhry

data data-engineering nosql-databases Apache HBase Big Data Cloud Computing

"HBase High Performance Cookbook" is your guide to mastering the optimization, scaling, and tuning of HBase systems. Covering everything from configuring HBase clusters to designing scalable table structures and performance tuning, this comprehensive book provides practical advice and strategies for leveraging HBase's full potential. By following this book's recipes, you'll supercharge your HBase expertise.

What this Book will help me do

Understand how to configure HBase for optimal performance, improving your data system's efficiency. Learn to design table structures to maximize scalability and functionality in HBase. Gain skills in performing CRUD operations and using advanced features like MapReduce within HBase. Discover practices for integrating HBase with other technologies such as ElasticSearch. Master the steps involved in setting up and optimizing HBase in cloud environments for enhanced performance.

Author(s)

Ruchir Choudhry is a seasoned data management professional with extensive experience in distributed database systems. He possesses deep expertise in HBase, Hadoop, and other big data technologies. His practical and engaging writing style aims to demystify complex technical topics, making them accessible to developers and architects alike.

Who is it for?

This book is tailored for developers and system architects looking to deepen their understanding of HBase. Whether you are experienced with other NoSQL databases or are new to HBase, this book provides extensive practical knowledge. Ideal for professionals working in big data applications or those eager to optimize and scale their database systems effectively.

EU GDPR & EU-US Privacy Shield: A Pocket Guide

2017-01-10 O'Reilly Amazon

book

Alan Calder

data data-engineering data-security-privacy eu-general-data-protection-regulation-gdpr eu general data protection regulation (gdpr) Cloud Computing

A concise introduction to EU GDPR and EU-US Privacy Shield

The EU General Data Protection Regulation will unify data protection and simplify the use of personal data across the EU when it comes into force in May 2018.

It will also apply to every organization in the world that processes personal information of EU residents.

US organizations that process EU residents' personal data will be able to comply with the GDPR via the EU-US Privacy Shield (the successor to the Safe Harbor framework), which permits international data transfers of EU data to US organizations that self-certify that they have met a number of requirements.

EU GDPR & EU-US Privacy Shield – A Pocket Guide provides an essential introduction to this new data protection law, explaining the Regulation and setting out the compliance obligations for US organizations in handling data of EU citizens, including guidance on the EU-US Privacy Shield.

Product overview

EU GDPR & EU-US Privacy Shield – A Pocket Guide sets out:

A brief history of data protection and national data protection laws in the EU (such as the UK DPA, German BDSG and French LIL). The terms and definitions used in the GDPR, including explanations. The key requirements of the GDPR, including: Which fines apply to which Articles; The six principles that should be applied to any collection and processing of personal data; The Regulation’s applicability; Data subjects’ rights; Data protection impact assessments (DPIAs); The role of the data protection officer (DPO) and whether you need one; Data breaches, and the notification of supervisory authorities and data subjects; Obligations for international data transfers. How to comply with the Regulation, including: Understanding your data, and where and how it is used (e.g. Cloud suppliers, physical records); The documentation you need to maintain (such as statements of the information you collect and process, records of data subject consent, processes for protecting personal data); The “appropriate technical and organizational measures” you need to take to ensure your compliance with the Regulation. The history and principles of the EU-US Privacy Shield, and an overview of what organizations must do to comply. A full index of the Regulation, enabling you to find relevant Articles quickly and easily.

IBM PowerVC Version 1.3.2 Introduction and Configuration

2017-01-04 O'Reilly Amazon

book

Martin Parrella , Javier Bazan Lazcano

data data-engineering IBM Cloud Computing Linux Cyber Security

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced, enterprise virtualization management offering for IBM Power Systems™. This IBM Redbooks® publication introduces IBM PowerVC and helps you understand its functions, planning, installation, and setup. IBM PowerVC Version 1.3.2 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC) by IBM PowerVM NovaLink, or by managing PowerKVM directly. With this capability, IBM PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware. IBM PowerVC is available as a Standard Edition, or as a Cloud PowerVC Manager edition. IBM PowerVC includes the following features and benefits: Virtual image capture, deployment, and management Policy-based virtual machine (VM) placement to improve use Management of real-time optimization and VM resilience to increase productivity VM Mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI) Role-based security policies to ensure a secure environment for common tasks The ability to enable an administrator to enable Dynamic Resource Optimization on a schedule IBM Cloud PowerVC Manager includes all of the IBM PowerVC Standard Edition features and adds: A Self-service portal that allows the provisioning of new VMs without direct system administrator intervention. There is an option for policy approvals for the requests that are received from the self-service portal. Pre-built deploy templates that are set up by the cloud administrator that simplify the deployment of VMs by the cloud user. Cloud management policies that simplify management of cloud deployments. Metering data that can be used for chargeback. 
This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 1.3.2.

Introducing and Implementing IBM FlashSystem V9000

2016-12-28 O'Reilly Amazon

book

Christophe Fagiano , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Renato Santos , Jeffrey Irving , James Thompson , Jana Jamsek

data data-engineering IBM Analytics Cloud Computing Data Management

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics. Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate as they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today’s data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.7 and introduces the recently announced V7.8. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. 
It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the IBM FlashSystem storage into business environments. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

IBM Business Process Manager Operations Guide

2016-12-16 O'Reilly Amazon

book

Bryan Brown , Weiming Gu , Chris Richardson , Karri S Carlson-Neumann , Dave Spriet , Shuo Zhang , Mark Filley

data data-engineering IBM Cloud Computing DevOps

This IBM® Redbooks® publication provides operations teams with architectural design patterns and guidelines for the day-to-day challenges that they face when managing their IBM Business Process Manager (BPM) infrastructure. Today, IBM BPM L2 and L3 Support and SWAT teams are constantly advising customers how to deal with the following common challenges: Deployment options (on-premises, patterns, cloud, and so on) Administration DevOps Automation Performance monitoring and tuning Infrastructure management Scalability High Availability and Data Recovery Federation This publication enables customers to become self-sufficient, promote consistency and accelerate IBM BPM Support engagements. This IBM Redbooks publication is targeted toward technical professionals (technical support staff, IT Architects, and IT Specialists) who are responsible for meeting day-to-day challenges that they face when they are managing an IBM BPM infrastructure.

IBM DB2 12 for z/OS Technical Overview

2016-12-13 O'Reilly Amazon

book

Acacio Ricardo Gomes Pessoa , Tammie Dang , Meg Bernal

data data-engineering relational-databases ibm-db2 Agile/Scrum CI/CD

IBM® DB2® 12 for z/OS® delivers key innovations that increase availability, reliability, scalability, and security for your business-critical information. In addition, DB2 12 for z/OS offers performance and functional improvements for both transactional and analytical workloads and makes installation and migration simpler and faster. DB2 12 for z/OS also allows you to develop applications for the cloud and mobile devices by providing self-provisioning, multitenancy, and self-managing capabilities in an agile development environment. DB2 12 for z/OS is also the first version of DB2 built for continuous delivery. This IBM Redbooks® publication introduces the enhancements made available with DB2 12 for z/OS. The contents help database administrators to understand the new functions and performance enhancements, to plan for ways to use the key new capabilities, and to justify the investment in installing or migrating to DB2 12.

Implementing IBM FlashSystem 900

2016-11-18 O'Reilly Amazon

book

Karen Orlando , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Ingo Dimmer , Matt Levan

data data-engineering IBM Analytics Cloud Computing

Today’s global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900, powered by IBM FlashCore™ technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read, and also examples of the benefits gained by integrating the FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology. This book describes the following offerings of the IBM Spectrum™ Storage family: IBM Spectrum Storage™ IBM Spectrum Control™ IBM Spectrum Virtualize™ IBM Spectrum Scale™ IBM Spectrum Accelerate™

EU General Data Protection Regulation (GDPR): An Implementation and Compliance Guide

2016-11-03 O'Reilly Amazon

book

IT Governance Privacy Team

data data-engineering data-security-privacy eu-general-data-protection-regulation-gdpr eu general data protection regulation (gdpr) Cloud Computing

An in-depth guide to the changes your organization needs to make to comply with the EU GDPR.

The EU General Data Protection Regulation (GDPR) will supersede the 1995 EU Data Protection Directive (DPD) and all EU member states’ national laws based on it – including the UK Data Protection Act 1998 – in May 2018.

All organizations – wherever they are in the world – that process the personally identifiable information (PII) of EU residents must comply with the Regulation. Failure to do so could result in fines of up to €20 million or 4% of annual global turnover.

US organizations that process EU residents’ personal data can comply with the GDPR via the EU-US Privacy Shield, which replaced the EU-US Safe Harbor framework in 2016. The Privacy Shield is based on the DPD, and will likely be updated once the GDPR is applied in May 2018.

This book provides a detailed commentary on the GDPR, explains the changes you need to make to your data protection and information security regimes, and tells you exactly what you need to do to avoid severe financial penalties.

Product overview

EU GDPR – An Implementation and Compliance Guide is a clear and comprehensive guide to this new data protection law, explaining the Regulation, and setting out the obligations of data processors and controllers in terms you can understand.

Topics covered include:

The role of the data protection officer (DPO) – including whether you need one and what they should do. Risk management and data protection impact assessments (DPIAs), including how, when and why to conduct a DPIA. Data subjects’ rights, including consent and the withdrawal of consent; subject access requests and how to handle them; and data controllers’ and processors’ obligations. International data transfers to “third countries” – including guidance on adequacy decisions and appropriate safeguards; the EU-US Privacy Shield; international organizations; limited transfers; and Cloud providers. How to adjust your data protection processes to transition to GDPR compliance, and the best way of demonstrating that compliance. A full index of the Regulation to help you find the articles and stipulations relevant to your organization.

The GDPR will have a significant impact on organizational data protection regimes around the world. EU GDPR – An implementation and Compliance Guide shows you exactly what you need to do to comply with the new law.

About the authors

IT Governance is a leading global provider of IT governance, risk management, and compliance expertise, and we pride ourselves on our ability to deliver a broad range of integrated, high-quality solutions that meet the real-world needs of our international client base.

Our privacy team – led by Alan Calder, Richard Campo, and Adrian Ross – has substantial experience in privacy, data protection, compliance, and information security. This experience, and our understanding of the background and drivers for the GDPR, are combined in this manual to provide the world’s first guide to implementing the new data protection regulation.

Learning IBM Bluemix

2016-10-25 O'Reilly Amazon

book

Sreelatha Sankaranarayanan

data data-engineering IBM Cloud Computing Java JavaScript

Learning IBM Bluemix provides a comprehensive introduction to developing and deploying applications with the IBM Bluemix cloud platform. By following detailed examples and guided exercises, you'll understand the full life cycle of cloud-based application development, from initial setup to scaling and security.

What this Book will help me do

Understand the capabilities of IBM Bluemix as a Platform as a Service to build applications efficiently. Learn to develop and deploy applications using Cloud Foundry command line and Bluemix console. Explore microservices architecture and build scalable applications using Bluemix tools. Integrate on-premises systems with cloud-hosted applications on Bluemix. Develop mobile client applications with the support of Bluemix's Mobile services.

Author(s)

Sreelatha Sankaranarayanan is an experienced developer and cloud technology author, with extensive expertise in IBM Bluemix. Her passion for simplifying complex concepts is reflected in her engaging writing style, ensuring learners can master new skills effectively. She brings years of real-world experience in cloud computing and software development to her instructional materials.

Who is it for?

This book is tailored for developers aiming to transition to cloud-based application development using IBM Bluemix, with a focus on practical application. Readers should have foundational skills in Java and Node.js to fully benefit. Ideal for professionals looking to expand their capabilities with cloud infrastructure, or for those wanting to leverage microservices and cloud solutions in their applications.

Fast Data Processing with Spark 2 - Third Edition

2016-10-24 O'Reilly Amazon

book

Krishna Sankar , Holden Karau

data data-engineering apache-spark AI/ML Analytics API

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks. What this Book will help me do Install and configure Apache Spark for optimal performance. Interact with distributed datasets using the resilient distributed dataset (RDD) API. Leverage the flexibility of DataFrame API for efficient big data analytics. Apply machine learning models using Spark MLlib to solve complex problems. Perform graph analysis using GraphX to uncover structural insights in data. Author(s) Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Holden Karau, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development. Who is it for? This book is aimed at software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner to intermediate understanding of big data processing concepts is recommended. If you aspire to solve big data problems using scalable distributed computing frameworks, this book is a good fit. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Oracle Application Express Administration: For DBAs and Developers

2016-10-22 O'Reilly Amazon

book

Luc Demanche , Francis Mignault

data data-engineering oracle-database-solutions Cloud Computing Oracle Cyber Security

Succeed in managing Oracle Application Express (APEX) environments. This book focuses on creating the right combination of scalability, high-availability, backup and recovery, integrity, and resource control. The book covers everything from simple to enterprise-class deployments, with emphasis on enterprise-level requirements and coverage of cloud and hybrid-cloud scenarios. Many books cover how to develop applications in Oracle APEX. It's a tool with a fast-growing user base as developers come to know how quick and easy it is to create new applications that run in a browser. However, just getting an application off the ground is only a small part of a bigger picture. Applications must be supported. They must be available when users need them. They must be robust against disaster and secure against malicious attack. These are the issues addressed in Oracle Application Express Administration. These are the issues that, when tackled successfully, lead to long-term success in using Oracle APEX as a rapid application-development toolset. Readers of this book learn how to install the Oracle APEX engine in support of small-scale projects such as at the departmental level, and in support of enterprise-level projects accessed by thousands of users across dozens of time zones. Readers learn to take advantage of Oracle Database's underlying feature set with regard to application scalability and performance, integrity, security, high-availability, and robustness against failure and data loss. The book also describes different cloud solutions, integration with Oracle E-Business Suite, and helps in taking advantage of multitenancy in Oracle Database 12c and beyond.
Oracle Application Express Administration: Covers important enterprise considerations such as scalability, robustness, and high-availability. Describes cloud-based application deployment scenarios. Focuses on creating the right deployment environment for long-term success. What You Will Learn: Install, upgrade, and configure robust APEX environments. Back up and recover APEX applications and their data. Monitor and tune the APEX engine and its applications. Benefit from new administration features in APEX 5.0. Run under multi-tenant architecture in Oracle Database 12c. Manage the use of scarce resources with Resource Manager. Secure your data with advanced security features. Build high-availability into your APEX deployments. Integrate APEX with Oracle E-Business Suite. Who This Book Is For: Architects, administrators, and developers who want to better understand how APEX works in a corporate environment. Readers will use this book to design deployment architectures around Oracle Database strengths like multi-tenancy, resource management, and high availability. The book is also useful to administrators responsible for installation and upgrade, backup and recovery, and the ongoing monitoring of the APEX engine and the applications built upon it.

Securing Your Cloud: IBM z/VM Security for IBM z Systems and LinuxONE

2016-10-19 O'Reilly Amazon

book

Klaus Egeler , Vic Cross , Klaus Mueller , Willian Rampazzo , Lydia Parziale , Edi Lopes Alves

data data-engineering IBM Cloud Computing Linux Cyber Security

As workloads are being offloaded to IBM® z Systems™ based cloud environments, it is important to ensure that these workloads and environments are secure. This IBM Redbooks® publication describes the necessary steps to secure your environment for all of the components that are involved in a z Systems cloud infrastructure that uses IBM z/VM® and Linux on z Systems. The audience for this book is IT architects and those planning to use z Systems for their cloud environments.

VersaStack Solution by Cisco and IBM with Oracle RAC, IBM FlashSystem V9000, and IBM Spectrum Protect

2016-10-17 O'Reilly Amazon

book

Jon Tate , Dong Hai Yu , Randy Watson , Dharmesh Kamdar

data data-engineering IBM Agile/Scrum Analytics Cloud Computing

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure that is easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM FlashSystem® V9000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Oracle Real Application Clusters (RAC) and IBM Spectrum™ Protect.

Essentials of Cloud Application Development on IBM Bluemix

2016-10-10 O'Reilly Amazon

book

Hala A. Aziz , Ahmed Azraq , Sally Fikry , Ben Smith , Mohamed El-Khouly

data data-engineering IBM API Cloud Computing Computer Science

This IBM® Redbooks® publication is based on the Presentations Guide of the course "Essentials of Cloud Application Development on IBM Bluemix" that was developed by the IBM Redbooks team in partnership with IBM Middle East and Africa (MEA) University Program. This course is designed to teach university students the basic skills that are required to develop, deploy, and test cloud-based applications that use the IBM Bluemix® cloud services. The primary target audience for this course is university students in undergraduate computer science and computer engineering programs with no previous experience working in cloud environments. However, anyone new to cloud computing can benefit from this course. After completing this course, you should be able to accomplish these tasks: Describe the factors that lead to the adoption of cloud computing. Describe infrastructure as a service, platform as a service, and software as a service. Define cloud computing. Describe IBM Bluemix. Describe the architecture of IBM Bluemix. Identify the runtimes and services that Bluemix offers. Explain how to get started with Bluemix. Describe Bluemix organizations, domains, spaces, and users. Create Bluemix applications. Use services in a Bluemix application. Set environmental variables that are used with Bluemix services. Deploy and run Bluemix applications. Describe how to create an IBM SDK for Node.js application that runs on Bluemix. Explain how to manage a Bluemix account with the Cloud Foundry CLI. Describe how to integrate workstation development platforms with Bluemix. Manage application code and assets with IBM Bluemix DevOps services. Work with the Git repository that is used by DevOps services. Describe the characteristics of REST APIs. Describe the use of JSON as the preferred data format for REST APIs. Identify the data services that are available on Bluemix. Describe the features in Bluemix for developing mobile applications. Create a MobileFirst Services Starter application on Bluemix. Send push notifications from Bluemix and receive them on the mobile device emulator.
The workshop materials were created in August 2016. Thus, all IBM Bluemix features discussed in this Presentations Guide and Bluemix user interfaces used in the examples are current as of August 2016. Note: This IBM Redbooks publication references exercises that are NOT included with this book. The exercises are only available to students attending the course.

IBM PowerVC Version 1.3.1 Introduction and Configuration Including IBM Cloud PowerVC Manager

2016-09-21 O'Reilly Amazon

book

Mika Heino , Guillermo Corti , Paul Sonnenberg

data data-engineering IBM Cloud Computing Linux Cyber Security

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced, enterprise virtualization management offering for IBM Power Systems™. This IBM Redbooks® publication introduces PowerVC and helps you understand its functions, planning, installation, and setup. PowerVC Version 1.3.1 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC) or by IBM PowerVM Novalink, or by managing PowerKVM directly. With this capability, PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware, including IBM PurePower systems. PowerVC is available as a Standard Edition, or as a Cloud PowerVC Manager edition. PowerVC Standard Edition includes the following features and benefits: virtual image capture, deployment, and management; policy-based virtual machine (VM) placement to improve use; management of real-time optimization and VM resilience to increase productivity; VM Mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI); and role-based security policies to ensure a secure environment for common tasks. IBM Cloud PowerVC Manager includes all of the PowerVC Standard Edition features and adds a self-service portal that enables user access to the cloud infrastructure on a per-project basis, and the ability for an administrator to enable Dynamic Resource Optimization on a schedule. This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 1.3.1.

talk-data.com
