talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3406

Collection of O'Reilly books on Data Engineering.

Filtering by: data

Sessions & talks

Showing 1151–1175 of 3406 · Newest first

Tabular Modeling with SQL Server 2016 Analysis Services Cookbook

With "Tabular Modeling with SQL Server 2016 Analysis Services Cookbook," you'll discover how to harness the full potential of the latest Tabular models in SQL Server Analysis Services (SSAS). This practical guide equips data professionals with the tools, techniques, and knowledge to optimize data analytics and deliver fast, reliable, and impactful business insights. What this Book will help me do Understand the fundamentals of Tabular modeling and its advantages over traditional methods. Use SQL Server 2016 SSAS features to build and deploy Tabular models tailored to business needs. Master DAX for creating powerful calculated fields and optimized measures. Administer and secure your models effectively, ensuring robust BI solutions. Optimize performance and explore advanced features in Tabular solutions for maximum efficiency. Author(s) None Wilson is an experienced SQL BI professional with a strong background in database modeling and analytics. With years of hands-on experience in developing BI solutions, Wilson takes a practical and straightforward teaching approach. Their guidance in this book makes the complex topics of Tabular modeling and SSAS accessible to both seasoned professionals and newcomers to the field. Who is it for? This book is tailored for SQL BI professionals, database architects, and data analysts aiming to leverage Tabular models in SQL Server Analysis Services. It caters to those familiar with database management and basic BI concepts who are eager to improve their analysis solutions. It's a valuable resource if you aim to gain expertise in using tabular modeling for business intelligence.

IBM DS8880 Architecture and Implementation (Release 8.2.1)

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support, including IBM z Systems®, and simplified management tools help provide a cost-effective path to an on-demand world. The IBM DS8880 family now offers business-critical, all-flash, and hybrid data systems that span a wide range of price points: the DS8884 (Business Class), the DS8886 (Enterprise Class), and the DS8888 (Analytics Class). The DS8884 and DS8886 are available as either hybrid models, or can be configured as all-flash. Each model represents the most recent in this series of high-performance, high-capacity, flexible, and resilient storage systems. These systems are intended to address the needs of the most demanding clients. Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/O, maximizing performance and throughput. These capabilities are further enhanced with the availability of the second generation of high-performance flash enclosures (HPFEs Gen-2). Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives and flash cards, through the IBM Easy Tier® feature. The DS8880 also includes the Copy Services Manager code and allows for easier integration in a Lightweight Directory Access Protocol (LDAP) infrastructure.

IBM TS7700 Release 4.0 Guide

This IBM® Redbooks® publication highlights IBM TS7700 Release 4.0. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. The IBM TS7700 offers a modular, scalable, and high-performance architecture for mainframe tape virtualization for the IBM z™ Systems environment. It is a fully integrated, tiered storage hierarchy of disk and tape. This storage hierarchy is managed by robust storage management microcode with extensive self-management capability. It includes the following advanced functions: policy management to control physical volume pooling, cache management, redundant copies (including across a grid network), and copy mode control. The IBM TS7700 offers enhanced statistical reporting. It also includes a standards-based Management Interface (MI) for IBM TS7700 management. IBM TS7700 R4.0 continues the next generation of IBM TS7700 for z Systems® tape: The IBM TS7760 is an all new hardware refresh and features Encryption Capable, high-capacity cache that uses 4 TB serial-attached Small Computer System Interface (SAS) HDDs in arrays that use dynamic disk pool configuration. This setup can scale to large capacities with the highest level of data protection. Release 4.0 introduces the option to attach to a TS4500 tape library, and to the previous TS3500 tape library, which contains back-end physical tape drives and policies to manage up to eight of the disk repositories in a tape-attached TS7760T. This TS7760T (Tape Attached) configuration mimics the behavior of a TS7740, with additional features that go beyond what a TS7740 can provide. The TS7760T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150 and IBM TS1140 tape drives installed in an IBM TS4500 or TS3500 tape library. The TS7760 models are based on high-performance and redundant IBM POWER8® technology. They provide improved performance for most z Systems tape workloads when compared to the previous generations of IBM TS7700.

Exam Ref 70-762 Developing SQL Databases

Prepare for Microsoft Exam 70-762, Developing SQL Databases, and help demonstrate your real-world mastery of skills for building and implementing databases across organizations. Designed for database professionals who build and implement databases across organizations and who ensure high levels of data availability, this Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level.

Focus on the expertise measured by these objectives:
• Design and implement database objects
• Implement programmability objects
• Manage database concurrency
• Optimize database objects and SQL infrastructure

This Microsoft Exam Ref:
• Organizes its coverage by exam objectives
• Features strategic, what-if scenarios to challenge you
• Assumes you have working knowledge of Microsoft Windows, Transact-SQL, and relational databases

About the Exam
Exam 70-762 focuses on skills and knowledge for building and implementing databases across organizations and ensuring high levels of data availability.

About Microsoft Certification
Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of modern database development. Exam 70-761 (Querying Data with Transact-SQL) is also required for MCSA: SQL 2016 Database Development. See full details at: microsoft.com/learning
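
For readers who want to experiment outside SQL Server Management Studio, here is a minimal, hedged Python sketch of the kind of explicit transaction handling the concurrency objective touches on, using the pyodbc driver; the connection string and the dbo.Stock table are hypothetical placeholders, not material from the book.

import pyodbc

# Hypothetical connection string and table; adjust server, database, and schema for your environment
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=SalesDb;Trusted_Connection=yes;"
)
conn.autocommit = False  # take explicit control of the transaction
cursor = conn.cursor()
try:
    # Decrement stock only if enough is available; rowcount tells us whether the UPDATE matched a row
    cursor.execute(
        "UPDATE dbo.Stock SET Quantity = Quantity - ? "
        "WHERE ProductID = ? AND Quantity >= ?",
        (5, 42, 5),
    )
    if cursor.rowcount == 0:
        raise RuntimeError("insufficient stock or unknown product")
    conn.commit()
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()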

IBM Spectrum Archive Enterprise Edition V1.2.2: Installation and Configuration Guide

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive (formerly IBM Linear Tape File System™ (LTFS)) Enterprise Edition (EE) V1.2.2.0 for the IBM TS3310, IBM TS3500, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale™ (formerly IBM General Parallel File System (GPFS™)) based environment and helps encourage the use of tape as a critical tier in the storage environment. This is the third edition of IBM Spectrum Archive V1.2 (SG24-8333-00), although it is based on the prior editions of IBM Linear Tape File System Enterprise Edition V1.1.1.2: Installation and Configuration Guide, SG24-8143. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 7, 6, and 5 tape drives in IBM TS3310, TS3500, and TS4500 tape libraries. Also, IBM TS1140 and IBM TS1150 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Storage System: Host Attachment and Interoperability

This IBM® Redbooks® publication provides information for attaching the IBM FlashSystem® A9000, IBM FlashSystem A9000R, and IBM XIV® Storage System to various host operating system platforms, such as IBM AIX® and Microsoft Windows. The goal is to give an overview of the versatility and compatibility of the IBM Spectrum™ Accelerate family of storage systems with various platforms and environments. The information that is presented here is not meant as a replacement or substitute for the Host Attachment Kit publications. It is meant as a complement and to provide usage guidance and practical illustrations.

EU GDPR & EU-US Privacy Shield: A Pocket Guide
A concise introduction to EU GDPR and EU-US Privacy Shield

The EU General Data Protection Regulation will unify data protection and simplify the use of personal data across the EU when it comes into force in May 2018.

It will also apply to every organization in the world that processes personal information of EU residents.

US organizations that process EU residents' personal data will be able to comply with the GDPR via the EU-US Privacy Shield (the successor to the Safe Harbor framework), which permits international data transfers of EU data to US organizations that self-certify that they have met a number of requirements.

EU GDPR & EU-US Privacy Shield – A Pocket Guide provides an essential introduction to this new data protection law, explaining the Regulation and setting out the compliance obligations for US organizations in handling data of EU citizens, including guidance on the EU-US Privacy Shield.

Product overview

EU GDPR & EU-US Privacy Shield – A Pocket Guide sets out:

• A brief history of data protection and national data protection laws in the EU (such as the UK DPA, German BDSG and French LIL).
• The terms and definitions used in the GDPR, including explanations.
• The key requirements of the GDPR, including: which fines apply to which Articles; the six principles that should be applied to any collection and processing of personal data; the Regulation's applicability; data subjects' rights; data protection impact assessments (DPIAs); the role of the data protection officer (DPO) and whether you need one; data breaches, and the notification of supervisory authorities and data subjects; and obligations for international data transfers.
• How to comply with the Regulation, including: understanding your data, and where and how it is used (e.g. Cloud suppliers, physical records); the documentation you need to maintain (such as statements of the information you collect and process, records of data subject consent, processes for protecting personal data); and the "appropriate technical and organizational measures" you need to take to ensure your compliance with the Regulation.
• The history and principles of the EU-US Privacy Shield, and an overview of what organizations must do to comply.
• A full index of the Regulation, enabling you to find relevant Articles quickly and easily.

IBM z Systems Connectivity Handbook SG24-5444 and z Systems Functional Matrix REDP-5157

This IBM® Redbooks® publication describes the connectivity options that are available for use within and beyond the data center for the IBM z Systems® family of mainframes, which includes these systems: IBM z13™, IBM z13s™, IBM zEnterprise® EC12 (zEC12), IBM zEnterprise BC12 (zBC12), IBM zEnterprise 196 (z196), and IBM zEnterprise 114 (z114). This book highlights the hardware and software components, functions, typical uses, coexistence, and relative merits of these connectivity features. It helps readers understand the connectivity alternatives that are available when planning and designing their data center infrastructures.

IBM PowerVC Version 1.3.2 Introduction and Configuration

IBM® Power Virtualization Center (IBM® PowerVC™) is an advanced, enterprise virtualization management offering for IBM Power Systems™. This IBM Redbooks® publication introduces IBM PowerVC and helps you understand its functions, planning, installation, and setup. IBM PowerVC Version 1.3.2 supports both large and small deployments, either by managing IBM PowerVM® that is controlled by the Hardware Management Console (HMC) or by IBM PowerVM NovaLink, or by managing PowerKVM directly. With this capability, IBM PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on IBM POWER® hardware. IBM PowerVC is available as a Standard Edition, or as a Cloud PowerVC Manager edition.

IBM PowerVC includes the following features and benefits:
• Virtual image capture, deployment, and management
• Policy-based virtual machine (VM) placement to improve use
• Management of real-time optimization and VM resilience to increase productivity
• VM mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI)
• Role-based security policies to ensure a secure environment for common tasks
• The ability for an administrator to enable Dynamic Resource Optimization on a schedule

IBM Cloud PowerVC Manager includes all of the IBM PowerVC Standard Edition features and adds:
• A self-service portal that allows the provisioning of new VMs without direct system administrator intervention. There is an option for policy approvals for the requests that are received from the self-service portal.
• Pre-built deploy templates that are set up by the cloud administrator that simplify the deployment of VMs by the cloud user.
• Cloud management policies that simplify management of cloud deployments.
• Metering data that can be used for chargeback.

This publication is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this publication refers to IBM PowerVC Version 1.3.2.

Modeling Human–System Interaction

This book presents theories and models to examine how humans interact with complex automated systems, including both empirical and theoretical methods. It:
• Provides examples of models appropriate to the four stages of human-system interaction
• Examines in detail the philosophical underpinnings and assumptions of modeling
• Discusses how a model fits into "doing science" and the considerations in garnering evidence and arriving at beliefs for the modeled phenomena
Modeling Human-System Interaction is a reference for professionals in industry, academia and government who are researching, designing and implementing human-technology systems in the transportation, communication, manufacturing, energy, and health care sectors.

Pro Apache Phoenix: An SQL Driver for HBase, First Edition

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and the book explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds. Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.

You will learn how to:
• Handle a petabyte data store by applying familiar SQL techniques
• Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
• Apply best practices while working with a scalable data store on Hadoop and HBase
• Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
• Demonstrate real-time use cases and big data modeling techniques

Who This Book Is For
Data engineers, Big Data administrators, and architects
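
To make the "familiar SQL techniques over HBase" idea concrete, here is a minimal sketch using the phoenixdb Python driver against a Phoenix Query Server; the URL, table, and columns are assumptions for illustration, not examples taken from the book.

import datetime
import phoenixdb

# Assumes a Phoenix Query Server at this URL; table and columns are illustrative
conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        device_id VARCHAR NOT NULL,
        ts TIMESTAMP NOT NULL,
        temperature DOUBLE,
        CONSTRAINT pk PRIMARY KEY (device_id, ts)
    )
""")

# Phoenix uses UPSERT rather than INSERT; rows land in HBase under the hood
cursor.execute(
    "UPSERT INTO sensor_readings (device_id, ts, temperature) VALUES (?, ?, ?)",
    ("device-1", datetime.datetime.now(), 21.5),
)

# Familiar SQL aggregation over the HBase-backed table
cursor.execute(
    "SELECT device_id, AVG(temperature) FROM sensor_readings GROUP BY device_id"
)
print(cursor.fetchall())
conn.close()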

Weathering the Storm

Weathering the Storm explores the factors leading up to the recent global financial and economic crisis, how the crisis unfolded, and the response of European and national authorities. The book describes the rationale behind the measures undertaken to mitigate the consequences of the recession and to ensure that a similar situation does not happen again in the future. In the wake of the crisis, various major changes continue to significantly affect the life and social organization of Europeans. For instance, a new ESM with a size financially comparable to that of the IMF was created; similarly, the reforms in economic governance imply much more intrusive participation of European countries in each other's macroeconomic policies. Moreover, the organization, regulation, and supervision of the financial sector have been drastically revamped. The decisions taken by European and national authorities affect the daily lives of hundreds of millions of European citizens and countless more around the globe. An insightful read for anyone interested in understanding the topic and its effect on their lives, the book primarily addresses undergraduate students in their final year and graduate students in fields such as economics, finance, and political science. The main messages are explained through examples and charts.

Introducing and Implementing IBM FlashSystem V9000

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics. Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate as they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today’s data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.7 and introduces the recently announced V7.8. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the IBM FlashSystem storage into business environments. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

Apache Spark for Data Science Cookbook

In "Apache Spark for Data Science Cookbook," you'll delve into solving real-world analytical challenges using the robust Apache Spark framework. This book features hands-on recipes that cover data analysis, distributed machine learning, and real-time data processing. You'll gain practical skills to process, visualize, and extract insights from large datasets efficiently. What this Book will help me do Master using Apache Spark for processing and analyzing large-scale datasets effectively. Harness Spark's MLLib for implementing machine learning algorithms like classification and clustering. Utilize libraries such as NumPy, SciPy, and Pandas in conjunction with Spark for numerical computations. Apply techniques like Natural Language Processing and text mining using Spark-integrated tools. Perform end-to-end data science workflows, including data exploration, modeling, and visualization. Author(s) Nagamallikarjuna Inelu and None Chitturi bring their extensive experience working with data science and distributed computing frameworks like Apache Spark. Nagamallikarjuna specializes in applying machine learning algorithms to big data problems, while None has contributed to various big data system implementations. Together, they focus on providing practitioners with practical and efficient solutions. Who is it for? This book is primarily intended for novice and intermediate data scientists and analysts who are curious about using Apache Spark to tackle data science problems. Readers are expected to have some familiarity with basic data science tasks. If you want to learn practical applications of Spark in data analysis and enhance your big data analytics skills, this resource is for you.

Fast Data Processing Systems with SMACK Stack

Fast Data Processing Systems with SMACK Stack introduces you to the SMACK stack: a combination of Spark, Mesos, Akka, Cassandra, and Kafka. You will learn to integrate these technologies to build scalable, efficient, and real-time data processing platforms tailored for solving critical business challenges.

What this Book will help me do
• Understand the concepts of fast data pipelines and design scalable architectures using the SMACK stack
• Gain expertise in functional programming with Scala and leverage its power in data processing tasks
• Build and optimize distributed databases using Apache Cassandra for scaling extensively
• Deploy and manage real-time data streams using Apache Kafka to handle massive messaging workloads
• Implement cost-effective cluster infrastructures with Apache Mesos for efficient resource utilization

Author(s)
Estrada is an expert in distributed systems and big data technologies. With years of experience implementing SMACK-based solutions across industries, Estrada offers a practical viewpoint on designing scalable systems. Their blend of theoretical knowledge and applied practice ensures readers receive actionable guidance.

Who is it for?
This book is perfect for software developers, data engineers, or data scientists looking to deepen their understanding of real-time data processing systems. If you have a foundational knowledge of the technologies in the SMACK stack or wish to learn how to combine these cutting-edge tools to solve complex problems, this is for you. Readers with an interest in building efficient big data solutions will find tremendous value here.
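
The SMACK stack itself is JVM-centric (Scala and Akka), but the Kafka piece of the pipeline can be illustrated with a short, hedged Python sketch using the kafka-python client against a local broker; the topic name and payload are assumptions for illustration only.

import json
from kafka import KafkaProducer, KafkaConsumer

# Assumes a broker on localhost:9092; topic and message shape are illustrative
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user": "u-1", "action": "page_view"})
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating after 5 s of silence
)
for record in consumer:
    print(record.partition, record.offset, record.value)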

MOS 2016 Study Guide for Microsoft Access

Advance your everyday proficiency with Access 2016. And earn the credential that proves it! Demonstrate your expertise with Microsoft Access! Designed to help you practice and prepare for Microsoft Office Specialist (MOS): Access 2016 certification, this official Study Guide delivers:
• In-depth preparation for each MOS objective
• Detailed procedures to help build the skills measured by the exam
• Hands-on tasks to practice what you’ve learned
• Practice files and sample solutions

Sharpen the skills measured by these objectives:
• Create and manage databases
• Build tables
• Create queries
• Create forms
• Create reports

IBM Business Process Manager Operations Guide

This IBM® Redbooks® publication provides operations teams with architectural design patterns and guidelines for the day-to-day challenges that they face when managing their IBM Business Process Manager (BPM) infrastructure. Today, IBM BPM L2 and L3 Support and SWAT teams are constantly advising customers on how to deal with the following common challenges:
• Deployment options (on-premises, patterns, cloud, and so on)
• Administration
• DevOps
• Automation
• Performance monitoring and tuning
• Infrastructure management
• Scalability
• High availability and data recovery
• Federation

This publication enables customers to become self-sufficient, promotes consistency, and accelerates IBM BPM Support engagements. This IBM Redbooks publication is targeted toward technical professionals (technical support staff, IT architects, and IT specialists) who are responsible for meeting the day-to-day challenges that they face when they are managing an IBM BPM infrastructure.

Mastering RethinkDB

Mastering RethinkDB offers a comprehensive guide to using the open-source, scalable database RethinkDB for real-time application development. Throughout this book, you'll gain practical knowledge on query management with ReQL, build dynamic web apps, and perform advanced database administration tasks.

What this Book will help me do
• Gain expertise in managing and configuring RethinkDB clusters for optimal performance in real-time applications.
• Develop robust web applications using RethinkDB and integrate them seamlessly with Node.js.
• Leverage advanced querying features of ReQL, including geospatial and time-series queries.
• Enhance RethinkDB's capabilities with integration techniques for third-party libraries like Elasticsearch.
• Master deployment practices using platforms such as Docker and PaaS for production-grade applications.

Author(s)
Shaikh, an expert in database technologies and real-time system design, brings years of hands-on experience working with open-source databases like RethinkDB. Known for writing practical technical books, Shaikh emphasizes real-world applications and clarity to help both novice and seasoned developers excel.

Who is it for?
This book is ideal for developers who are building real-time applications and want to adopt RethinkDB for their solutions. Readers should have a basic understanding of RethinkDB and Node.js to get the most benefit. It's particularly suited for programmers looking to deepen their database administration skills and enhance their real-time data handling expertise.
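
Although the book pairs RethinkDB with Node.js, ReQL also has an official Python driver; the following is a minimal sketch of a query and a changefeed, assuming the rethinkdb package (2.4-style API) and a local server, with illustrative table and field names.

from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015)

# Create a table and insert a document (names are illustrative)
if "scores" not in r.db("test").table_list().run(conn):
    r.db("test").table_create("scores").run(conn)
r.db("test").table("scores").insert({"player": "alice", "points": 10}).run(conn)

# A ReQL query: top players by points
top = r.db("test").table("scores").order_by(r.desc("points")).limit(5).run(conn)
print(list(top))

# Changefeed: block and print every change the server pushes for this table
for change in r.db("test").table("scores").changes().run(conn):
    print(change["new_val"])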

Building Web Apps that Respect a User's Privacy and Security

A recent survey from the Pew Research Center found that few Americans are confident about the security or privacy of their data—particularly when it comes to the use of online tools. As a web developer, you represent the first line of defense in protecting your users' data and privacy. This report explores several techniques, tools, and best practices for developing and maintaining web apps that provide the privacy and security that every user needs—and deserves. Each individual now produces more data every day than people in earlier generations did throughout their lifetimes. Every time we click, tweet, or visit a site, we leave a digital trace. As web developers, we're responsible for shaping the experiences of users' online lives. By making ethical, user-centered choices, we can create a better Web for everyone.
• Learn how web tracking works, and how you can provide users with greater privacy controls
• Explore HTTPS and learn how to use this protocol to encrypt user connections
• Use web development frameworks that provide baked-in security support for protecting user data
• Learn methods for securing user authentication, and for sanitizing and validating user input
• Provide exports that allow users to reclaim their data if and when you close your service

This is the third report in the Ethical Web Development series from author Adam Scott. Previous reports in this series include Building Web Apps for Everyone and Building Web Apps That Work Everywhere.
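
As one small example of the "securing authentication and sanitizing input" point, here is a hedged, framework-agnostic Python sketch using only the standard library; real applications would normally rely on their framework's vetted equivalents rather than hand-rolled helpers like these.

import hashlib
import hmac
import html
import secrets

def hash_password(password: str, iterations: int = 200_000) -> str:
    """Derive a salted PBKDF2 hash; store only this string, never the raw password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)

def sanitize_display_name(raw: str, max_len: int = 50) -> str:
    """Length-limit and HTML-escape user input before echoing it back into a page."""
    return html.escape(raw.strip()[:max_len])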

Data modeling with Cassandra

In this lesson, you'll learn how to design data models for Cassandra, including a data modeling process and notation. To apply this knowledge, we'll design the data model for a sample application. This will help show how all the parts fit together. Along the way, we'll use a tool to help us manage our CQL (Cassandra Query Language) scripts.

What you'll learn—and how you can apply it
You will learn common patterns and antipatterns for data modeling in Cassandra. This lesson will cover the concepts around data modeling and will compare a Cassandra data model with an equivalent relational database model. You'll learn about defining queries and about logical and physical database modeling. You'll learn how to optimize your model for performance, and finally you'll learn how to implement your model schema using CQL.

This lesson is for you because…
You are an application developer or architect who wants to learn how data is stored and processed in Cassandra. You are a database administrator who wants to learn about Cassandra.

Prerequisites
Helpful but not essential to have a basic understanding of relational vs. distributed databases. Helpful but not essential to understand Cassandra Query Language (CQL).

Materials or downloads needed in advance
None
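
To show what "implementing your model schema using CQL" can look like in practice, here is a minimal sketch using the DataStax Python driver against a local node; the hotel keyspace and reservations_by_guest table are illustrative stand-ins, not the lesson's own example.

import uuid
from cassandra.cluster import Cluster

# Assumes a single local Cassandra node; keyspace and table names are illustrative
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS hotel
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
""")

# Query-first design: one table per query, partitioned by guest
session.execute("""
    CREATE TABLE IF NOT EXISTS hotel.reservations_by_guest (
        guest_id uuid,
        start_date date,
        hotel_id text,
        room_number smallint,
        PRIMARY KEY ((guest_id), start_date, hotel_id)
    ) WITH CLUSTERING ORDER BY (start_date DESC, hotel_id ASC)
""")

guest = uuid.uuid4()
session.execute(
    "INSERT INTO hotel.reservations_by_guest (guest_id, start_date, hotel_id, room_number) "
    "VALUES (%s, '2017-01-15', 'AZ123', %s)",
    (guest, 204),
)
for row in session.execute(
    "SELECT start_date, hotel_id, room_number FROM hotel.reservations_by_guest WHERE guest_id = %s",
    (guest,),
):
    print(row.start_date, row.hotel_id, row.room_number)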

Determining the right model for your experience

Inherent in creating a social layer in your experience is some form of relationship between people. There are different models, each of which creates different kinds of social interactions and outcomes within an experience.

What you'll learn—and how you can apply it
This lesson reviews the different types of relationship models and shows you how to assess your specific goals to determine which model might be the right fit for your product or needs, and what supporting tools are appropriate to create a rich relationship framework.

Prerequisites
You want to create or enhance a product with a social layer.

This lesson is taken from Designing Social Interfaces, 2nd Edition, by Erin Malone and Christian Crumlish.

Optimizing Cassandra performance

In this lesson, we look at how to tune Cassandra to improve performance. There are a variety of settings in the configuration file and on individual tables. Although the default settings are appropriate for many use cases, there might be circumstances in which you need to change them. We'll look at how and why to make these changes. We also see how to use the cassandra-stress test tool that ships with Cassandra to generate load against Cassandra and quickly see how it behaves under stress test circumstances. We can then tune Cassandra appropriately and feel confident that we're ready to deploy to a production environment.

What you'll learn—and how you can apply it
You'll learn how to monitor and analyze Cassandra performance. You'll learn about Cassandra features such as caching, memtables, commit logs, SSTables, hinted handoff, compaction, and threading to improve responsiveness, consistency, and speed and reduce data loss. We'll also look at timeout properties and JVM settings.

This lesson is for you because…
You are a developer, database administrator, or architect who wants to learn how to tune Cassandra.

Prerequisites
Understanding of Cassandra architecture and data model. If you want to run cassandra-stress, a Cassandra installation with a running cluster.

Materials or downloads needed
A Cassandra cluster if you want to run cassandra-stress
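
As a hedged illustration of per-table tuning, the sketch below uses the DataStax Python driver to change compaction and caching options on the illustrative table from the earlier data modeling sketch, and notes the kind of cassandra-stress commands you might run before and after a change; the names and values are assumptions, not recommendations from the lesson.

from cassandra.cluster import Cluster

# Assumes the illustrative hotel.reservations_by_guest table; adjust names for your schema
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Read-heavy tables often benefit from LeveledCompactionStrategy
session.execute("""
    ALTER TABLE hotel.reservations_by_guest
    WITH compaction = {'class': 'LeveledCompactionStrategy'}
""")

# Cache partition keys and a bounded number of rows per partition
session.execute("""
    ALTER TABLE hotel.reservations_by_guest
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'}
""")

# To measure the effect, generate load with the bundled CLI tool, for example:
#   cassandra-stress write n=100000 -rate threads=50
#   cassandra-stress read  n=100000 -rate threads=50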

IBM DB2 12 for z/OS Technical Overview

IBM® DB2® 12 for z/OS® delivers key innovations that increase availability, reliability, scalability, and security for your business-critical information. In addition, DB2 12 for z/OS offers performance and functional improvements for both transactional and analytical workloads and makes installation and migration simpler and faster. DB2 12 for z/OS also allows you to develop applications for the cloud and mobile devices by providing self-provisioning, multitenancy, and self-managing capabilities in an agile development environment. DB2 12 for z/OS is also the first version of DB2 built for continuous delivery. This IBM Redbooks® publication introduces the enhancements made available with DB2 12 for z/OS. The contents help database administrators to understand the new functions and performance enhancements, to plan for ways to use the key new capabilities, and to justify the investment in installing or migrating to DB2 12.

Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale

The Complete Guide to Data Science with Hadoop—For Technical Professionals, Businesspeople, and Students. Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials. The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization. Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP). This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize the ROI of data science initiatives.

Learn:
• What data science is, how it has evolved, and how to plan a data science career
• How data volume, variety, and velocity shape data science use cases
• Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
• Data importation with Hive and Spark
• Data quality, preprocessing, preparation, and modeling
• Visualization: surfacing insights from huge data sets
• Machine learning: classification, regression, clustering, and anomaly detection
• Algorithms and Hadoop tools for predictive modeling
• Cluster analysis and similarity functions
• Large-scale anomaly detection
• NLP: applying data science to human language
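
To ground the machine-learning bullet above, here is a hedged PySpark sketch of a small classification pipeline (assuming Spark 2.x or later); the HDFS path and column names are hypothetical and not taken from the book.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("pds-sketch").getOrCreate()

# Hypothetical labeled churn data already ingested into HDFS; path and columns are assumptions
df = spark.read.parquet("hdfs:///data/churn.parquet")

indexer = StringIndexer(inputCol="churned", outputCol="label")
assembler = VectorAssembler(inputCols=["tenure", "monthly_charges"], outputCol="features")
lr = LogisticRegression(maxIter=20)

train, test = df.randomSplit([0.8, 0.2], seed=7)
model = Pipeline(stages=[indexer, assembler, lr]).fit(train)

auc = BinaryClassificationEvaluator().evaluate(model.transform(test))
print(f"Test AUC = {auc:.3f}")

spark.stop()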