O'Reilly Data Engineering Books

IBM Storage Solutions for SAP Applications Version 1.3

2020-02-11 O'Reilly Amazon

book

IBM

data data-engineering SAP IBM Linux

This paper is intended as an architecture and configuration guide to set up the IBM® System Storage® for the SAP HANA tailored data center integration (SAP HANA TDI) within a storage area network (SAN) environment. SAP HANA TDI allows the SAP customer to attach external storage to the SAP HANA server. The paper also describes the setup and configuration of SAP Landscape Management for SAP HANA systems on IBM infrastructure components: IBM Power Systems™ and IBM Storage based on IBM Spectrum™ Virtualize. This document is written for IT technical specialists and architects with advanced skill levels on SUSE Linux Enterprise Server (SLES) or Red Hat Enterprise Linux (RHEL) and IBM System Storage. This document provides the necessary information to select, verify, and connect IBM System Storage to the SAP HANA server through a Fibre Channel-based SAN. The recommendations in this Blueprint apply to single-node and scale-out configurations, and Intel and IBM Power based SAP HANA systems.

Temenos on IBM LinuxONE Best Practices Guide

2020-02-11 O'Reilly Amazon

book

Jonathan Page , Vic Cross , Ernest Horn , Deana Coble , Robert Schulz , Colin Page , John Smith , Chris Vogan

data data-engineering IBM

The world's most successful banks run on IBM®, and increasingly IBM LinuxONE. Temenos, the global leader in banking software, has worked alongside IBM for many years on banking deployments of all sizes. This book marks an important milestone in that partnership. Temenos on IBM LinuxONE Best Practices Guide shows financial organizations how they can combine the power and flexibility of the Temenos solution with the IBM platform that is purpose built for the digital revolution.

Mastering Large Datasets with Python

2020-01-27 O'Reilly Amazon

book

John Wolohan

data data-engineering AI/ML AWS Cloud Computing Data Science

Modern data science solutions need to be clean, easy to read, and scalable. In Mastering Large Datasets with Python, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You’ll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project. About the Technology: Programming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change. About the Book: Mastering Large Datasets with Python teaches you to write code that can handle datasets of any size. You’ll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You’ll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you’ll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3. What's Inside: An introduction to the map and reduce paradigm. Parallelization with the multiprocessing module and pathos framework. Hadoop and Spark for distributed computing. Running AWS jobs to process large datasets. About the Reader: For Python programmers who need to work faster with more data. About the Author: J. T. Wolohan is a lead data scientist at Booz Allen Hamilton, and a PhD researcher at Indiana University, Bloomington. Quotes: "A clear and efficient path to mastery of the map and reduce paradigm for developers of all levels." - Justin Fister, GrammarBot. "An amazing book for anybody looking to add parallel processing and the map/reduce pattern to their toolkit." - Gary Bake, Radius Payment Solutions. "Learn fundamentals of MapReduce and other core concepts and save money on expensive hardware!" - Al Krinker, USPTO. "A comprehensive guide to the fundamentals of efficient Python data processing." - Craig Pfeifer, MITRE Corporation.

IBM TS4500 R6 Tape Library Guide

2020-01-22 O'Reilly Amazon

book

Jesus Eduardo Cervantes Rolon , Larry Coyne , Robert Beiderbeck , Khanh Ngo , Jeremy Tudgay

data data-engineering IBM Cloud Computing ELK Cyber Security

The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and more integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 11 petabytes (PB) of uncompressed data in a single frame library or scale up to 2 PB per square foot to over 350 PB. The TS4500 offers the following benefits: High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to another 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for IBM TS1160 while also supporting TS1155, TS1150, and TS1140 tape drives: The TS1160 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions.
The TS1160 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The TS1160 Tape Drive Model 60E delivers a dual 10 Gb or 25 Gb Ethernet host attachment interface that is optimized for cloud-based and hyperscale environments. The TS1160 Tape Drive Model 60F delivers a native data rate of 400 MBps, the same load/ready, locate speeds, and access times as the TS1155, and includes dual-port 16 Gb Fibre Channel support. Support of the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology. Support of LTO 8 Type M cartridge (M8): The LTO Program is introducing a new capability with LTO-8 drives: the ability of the LTO-8 drive to write 9 TB on a brand new LTO-7 cartridge instead of 6 TB as specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), command-line interface (CLI), and REST over SCSI (RoS) to obtain status information about library components. You learn how to accomplish the following specific tasks: Improve storage density with increased expansion frame capacity up to 2.4 times and support 33% more tape drives per frame.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 3

2020-01-21 O'Reilly Amazon

book

IBM

data data-engineering IBM Cloud Computing DevOps Cyber Security

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up and/or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to: · Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today! · Build a 24 by 7 by 365 enterprise-class private cloud with Red Hat OpenShift Container Platform utilizing new open source Container Storage Interface (CSI) drivers. · Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security. IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

Refactoring Legacy T-SQL for Improved Performance: Modern Practices for SQL Server Applications

2020-01-10 O'Reilly Amazon

book

Lisa Bohm

data data-engineering relational-databases microsoft-sql-server transact-sql SQL

Breathe new life into older applications by refactoring T-SQL queries and code using modern techniques. This book shows you how to significantly improve the performance of older applications by finding common anti-patterns in T-SQL code, then rewriting those anti-patterns using new functionality that is supported in current versions of SQL Server, including SQL Server 2019. The focus moves through the different types of database objects and the code used to create them, discussing the limitations and anti-patterns commonly found for each object type in your database. Legacy code isn’t just found in queries and external applications. It’s also found in the definitions of underlying database objects such as views and tables. This book helps you quickly find problematic code throughout the database and points out where and how modern solutions can replace older code, thereby making your legacy applications run faster and extending their lifetimes. Author Lisa Bohm explains the logic behind each anti-pattern, helping you understand why each pattern is a problem and showing how it can be avoided. Good coding habits are discussed, including guidance on topics such as readability and maintainability. What You Will Learn: Find specific areas in code to target for performance gains. Identify pain points quickly and understand why they are problematic. Rewrite legacy T-SQL to reduce or eliminate hidden performance issues. Write modern code with an awareness of readability and maintainability. Recognize and correlate T-SQL anti-patterns with techniques for better solutions. Make a positive impact on application user experience in your organization. Who This Book Is For: Database administrators or developers who maintain older code, those frustrated with complaints about slow code when there is so much of it to fix, and those who want a head start in making a positive impact on application user experience in their organization.

The SQL Workshop

2019-12-30 O'Reilly Amazon

book

Frank Solomon , Dixit Patel , Prashanth Jayaram , Shashikant Shakya , Fiodar Sazanavets , Awni Al Saqqa , Aaditya Pokkunuri , Scott Cosentino , Pradeep Kumar Gupta , Shubham Jain , Rakesh Kumar Pandey

data data-engineering SQL RDBMS Cyber Security

The SQL Workshop is your go-to guide for delving into the essential techniques and best practices of working with SQL. You'll start with the basics of querying and database management, progressing to advanced concepts like joins, normalization, and database security. What this Book will help me do Construct and maintain relational databases that meet real-world requirements. Perform CRUD operations efficiently using SQL queries. Design effective and optimized database schemas through normalization. Secure and safeguard data with access controls and privilege management. Leverage SQL for data analysis and reporting through advanced query techniques. Author(s) Frank Solomon, Prashanth Jayaram, and Awni Al Saqqa bring together decades of practical and academic experience in SQL and database management. Their informative and hands-on approach helps readers bridge the gap between theoretical concepts and practical applications. Who is it for? Written for newcomers and intermediate learners, this book is ideal for aspiring software developers, data scientists, and database managers looking to advance their SQL skills. Beginners with no database experience will find this book's gradual learning curve approachable.

Apache Pulsar Versus Apache Kafka

2019-12-25 O'Reilly Amazon

book

Chris Bartholomew

data data-engineering apache-pulsar Cloud Computing Kafka Kubernetes

For nearly a decade, Apache Kafka has been the go-to publish-subscribe (pub-sub) messaging system, and for good reason. It offers functionality for a wide range of enterprise use cases, along with a large ecosystem of tools and a dedicated community. But lately, upstart Apache Pulsar has been gaining ground. This detailed report explains why. Apache Pulsar takes the best parts of Kafka and expands on them to solve problems that were out of scope of Kafka’s original design. Author Chris Bartholomew shows you how Kafka and Pulsar compare and where they differ. Engineers and other technical decision makers will learn the advantages that make Pulsar a compelling alternative to Kafka. Explore the architecture and major components of Kafka and Pulsar. Discover the benefits of Pulsar’s subscription model for messaging. Understand how Pulsar simplifies the messaging system for organizations that need high-performance pub-sub messaging, delivery guarantees, and traditional messaging patterns. Learn how Pulsar’s separation of serving and storing makes it natural to run in cloud native environments like Kubernetes. See how Kafka and Pulsar perform on the OpenMessaging Project benchmark.

The Rise of Operational Analytics

2019-12-25 O'Reilly Amazon

book

Scott Haines

data data-engineering AI/ML Analytics Cloud Computing Kafka

Fast access to data has become a critical game changer. Today, a new breed of company understands that the faster they can build, access, and share well-defined datasets, the more competitive they’ll be in our data-driven world. In this practical report, Scott Haines from Twilio introduces you to operational analytics, a new approach for making sense of all the data flooding into business systems. Data architects and data scientists will see how Apache Kafka and other tools and processes laid the groundwork for fast analytics on a mix of historical and near-real-time data. You’ll learn how operational analytics feeds minute-by-minute customer interactions, and how NewSQL databases have entered the scene to drive machine learning algorithms, AI programs, and ongoing decision-making within an organization. Understand the key advantages that data-driven companies have over traditional businesses. Explore the rise of operational analytics and how this method relates to current tech trends. Examine the impact of can’t-wait business decisions and won’t-wait customer experiences. Discover how NewSQL databases support cloud native architecture and set the stage for operational databases. Learn how to choose the right database to support operational analytics in your organization.

What Is Data Engineering?

2019-12-25 O'Reilly Amazon

book

Lewis Gavin

data data-engineering Data Engineering DWH

The demand for data scientists is well-known, but when it comes time to build solutions based on data, your company also needs data engineers—people with strong data warehousing and programming backgrounds. In fact, whether you’re powering self-driving cars or creating music playlists, this field has emerged as one of the most important in modern business. In this report, Lewis Gavin explores key aspects of data engineering and presents a case study from Spotify that demonstrates the tremendous value of this role.

GDPR For Dummies

2019-12-24 O'Reilly Amazon

book

Suzanne Dibble

data data-engineering data-security-privacy eu-general-data-protection-regulation-gdpr eu general data protection regulation (gdpr) GDPR/CCPA

Don’t be afraid of the GDPR wolf! How can your business easily comply with the new data protection and privacy laws and avoid fines of up to $27M? GDPR For Dummies sets out in simple steps how small business owners can comply with the complex General Data Protection Regulation (GDPR). The regulation applies to all businesses established in the EU and to businesses established outside of the EU insofar as they process personal data about people within the EU. Inside, you’ll discover how GDPR applies to your business in the context of marketing, employment, providing your services, and using service providers. Learn how to avoid fines, regulatory investigations, customer complaints, and brand damage, while gaining a competitive advantage and increasing customer loyalty by putting privacy at the heart of your business. Find out what constitutes personal data and special category data. Gain consent for online and offline marketing. Put your Privacy Policy in place. Report a data breach before being fined. 79% of U.S. businesses haven’t figured out how they’ll report breaches in a timely fashion, provide customers the right to be forgotten, conduct privacy impact assessments, and more. If you are one of those businesses that hasn't put a plan in place, then GDPR For Dummies is for you.

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2019-12-20 O'Reilly Amazon

book

Donna Strok , Dmitry Shirokov , Dmitry Anoshin

data data-engineering Snowflake Analytics AWS Azure

Explore the modern market of data analytics platforms and the benefits of using Snowflake computing, the data warehouse built for the cloud. With the rise of cloud technologies, organizations prefer to deploy their analytics using cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors are offering modern data platforms for building cloud analytics solutions to collect data and consolidate into single storage solutions that provide insights for business users. The core of any analytics framework is the data warehouse, and previously customers did not have many choices of platform to use. Snowflake was built specifically for the cloud and it is a true game changer for the analytics market. This book will help you get started with Snowflake and presents best practices for deploying and using the Snowflake data warehouse. In addition, it covers modern analytics architecture and use cases. It provides use cases of integration with leading analytics software such as Matillion ETL, Tableau, and Databricks. Finally, it covers migration scenarios for on-premises legacy data warehouses. What You Will Learn: Know the key functionalities of Snowflake. Set up security and access with cluster. Bulk load data into Snowflake using the COPY command. Migrate from a legacy data warehouse to Snowflake. Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools. Who This Book Is For: Those working with data warehouse and business intelligence (BI) technologies, and existing and potential Snowflake users.

PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond

2019-12-20 O'Reilly Amazon

book

Kevin Feasel

data data-engineering apache-spark Azure Cosmos ETL/ELT

Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered. PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and more. You will learn how PolyBase can help you reduce storage and other costs by avoiding the need for ETL processes that duplicate data in order to make it accessible from one source. PolyBase makes SQL Server into that one source, and T-SQL is your golden ticket. The book also covers PolyBase scale-out clusters, allowing you to distribute PolyBase queries among several SQL Server instances, thus improving performance. With great flexibility comes great complexity, and this book shows you where to look when queries fail, complete with coverage of internals, troubleshooting techniques, and where to find more information on obscure cross-platform errors. Data virtualization is a key target for Microsoft with SQL Server 2019. This book will help you keep your skills current, remain relevant, and build new business and career opportunities around Microsoft’s product direction.
What You Will Learn: Install and configure PolyBase as a stand-alone service, or unlock its capabilities with a scale-out cluster. Understand how PolyBase interacts with outside data sources while presenting their data as regular SQL Server tables. Write queries combining data from SQL Server, Apache Hadoop, Oracle, Cosmos DB, Apache Spark, and more. Troubleshoot PolyBase queries using SQL Server Dynamic Management Views. Tune PolyBase queries using statistics and execution plans. Solve common business problems, including "cold storage" of infrequently accessed data and simplifying ETL jobs. Who This Book Is For: SQL Server developers working in multi-platform environments who want one easy way of communicating with, and collecting data from, all of these sources.

IBM Power System L922 Technical Overview and Introduction

2019-12-19 O'Reilly Amazon

book

Gareth Coates , Scott Vetter , Young Hoon Cho , Volker Haug , Bartlomiej Grabowski

data data-engineering IBM Analytics Linux Marketing

This IBM® Redpaper™ publication is a comprehensive guide covering the IBM Power System L922 (9008-22L) server, which was designed for data-intensive workloads such as databases and analytics in the Linux operating system. The objective of this paper is to introduce the major innovative Power L922 offering and its relevant functions: The new IBM POWER9™ processor, available at frequencies of 2.7 - 3.8 GHz, 2.9 - 3.8 GHz, and 3.4 - 3.9 GHz. Significantly strengthened cores and larger caches. Two integrated memory controllers that allow double the memory footprint of IBM POWER8® processor-based servers. An integrated I/O subsystem and hot-pluggable Peripheral Component Interconnect Express (PCIe) Gen4 and Gen3 I/O slots. I/O drawer expansion options offer greater flexibility. Support for Coherent Accelerator Processor Interface (CAPI) 2.0. New feature IBM EnergyScale™ technology provides new variable processor frequency modes that provide a significant performance boost beyond the static nominal frequency. This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles: clients, sales and marketing professionals, technical support professionals, IBM Business Partners, and independent software vendors (ISVs). This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power L922 system. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 2

2019-12-17 O'Reilly Amazon

book

IBM

data data-engineering IBM Cloud Computing DevOps Cyber Security

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up and/or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to: · Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today! · Build a 24 by 7 by 365 enterprise-class private cloud with Red Hat OpenShift Container Platform utilizing new open source Container Storage Interface (CSI) drivers. · Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security. IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

IBM Storage Solutions for Splunk Enterprise

2019-12-17 O'Reilly Amazon

book

IBM

data data-engineering IBM ELK Splunk

This document is intended to facilitate the deployment of the Splunk Enterprise Solutions using IBM All Flash Array systems for the Hot and Warm tiers, and IBM Elastic Storage System for the Cold and Frozen tiers. This document provides the reference architecture and configuration guidelines for the IBM Storage systems. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storage Systems are supported, entitled and where the issues are specific to a blueprint implementation.

Hands On Google Cloud SQL and Cloud Spanner: Deployment, Administration and Use Cases with Python

2019-12-16 O'Reilly Amazon

book

Shakuntala Gupta Edward , Navin Sabharwal

data data-engineering relational-databases google-cloud-sql Big Data Cloud Computing

Discover the methodologies and best practices for getting started with Google Cloud Platform relational services: CloudSQL and CloudSpanner. The book begins with the basics of working with the Google Cloud Platform along with an introduction to the database technologies available for developers from Google Cloud. You'll then take an in-depth, hands-on journey into Google CloudSQL and CloudSpanner, including choosing the right platform for your application needs, planning, provisioning, designing and developing your application. Sample applications are given that use Python to connect to CloudSQL and CloudSpanner, along with helpful features provided by the engines. You'll also implement practical best practices in the last chapter. Hands On Google Cloud SQL and Cloud Spanner is a great starting point to apply GCP data offerings in your technology stack, and the code used allows you to try out the examples and extend them in interesting ways. What You'll Learn: Get started with Big Data technologies on the Google Cloud Platform. Review CloudSQL and Cloud Spanner from basics to administration. Apply best practices and use Google’s CloudSQL and CloudSpanner offerings. Work with code in Python notebooks and scripts. Who This Book Is For: Application architects, database architects, software developers, data engineers, cloud architects.

IBM Db2 Mirror for i Getting Started

2019-12-13 O'Reilly Amazon

book

Scott Vetter , Ingo Dimmer , Jean-Luc Bonhommet

data data-engineering relational-databases ibm-db2 IBM

IBM® Db2® Mirror for i provides a new solution for continuous availability for an IBM i environment based on an active-active clustering design that uses a low-latency communication protocol for synchronous database replication. With Db2 Mirror, IBM i customers can benefit from continuous application availability for both planned and unplanned outages. Db2 Mirror can help reduce or eliminate application downtime for regular maintenance operations such as program temporary fix (PTF) installations, operating system (OS) upgrades, or for planned server outages. This IBM Redpaper publication provides a broad overview and understanding of this new solution by covering its architecture, positioning, planning, and implementation aspects. It provides an introduction reference for a seller or technical specialist audience to become familiar with the new Db2 Mirror solution.

MongoDB Recipes: With Data Modeling and Query Building Strategies

2019-12-13 O'Reilly Amazon

book

Dharanitharan Ganesan , Subhashini Chellappan

data data-engineering nosql-databases MongoDB Data Modelling

Get the most out of MongoDB using a problem-solution approach. This book starts with recipes on the MongoDB query language, including how to query various data structures stored within documents. These self-contained code examples allow you to solve your MongoDB problems without fuss. MongoDB Recipes describes how to use advanced querying in MongoDB, such as indexing and the aggregation framework. It demonstrates how to use the Compass function, a GUI client interacting with MongoDB, and how to apply data modeling to your MongoDB application. You’ll see recipes on the latest features of MongoDB 4 allowing you to manage data in an efficient manner using MongoDB. What You Will Learn: Work with the MongoDB document model. Design MongoDB schemas. Use the MongoDB query language. Harness the aggregation framework. Create replica sets and sharding in MongoDB. Who This Book Is For: Developers and professionals who work with MongoDB.

Database Design and Relational Theory: Normal Forms and All That Jazz

2019-12-12 O'Reilly Amazon

book

C.J. Date

data data-engineering relational-databases

Create database designs that scale, meet business requirements, and inherently work toward keeping your data structured and usable in the face of changing business models and software systems. This book is about database design theory. Design theory is the scientific foundation for database design, just as the relational model is the scientific foundation for database technology in general. Databases lie at the heart of so much of what we do in the computing world that negative impacts of poor design can be extraordinarily widespread. This second edition includes greatly expanded coverage of exotic and little understood normal forms such as: essential tuple normal form (ETNF), redundancy free normal form (RFNF), superkey normal form (SKNF), sixth normal form (6NF), and domain key normal form (DKNF). Also included are new appendixes, including one that provides an in-depth look into the crucial notion of data consistency. Sequencing of topics has been improved, and many explanations and examples have been rewritten and clarified based upon the author’s teaching of the content in instructor-led courses. This book aims to be different from other books on design by bridging the gap between the theory of design and the practice of design. The book explains theory in a way that practitioners should be able to understand, and it explains why that theory is of considerable practical importance. Reading this book provides you with an important theoretical grounding on which to do the practical work of database design. Reading the book also helps you in going to and understanding the more academic texts as you build your base of knowledge and expertise. Anyone with a professional interest in database design can benefit from using this book as a stepping-stone toward a more rigorous design approach and more lasting database models.
What You Will Learn Understand what design theory is and is not Be aware of the two different goals of normalization Know which normal forms are truly significant Apply design theory in practice Be familiar with techniques for dealing with redundancy Understand what consistency is and why it is crucially important Who This Book Is For Those having a professional interest in database design, including data and database administrators; educators and students specializing in database matters; information modelers and database designers; DBMS designers, implementers, and other database vendor personnel; and database consultants. The book is product independent.

Information Privacy Engineering and Privacy by Design: Understanding Privacy Threats, Technology, and Regulations Based on Standards and Best Practices

2019-12-12 O'Reilly Amazon

book

William Stallings

data data-engineering data-security-privacy data security & privacy Cloud Computing GDPR/CCPA

The Comprehensive Guide to Engineering and Implementing Privacy Best Practices
As systems grow more complex and cybersecurity attacks more relentless, safeguarding privacy is ever more challenging. Organizations are increasingly responding in two ways, and both are mandated by key standards such as GDPR and ISO/IEC 27701:2019. The first approach, privacy by design, aims to embed privacy throughout the design and architecture of IT systems and business practices. The second, privacy engineering, encompasses the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. In Information Privacy Engineering and Privacy by Design, internationally renowned IT consultant and author William Stallings brings together the comprehensive knowledge privacy executives and engineers need to apply both approaches. Using the techniques he presents, IT leaders and technical professionals can systematically anticipate and respond to a wide spectrum of privacy requirements, threats, and vulnerabilities–addressing regulations, contractual commitments, organizational policies, and the expectations of their key stakeholders.
• Review privacy-related essentials of information security and cryptography
• Understand the concepts of privacy by design and privacy engineering
• Use modern system access controls and security countermeasures to partially satisfy privacy requirements
• Enforce database privacy via anonymization and de-identification
• Prevent data losses and breaches
• Address privacy issues related to cloud computing and IoT
• Establish effective information privacy management, from governance and culture to audits and impact assessment
• Respond to key privacy rules including GDPR, U.S. federal law, and the California Consumer Privacy Act
This guide will be an indispensable resource for anyone with privacy responsibilities in any organization, and for all students studying the privacy aspects of cybersecurity.

IBM Power Systems LC921 and LC922: Technical Overview and Introduction

2019-12-10 O'Reilly Amazon

book

Gustavo Santos , Scott Vetter , Ritesh Nohria , Volker Haug

data data-engineering IBM Linux Marketing SAS

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power Systems™ LC921 and LC922 (9006-12P and 9006-22P) servers that use the current IBM POWER9™ processor-based technology and support Linux operating systems (OSes). The objective of this paper is to introduce the offerings and their capacities and available features. These new Linux scale-out systems provide differentiated performance, scalability, and low acquisition cost, and include the following features:
• Superior throughput and performance for high-value Linux workloads.
• Low acquisition cost through system optimization (industry-standard memory and industry-standard three-year warranty).
• Rich I/O options in the system unit. There are 12 large form factor (LFF)/small form factor (SFF) bays for 12 SAS/SATA hard disk drives (HDDs) or solid-state drives (SSDs), and four bays that are available for Non-Volatile Memory Express (NVMe) Gen3 adapters.
• Includes Trusted Platform Module (TPM) 2.0 Nuvoton NPCT650ABAWX through I2C (for secure boot and trusted boot).
• Integrated MicroSemi PM8069 SAS/SATA 16-port Internal Storage Controller Peripheral Component Interconnect Express (PCIe) 3.0 x8 with RAID 0, 1, 5, and 10 support (no write cache).
• Integrated Intel XL710 Quad Port 10 GBase-T PCIe 3.0 x8 UIO built-in local area network (LAN) (one shared management port).
• Dedicated 1 Gb Intelligent Platform Management Interface (IPMI) port.
This publication is for professionals who want to acquire a better understanding of IBM Power Systems products. The intended audience includes clients, sales and marketing professionals, technical support professionals, IBM Business Partners, and independent software vendors (ISVs).

MongoDB: The Definitive Guide, 3rd Edition

2019-12-10 O'Reilly Amazon

book

Eoin Brazil , Shannon Bradshaw , Kristina Chodorow

data data-engineering nosql-databases MongoDB NoSQL Cyber Security

Manage your data with a system designed to support modern application development. Updated for MongoDB 4.2, the third edition of this authoritative and accessible guide shows you the advantages of using document-oriented databases. You’ll learn how this secure, high-performance system enables flexible data models, high availability, and horizontal scalability. Authors Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow provide guidance for database developers, advanced configuration for system administrators, and use cases for a variety of projects. NoSQL newcomers and experienced MongoDB users will find updates on querying, indexing, aggregation, transactions, replica sets, ops management, sharding and data administration, durability, monitoring, and security. In six parts, this book shows you how to:
• Work with MongoDB, perform write operations, find documents, and create complex queries
• Index collections, aggregate data, and use transactions for your application
• Configure a local replica set and learn how replication interacts with your application
• Set up cluster components and choose a shard key for a variety of applications
• Explore aspects of application administration and configure authentication and authorization
• Use stats when monitoring, back up and restore deployments, and use system settings when deploying MongoDB

IBM Power System E950: Technical Overview and Introduction

2019-12-09 O'Reilly Amazon

book

Scott Vetter , Yongsheng Li , James Cruickshank , Volker Haug , Armin Röll

data data-engineering IBM Linux Marketing SAS

This IBM® Redpaper™ publication gives a broad understanding of the new architecture of the IBM Power System E950 (9040-MR9) server, which supports IBM AIX® and Linux operating systems. The objective of this paper is to introduce the major innovative Power E950 offerings and relevant functions:
• The IBM POWER9™ processor, which is available at frequencies of 2.8 - 3.4 GHz.
• Significantly strengthened cores and larger caches.
• Supports up to 16 TB of memory, which is four times more than the IBM POWER8® processor-based IBM Power System E850 server.
• Integrated I/O subsystem and hot-pluggable Peripheral Component Interconnect Express (PCIe) Gen4 slots, which have double the bandwidth of Gen3 I/O slots.
• Supports EXP12SX and EXP24SX external disk drawers, which have 12 Gb Serial Attached SCSI (SAS) interfaces and support Active Optical Cables (AOCs) for greater distances and less cable bulk.
• New IBM EnergyScale™ technology offers new variable processor frequency modes that provide a significant performance boost beyond the static nominal frequency.
This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes clients, sales and marketing professionals, technical support professionals, IBM Business Partners, and independent software vendors (ISVs). This paper expands the current set of Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power E950 server. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

IBM Power Systems H922 and H924 Technical Overview and Introduction

2019-12-09 O'Reilly Amazon

book

Gareth Coates , Scott Vetter , Young Hoon Cho , Volker Haug , Bartlomiej Grabowski

data data-engineering IBM Linux Marketing SAP

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System H924 (9223-42H) and IBM Power System H922 (9223-22H) servers, which support memory-intensive workloads such as SAP HANA and deliver superior price/performance for mission-critical applications in IBM AIX®, IBM i, and Linux operating systems. The objective of this paper is to introduce the major innovative Power H924 and Power H922 offerings and their relevant functions:
• The new IBM POWER9™ processor, which is available at frequencies of 2.8 - 3.8 GHz, 2.9 - 3.8 GHz, 2.8 - 3.8 GHz, 3.4 - 3.9 GHz, 3.5 - 3.9 GHz, and 3.8 - 4.0 GHz.
• Significantly strengthened cores and larger caches.
• Two integrated memory controllers that allow double the memory footprint of IBM POWER8® servers.
• An integrated I/O subsystem and hot-pluggable Peripheral Component Interconnect Express (PCIe) Gen4 and Gen3 I/O slots.
• I/O drawer expansion options offer greater flexibility.
• Support for Coherent Accelerator Processor Interface (CAPI) 2.0.
• IBM EnergyScale™ technology provides new variable processor frequency modes that provide a significant performance boost beyond the static nominal frequency.
This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes clients, sales and marketing professionals, technical support professionals, IBM Business Partners, and independent software vendors (ISVs). This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power H924 and Power H922 systems. This paper does not replace the latest marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.
