O'Reilly Data Engineering Books

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

2021-02-17 O'Reilly Amazon

book

Andy Leonard

data data-engineering etl Azure ADF Azure DevOps

Build custom SQL Server Integration Services (SSIS) tasks using Visual Studio Community Edition and C#. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes, and for no added cost over what you’ve already spent on licensing SQL Server. New in this edition is a demonstration deploying a custom SSIS task to the Azure Data Factory (ADF) Azure-SSIS Integration Runtime (IR). All examples in this new edition are implemented in C#. Custom task developers are shown how to implement custom tasks using the widely accepted and default language for .NET development. Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the available functionality. One such gap is a constraint of the built-in SSIS Execute Package Task, which does not allow SSIS developers to select SSIS packages from other projects in the SSIS Catalog. Examples in this bookshow how to create a custom Execute Catalog Package task that allows SSIS developers to execute tasks from other projects in the SSIS Catalog. Building on the examples and patterns in this book, SSIS developers may create any task to which they aspire, custom tailored to their specific data integration and ETL needs. What You Will Learn Configure and execute Visual Studio in the way that best supports SSIS task development Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies Properly sign assemblies that you create in order to invoke them from your task Implement source code control via Azure DevOps, or your own favorite tool set Troubleshoot and execute custom tasks as part of your own projects Create deployment projects (MSIs) for distributing code-complete tasks Deploy custom tasks to Azure Data Factory Azure-SSIS IRs in the cloud Create advanced editors for custom task parameters Who This Book Is For For database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers do not need a background in software development with C#. Most important is a desire to optimize ETL efforts by creating custom-tailored tasks for execution in SSIS packages, on-premises or in ADF Azure-SSIS IRs.

IBM Spectrum Scale and IBM Elastic Storage System Network Guide

2021-02-17 O'Reilly Amazon

book

Rakesh Chutke , Sandeep Naik , Kevin Gildea , John Lewars , Kedar Karmarkar , Larry Coyne , Sandeep R Patil

data data-engineering IBM ELK

High-speed I/O workloads are moving away from the SAN to Ethernet and IBM® Spectrum Scale is pushing the network limits. The IBM Spectrum® Scale team discovered that many infrastructure Ethernet networks that were used for years to support various applications are not designed to provide a high-performance data path concurrently to many clients from many servers. IBM Spectrum Scale is not the first product to use Ethernet for storage access. Technologies, such as Fibre Channel over Ethernet (FCoE), scale out NAS, and IP connected storage (iSCSI and others) use Ethernet though IBM Spectrum Scale as the leader in parallel I/O performance, which provides the best performance and value when used on a high-performance network. This IBM Redpaper publication is based on lessons that were learned in the field by deploying IBM Spectrum Scale on Ethernet and InfiniBand networks. This IBM Redpaper® publication answers several questions, such as, "How can I prepare my network for high performance storage?", "How do I know when I am ready?", and "How can I tell what is wrong?" when deploying IBM Spectrum Scale and IBM Elastic Storage® Server (ESS). This document can help IT architects get the design correct from the beginning of the process. It also can help the IBM Spectrum Scale administrator work effectively with the networking team to quickly resolve issues.

Data Pipelines Pocket Reference

2021-02-10 O'Reilly Amazon

book

James Densmore

data data-engineering Analytics Cloud Computing Data Analytics Modern Data Stack

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn: What a data pipeline is and how it works How data is moved and processed on modern data infrastructure, including cloud platforms Common tools and products used by data engineers to build pipelines How pipelines support analytics and reporting needs Considerations for pipeline maintenance, testing, and alerting

Mastering Kafka Streams and ksqlDB

2021-02-04 O'Reilly Amazon

book

Mitch Seymour

data data-engineering streaming-messaging Kafka Java Pub/Sub

Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide shows data engineers how to use these tools to build highly scalable stream processing applications for moving, enriching, and transforming large amounts of data in real time. Mitch Seymour, data services engineer at Mailchimp, explains important stream processing concepts against a backdrop of several interesting business problems. You'll learn the strengths of both Kafka Streams and ksqlDB to help you choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing. Learn the basics of Kafka and the pub/sub communication pattern Build stateless and stateful stream processing applications using Kafka Streams and ksqlDB Perform advanced stateful operations, including windowed joins and aggregations Understand how stateful processing works under the hood Learn about ksqlDB's data integration features, powered by Kafka Connect Work with different types of collections in ksqlDB and perform push and pull queries Deploy your Kafka Streams and ksqlDB applications to production

Implementing the IBM SAN Volume Controller with IBM Spectrum Virtualize V8.3.1

2021-02-01 O'Reilly Amazon

book

Markus Döllinger , Pawel Brodacki , Jon Tate , Jon Herd , Hartmut Lonzer , Carsten Larsen , Sergey Kubin , Jack Armstrong , Tiago Bastos

data data-engineering IBM

This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage™ SAN Volume Controller, which is powered by IBM Spectrum® Virtualize V8.3.1. IBM SAN Volume Controller is a virtualization appliance solution that maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the block level in a network, which enables applications and servers to share storage devices on a network.

IBM Integrated Synchronization: Incremental Updates Unleashed

2021-01-27 O'Reilly Amazon

book

Günter Schöllmann , Cüneyt Göksu , Christian Michel

data data-engineering IBM Analytics Cloud Computing

The IBM® Db2® Analytics Accelerator (Accelerator) is a logical extension of Db2 for IBM z/OS® that provides a high-speed query engine that efficiently and cost-effectively runs analytics workloads. The Accelerator is an integrated back-end component of Db2 for z/OS. Together, they provide a hybrid workload-optimized database management system that seamlessly manages queries that are found in transactional workloads to Db2 for z/OS and queries that are found in analytics applications to Accelerator. Each query runs in its optimal environment for maximum speed and cost efficiency. The incremental update function of Db2 Analytics Accelerator for z/OS updates Accelerator-shadow tables continually. Changes to the data in original Db2 for z/OS tables are propagated to the corresponding target tables with a high frequency and a brief delay. Query results from Accelerator are always extracted from recent, close-to-real-time data. An incremental update capability that is called IBM InfoSphere® Change Data Capture (InfoSphere CDC) is provided by IBM InfoSphere Data Replication for z/OS up to Db2 Analytics Accelerator V7.5. Since then, an extra new replication protocol between Db2 for z/OS and Accelerator that is called IBM Integrated Synchronization was introduced. With Db2 Analytics Accelerator V7.5, customers can choose which one to use. IBM Integrated Synchronization is a built-in product feature that you use to set up incremental updates. It does not require InfoSphere CDC, which is bundled with IBM Db2 Analytics Accelerator. In addition, IBM Integrated Synchronization has more advantages: Simplified administration, packaging, upgrades, and support. These items are managed as part of the Db2 for z/OS maintenance stream. Updates are processed quickly. Reduced CPU consumption on the mainframe due to a streamlined, optimized design where most of the processing is done on the Accelerator. This situation provides reduced latency. Uses IBM Z® Integrated Information Processor (zIIP) on Db2 for z/OS, which leads to reduced CPU costs on IBM Z and better overall performance data, such as throughput and synchronized rows per second. On z/OS, the workload to capture the table changes was reduced, and the remainder can be handled by zIIPs. With the introduction of an enterprise-grade Hybrid Transactional Analytics Processing (HTAP) enabler that is also known as the Wait for Data protocol, the integrated low latency protocol is now enabled to support more analytical queries running against the latest committed data. IBM Db2 for z/OS Data Gate simplifies delivering data from IBM Db2 for z/OS to IBM Cloud® Pak® for Data for direct access by new applications. It uses the special-purpose integrated synchronization protocol to maintain data currency with low latency between Db2 for z/OS and dedicated target databases on IBM Cloud Pak for Data.

IBM Power Systems H922 and H924 Technical Overview and Introduction

2021-01-26 O'Reilly Amazon

book

Scott Vetter , Mauro Minomizaki , Tamas David Domjan , Bartlomiej Grabowski

data data-engineering IBM ibm-power-systems Cloud Computing Linux

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System H922 (9223-22S), and IBM Power System H924 (9223-42S) servers that support memory-intensive workloads, such as SAP HANA, and deliver superior price and performance for mission-critical applications in IBM AIX®, IBM i, and Linux® operating systems. The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in these systems' 2020 release, such as the following examples: Availability of new IBM POWER9™ processor configurations for the number of cores per socket. More performance by using industry-leading IBM Peripheral Component Interconnect® Express (PCIe) Gen4 slots. Enhanced internal disk configuration options, with up to 14 NVMe adapters (four U.2 NVMe plus up to 10 PCIe add-in cards). Twice as fast back-end I/O enables seamless maximum speed and throughput between on-premises and multiple public cloud infrastructures with high availability (HA). This publication is for professionals who want to acquire a better understanding of IBM Power Systems products. The intended audience includes the following roles: Clients Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power H922 and Power H924 systems.

MySQL Concurrency: Locking and Transactions for MySQL Developers and DBAs

2021-01-22 O'Reilly Amazon

book

Jesper Wisborg Krogh

data data-engineering relational-databases MySQL SQL

Know how locks work in MySQL and how they relate to transactions. This book explains the major role that locks play in database systems, showing how locks are essential in allowing high-concurrency workloads. You will learn about lock access levels and lock granularities from the user level as well as table locks to record and gap locks. Most importantly, the book covers troubleshooting techniques when locking becomes a pain point. Several of the lock types in MySQL have a duration of a transaction. For this reason, it is important to understand how transactions work. This book covers the basics of transactions as well as transaction isolation levels and how they affect locking. The book is meant to be your go-to resource for solving locking contention and similar problems in high-performance MySQL database applications. Detecting locking issues when they occur is the first key to resolving such issues. MySQL Concurrency provides techniques for detecting locking issues such as contention. The book shows how to analyze locks that are causing contention to see why those locks are in place. A collection of six comprehensive case studies combine locking and transactional theory with realistic lock conflicts. The case studies walk you through the symptoms to look for in order to identify which issue you are facing, the cause of the conflict, its analysis, solution, and how to prevent the issue in the future. What You Will Learn Understand which lock types exist in MySQL and how they are used Choose the best transaction isolation level for a given transaction Detect and analyze lock contention when it occurs Reduce locking issues in your applications Resolve deadlocks between transactions Resolve InnoDB record-level locking issues Resolve issues from metadata and schema locks Who This Book Is For Database administrators and SQL developers who are familiar with MySQL and want to gain a better understanding of locking and transactions as well as how to work with them. While some experience with MySQL is required, no prior knowledge of locks and transactions is needed.

High Performance SQL Server: Consistent Response for Mission-Critical Applications

2021-01-21 O'Reilly Amazon

book

Benjamin Nevarez

data data-engineering relational-databases microsoft-sql-server Linux SQL

Design and configure SQL Server instances and databases in support of high-throughput, mission-critical applications providing consistent response times in the face of variations in numbers of users and query volumes. In this new edition, with over 100 pages of additional content, every original chapter has been updated for SQL Server 2019, and the book also includes two new chapters covering SQL Server on Linux and Intelligent Query Processing. This book shows you how to configure SQL Server and design your databases to support a given instance and workload. You will learn advanced configuration options, in-memory technologies, storage and disk configuration, and more, all aimed toward enabling your desired application performance and throughput. Configuration doesn’t stop with implementation. Workloads change over time, and other impediments can arise to thwart desired performance. High Performance SQL Server covers monitoring and troubleshooting to aid you in detecting and fixing production performance problems and minimizing application outages. You will learn about a variety of tools, ranging from the traditional wait analysis methodology to the query store or indexing, and you will learn how improving performance is an iterative process. This book is an excellent complement to query performance tuning books and provides the other half of what you need to know by focusing on configuring the instances on which mission-critical queries are executed. What You Will Learn Understand SQL Server's database engine and how it processes queries Configure instances in support of high-throughput applications Provide consistent response times to varying user numbers and query volumes Design databases for high-throughput applications with focus on performance Record performance baselines and monitor SQL Server instances against them Troubleshot and fix performance problems Who This Book Is For SQL Server database administrators, developers, and data architects. The book is also of use to system administrators who are managing and are responsible for the physical servers on which SQL Server instances are run.

Data Accelerator for AI and Analytics

2021-01-20 O'Reilly Amazon

book

Christof Westhues , Abhishek Dave , Mike Knieriemen , Nils Haustein , Gero Schmidt , TJ Harris , Simon Lorenz , Venkateswara Puvvada

data data-engineering AI/ML Analytics Data Analytics Data Management

This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It provides details about data orchestration and how to address typical challenges that customers face when dealing with large and ever-growing amounts of data for data analytics. While the amount of data increases steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner. This paper provides a solution that addresses these needs: Data Accelerator for AI and Analytics (DAAA). A proof of concept (PoC) is described in detail. This paper focuses on the functions that are provided by the Data Accelerator for AI and Analytics solution, which simplifies the daily work of data scientists and system administrators. This solution helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and associated data management.

Privileged Access Management for Secure Storage Administration: IBM Spectrum Scale with IBM Security Verify Privilege Vault

2021-01-08 O'Reilly Amazon

book

Sridhar Muppidi , Nishant Singhai , Sumit Kumar , Vincent Hsu , Sandeep R Patil , Kanad Jadhav

data data-engineering IBM ELK Cyber Security

There is a growing insider security risk to organizations. Human error, privilege misuse, and cyberespionage are considered the top insider threats. One of the most dangerous internal security threats is the privileged user with access to critical data, which is the "crown jewels" of the organization. This data is on storage, so storage administration has critical privilege access that can cause major security breaches and jeopardize the safety of sensitive assets. Organizations must maintain tight control over whom they grant privileged identity status to for storage administration. Extra storage administration access must be shared with support and services teams when required. There also is a need to audit critical resource access that is required by compliance to standards and regulations. IBM® Security™ Verify Privilege Vault On-Premises (Verify Privilege Vault), formerly known as IBM Security™ Secret Server, is the next-generation privileged account management that integrates with IBM Storage to ensure that access to IBM Storage administration sessions is secure and monitored in real time with required recording for audit and compliance. Privilege access to storage administration sessions is centrally managed, and each session can be timebound with remote monitoring. You also can use remote termination and an approval workflow for the session. In this IBM Redpaper, we demonstrate the integration of IBM Spectrum® Scale and IBM Elastic Storage® Server (IBM ESS) with Verify Privilege Vault, and show how to use privileged access management (PAM) for secure storage administration. This paper is targeted at storage and security administrators, storage and security architects, and chief information security officers.

Modernizing Applications with IBM CICS

2020-12-30 O'Reilly Amazon

book

Ezriel Gross , Jim Harrison , Will Yates , Debra Scharfstein , Russell Bonner , Sophie Green

data data-engineering IBM API CI/CD Cloud Computing

IBM® CICS® is a mixed language application server that runs on IBM Z®. Over the 50 years since CICS was introduced in 1969, enterprises have used the qualities of service (QoSs) that CICS provides to allow them to create high throughput and secure transactional applications that have powered their business. As the IT landscape has evolved, so has CICS to allow these applications to integrate with new platforms and still provide value to the rest of the business. Because of this capability, many businesses still rely on CICS to power their core applications. This IBM Redpaper publication focuses on modernizing these CICS applications, allowing them to integrate with cloud-native applications. This modernization can be achieved either by constructing application programming interfaces (APIs) that allow new cloud-native applications to connect to your existing assets, rewriting parts of your application in newer languages and hosting them back on CICS, or by using CICS capabilities to extend your applications to provide new capabilities and functions. The paper takes a traditional example application and shows you how it works. Then, the paper extends the example, rewrites portions of its functions, and enables its APIs. It also explains how CICS applications can use continuous integration (CI) and continuous delivery (CD) to deliver, test, and deploy code into CICS easily and with quality.

Coronavirus News, Markets and AI

2020-12-27 O'Reilly Amazon

book

Pankaj Sharma

data data-engineering streaming-messaging real-time-analytics AI/ML

Coronavirus News, Markets and AI explores the analysis of unstructured data from coronavirus related news and the underlying sentiment during its real-time impact on the world and on global financial markets, in particular.

Public Budgeting in Search for an Identity

2020-12-27 O'Reilly Amazon

book

Maria Francesca Sicilia , Ileana Steccolini

data data-engineering search

This book provides a state-of-the-art reflection on current trends in international public budgeting, representing an important pillar in the accumulation of knowledge on public sector budgeting processes, contents, evolutions and critical issues.

Introducing Microsoft Access Using Macro Programming Techniques: An Introduction to Desktop Database Development by Example

2020-12-26 O'Reilly Amazon

book

Flavio Morgado

data data-engineering database-management-tools microsoft-access Microsoft

Learn Microsoft Access by building a powerful database application from start to finish. Microsoft Access ships with every version of Office, from Office 2019 to Office 365 Home and Personal editions. Most people understand the value of having a reliable contact database, but few realize that Access can be an incredibly valuable data tool and an excellent gateway for learning database development. Introducing Microsoft Access Using Macro Programming Techniques approaches database development from a practical and experiential standpoint. You will learn important data concepts as you journey through each step of creating a database using Access. The example you will build takes advantage of a massive amount of data from an external source of nutritional data (USDA). You will leverage this freely available repository of information in multiple ways, putting Access to the test in creating powerful business solutions that you can then apply to your own data sets. Thetables and records in this database will be used to demonstrate key relational principles in Access, including how to use the relationship window to understand the relationships between tables and how to create different objects such as queries, forms, reports, and macros. Using this approach, you will learn how desktop database development can be a powerful solution to meet your business needs. What You Will Learn Discover the relational database and how it is different from other databases Create database tables and establish relationships between them to create a solid relational database system Understand the concept and importance of referential integrity (RI) in data and databases Use different types of Access queries to extract the information you need from the database Show database information in individual, customized windows using Access Forms Present insightful information about the database using Access Reports Automate your database solutions with macros Who This Book Is For Anyone who wants to learn how to build a database using Microsoft Access to create customized solutions. It is also useful for those working in IT managing large contact data sets (healthcare, retail, etc.) who need to learn the basics in order to create a professional database solution. Readers should have access to some version of Microsoft Access in order to perform the exercises in this book.

MongoDB Fundamentals

2020-12-22 O'Reilly Amazon

book

Liviu Nedov , Sam Anderson , Amit Phaltankar , Juned Ahsan , Michael Harrison

data data-engineering nosql-databases MongoDB Cloud Computing NoSQL

This book, "MongoDB Fundamentals", is the ideal hands-on guide to learning MongoDB. By starting from the basics of NoSQL databases and progressing to cloud integration using MongoDB Atlas, you will gain practical experience managing, querying, and visualizing data effectively for real-world applications. What this Book will help me do Set up and manage a MongoDB database with both local and cloud environments. Master querying and modifying data using the aggregation framework for complex operations. Implement effective database architecture with replication and sharding techniques. Ensure data security and resilience through user management and efficient backup/restore methods. Visualize data insights through dynamic reports and charts using MongoDB Charts. Author(s) Amit Phaltankar, Juned Ahsan, Michael Harrison, and Liviu Nedov are seasoned professionals in the field of database management systems, each bringing extensive experience working with MongoDB and cloud technologies. They excel at translating technical concepts into accessible, actionable insights, and have a passion for enabling IT professionals to create high-performance database solutions. Who is it for? "MongoDB Fundamentals" is tailored for developers, database administrators, system administrators, and cloud architects who are new to MongoDB but are looking to integrate it into their data processing workflows. It's perfect for those who aim to enhance their skills in handling data within cloud computing environments and have some basic programming or database experience.

IBM Storage for Red Hat OpenShift Blueprint

2020-12-18 O'Reilly Amazon

book

IBM

data data-engineering IBM Cloud Computing

This IBM® Blueprint is intended to facilitate the deployment of IBM Storage for Red Hat OpenShift Container Platform by using detailed hardware specifications to build a system. It describes the associated parameters for configuring persistent storage within a Red Hat OpenShift Container Platform environment. To complete the tasks, you must understand Red Hat OpenShift, IBM Storage, the IBM block storage Container Storage Interface (CSI) driver, and the IBM Spectrum Scale CSI driver. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize® or IBM FlashSystem® storage devices, Enterprise Storage Server®, and IBM Spectrum® Scale are supported and entitled, and where the issues are not specific to a blueprint implementation. IBM Storage Suite for IBM Cloud® Paks is an offering bundle that includes software-defined storage from IBM and Red Hat. Use this document for more information about how to deploy IBM Storage product licenses that are obtained through Storage Suite for Cloud Paks (IBM Spectrum Virtualize and IBM Spectrum Scale).

Toward Solving Complex Human Problems

2020-12-18 O'Reilly Amazon

book

Brian E. White

data data-engineering integration-solutions

Readers will learn more about CS characteristics and behaviors, and CSE principles, and will therefore be able to focus on techniques that will better serve them in their working/everyday environments in dealing with complexity.

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

2020-12-17 O'Reilly Amazon

book

Ramcharan Kakarla , Sridhar Alla , Sundar Krishnan

data data-engineering apache-spark PySpark AI/ML API

Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade. Applied Data Science Using PySpark is divided unto six sections which walk you through the book. In section 1, you start with the basics of PySpark focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and various methods available to operationalize the model and serve it through Docker/an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines. By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended to those who want to unleash the power of parallel computing by simultaneously working with big datasets. What You Will Learn Build an end-to-end predictive model Implement multiple variable selection techniques Operationalize models Master multiple algorithms and implementations Who This Book is For Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streamingdata.

Database Design for Mere Mortals: 25th Anniversary Edition, 4th Edition

2020-12-17 O'Reilly Amazon

book

Michael J. Hernandez

data data-engineering relational-databases RDBMS

The #1 Easy, Commonsense Guide to Database DesignNow Updated Foreword by Michelle Poolet, Mount Vernon Data Systems LLC Database Design for Mere Mortals has earned worldwide respect as the simplest way to learn relational database design. Now, this hands-on, software independent tutorial is even clearer and easier to use. Step by step, this new 25th Anniversary Edition shows you how to design modern databases that are soundly structured, reliable, and flexible, even in the latest online applications. Michael Hernandez guides you through everything from planning to defining tables, fields, keys, table relationships, business rules, and views. You will learn practical ways to improve data integrity, how to avoid common mistakes, and when to break the rules. Updated review questions and figures help you learn these techniques more easily and effectively. Understand database types, models, and design terminology Perform interviews to efficiently capture requirementseven if everyone works remotely Set clear design objectives and transform them into effective designs Analyze a current database so you can identify ways to improve it Establish table structures and relationships, assign primary keys, set field specifications, and set up views Ensure the correct level of data integrity for each database Identify and establish business rules Preview and prepare for the future of relational databases Whatever relational database systems you use, Hernandez will help you design databases that are robust and trustworthy. Never designed a database before? Settling for inadequate generic designs? Running existing databases that need improvement? Start here.

IBM DS8900F Product Guide Release 9.1

2020-12-17 O'Reilly Amazon

book

Peter Kimmel

data data-engineering IBM

Built on over 50 years of enterprise storage expertise, the IBM® DS8000® series is the flagship of disk storage systems within the IBM System Storage portfolio. As of October 2020, the DS8900F with DS8000 Release 9.1 is the latest addition. The DS8900F is an all-flash system exclusively, and it offers two classes: DS8910F: Flexibility Class The flexibility class delivers significant performance improvements compared to the previous IBM DS8880F generation. DS8950F: Agility Class The agility class is efficiently designed to consolidate all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution. This IBM Redbooks® Product Guide gives an overview of the features and functions that are available with the IBM DS8900F models running microcode Release 9.1 (Bundle 89.10 / Licensed Machine Code 7.9.10).

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance

2020-12-14 O'Reilly Amazon

book

Louis Davidson

data data-engineering relational-databases Cloud Computing Cyber Security SQL

Learn effective and scalable database design techniques in SQL Server 2019 and other recent SQL Server versions. This book is revised to cover additions to SQL Server that include SQL graph enhancements, in-memory online transaction processing, temporal data storage, row-level security, and other design-related features. This book will help you design OLTP databases that are high-quality, protect the integrity of your data, and perform fast on-premises, in the cloud, or in hybrid configurations. Designing an effective and scalable database using SQL Server is a task requiring skills that have been around for well over 30 years, using technology that is constantly changing. This book covers everything from design logic that business users will understand to the physical implementation of design in a SQL Server database. Grounded in best practices and a solid understanding of the underlying theory, author Louis Davidson shows you how to "getit right" in SQL Server database design and lay a solid groundwork for the future use of valuable business data. What You Will Learn Develop conceptual models of client data using interviews and client documentation Implement designs that work on premises, in the cloud, or in a hybrid approach Recognize and apply common database design patterns Normalize data models to enhance integrity and scalability of your databases for the long-term use of valuable data Translate conceptual models into high-performing SQL Server databases Secure and protect data integrity as part of meeting regulatory requirements Create effective indexing to speed query performance Understand the concepts of concurrency Who This Book Is For Programmers and database administrators of all types who want to use SQL Server to store transactional data. The book is especially useful to those wanting to learn the latest database design features in SQL Server 2019 (features that include graph objects, in-memory OLTP, temporal data support, and more). Chapters on fundamental concepts, the language of database modeling, SQL implementation, and the normalization process lay a solid groundwork for readers who are just entering the field of database design. More advanced chapters serve the seasoned veteran by tackling the latest in physical implementation features that SQL Server has to offer. The book has been carefully revised to cover all the design-related features that are new in SQL Server 2019.

Implementation Guide for IBM Elastic Storage System 5000

2020-12-08 O'Reilly Amazon

book

Robert Guthrie , Farida Yaragatti , Van Smith , John Sing , Stephen M Tee , Mary Jane Zajac , Sumit Kumar , Ravindra Sure , Sukumar Vankadhara , Vasfi Gucer , Ricardo D. Zamora Ruvalcaba , Shradha Thakare , Jay Vaddi , Jonathan Terner , Todd M Tosseth , Brian Herr , Wesley Jones , Steve Duersch , Puneet Chaudhary , Luis Bolinches

data data-engineering IBM AI/ML Big Data ELK

This IBM® Redbooks® publication introduces and describes the IBM Elastic Storage® Server 5000 (ESS 5000) as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General Parallel File System (IBM GPFS). ESS is a modern implementation of software-defined storage, making it easier for you to deploy fast, highly scalable storage for AI and big data. With the lightning-fast NVMe storage technology and industry-leading file management capabilities of IBM Spectrum Scale, the ESS 3000 and ESS 5000 nodes can grow to over YB scalability and can be integrated into a federated global storage system. By consolidating storage requirements from the edge to the core data center — including kubernetes and Red Hat OpenShift — IBM ESS can reduce inefficiency, lower acquisition costs, simplify storage management, eliminate data silos, support multiple demanding workloads, and deliver high performance throughout your organization. This book provides a technical overview of the ESS 5000 solution and helps you to plan the installation of the environment. We also explain the use cases where we believe it fits best. Our goal is to position this book as the starting point document for customers that would use the ESS 5000 as part of their IBM Spectrum Scale setups. This book is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with ESS 5000.

IBM System Storage SAN Volume Controller, IBM Storwize V7000, and IBM FlashSystem 7200 Best Practices and Performance Guidelines

2020-12-07 O'Reilly Amazon

book

Sang-hyun Kim , Jon Tate , Dirk Peitzmann , Jon Herd , Sergey Kubin , Antonio Rainero , Tiago Moreira Candelaria Bastos

data data-engineering IBM ibm-system-storage ibm-system-storage-san-volume-controller

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM System Storage® SAN Volume Controller and IBM Storwize® V7000 powered by IBM Spectrum Virtualize™ V8.2.1. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. Then it provides performance guidelines for SAN Volume Controller, back-end storage, and applications. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting SAN Volume Controller and Storwize V7000. This book is intended for experienced storage, SAN, and SAN Volume Controller administrators and technicians. Understanding his book requires advanced knowledge of the SAN Volume Controller and Storwize V7000 and SAN environments. Important: On 11th February 2020 IBM announced the arrival of SAN Volume Controller SA2 and SV2, and IBM FlashSystem® 7200 to the family. This book was written specifically for prior versions of SVC and Storwize V7000; however, most of the general principles will apply. If you are in any doubt as to their applicability then you should work with your local IBM representative. This book will be updated to comprehensively include SAN Volume Controller SA2 and SV2 and FlashSystem 7200 in due course.

Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion, 2nd Edition

2020-12-05 O'Reilly Amazon

book

Ken Ledeen , Harry Lewis , Hal Abelson , Wendy Seltzer

data data-engineering data-security-privacy data security & privacy AI/ML Big Data

What you must know to protect yourself today The digital technology explosion has blown everything to bits--and the blast has provided new challenges and opportunities. This second edition of Blown to Bits delivers the knowledge you need to take greater control of your information environment and thrive in a world thats coming whether you like it or not. Straight from internationally respected Harvard/MIT experts, this plain-English bestseller has been fully revised for the latest controversies over social media, fake news, big data, cyberthreats, privacy, artificial intelligence and machine learning, self-driving cars, the Internet of Things, and much more. Discover who owns all that data about youand what they can infer from it Learn to challenge algorithmic decisions See how close you can get to sending truly secure messages Decide whether you really want always-on cameras and microphones Explore the realities of Internet free speech Protect yourself against out-of-control technologies (and the powerful organizations that wield them) You will find clear explanations, practical examples, and real insight into what digital tech means to you--as an individual, and as a citizen.

talk-data.com

O'Reilly Data Engineering Books

Top Topics

Top Speakers

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

IBM Spectrum Scale and IBM Elastic Storage System Network Guide

Data Pipelines Pocket Reference

Mastering Kafka Streams and ksqlDB

Implementing the IBM SAN Volume Controller with IBM Spectrum Virtualize V8.3.1

IBM Integrated Synchronization: Incremental Updates Unleashed

IBM Power Systems H922 and H924 Technical Overview and Introduction

MySQL Concurrency: Locking and Transactions for MySQL Developers and DBAs

High Performance SQL Server: Consistent Response for Mission-Critical Applications

Data Accelerator for AI and Analytics

Privileged Access Management for Secure Storage Administration: IBM Spectrum Scale with IBM Security Verify Privilege Vault

Modernizing Applications with IBM CICS

Coronavirus News, Markets and AI

Public Budgeting in Search for an Identity

Introducing Microsoft Access Using Macro Programming Techniques: An Introduction to Desktop Database Development by Example

MongoDB Fundamentals

IBM Storage for Red Hat OpenShift Blueprint

Toward Solving Complex Human Problems

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

Database Design for Mere Mortals: 25th Anniversary Edition, 4th Edition

IBM DS8900F Product Guide Release 9.1

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance

Implementation Guide for IBM Elastic Storage System 5000

IBM System Storage SAN Volume Controller, IBM Storwize V7000, and IBM FlashSystem 7200 Best Practices and Performance Guidelines

Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion, 2nd Edition