O'Reilly Data Engineering Books

Apache Superset Quick Start Guide

2018-12-19 O'Reilly Amazon

book

Shashank Shekhar

data data-engineering relational-databases MySQL BI Dashboard

Apache Superset Quick Start Guide teaches you how to leverage Apache Superset to create interactive and insightful data visualizations. With this book, you'll understand how to integrate Superset with popular databases and build user-friendly dashboards tailored for business intelligence needs. What this Book will help me do Set up and configure Apache Superset for data visualization tasks. Integrate data from SQL databases into Superset for dashboards. Design dashboards tailored to represent business metrics and insights. Use Superset's visualization techniques to explore and present various datasets. Understand and apply user role management and security features in Superset. Author(s) Shashank Shekhar is an experienced data visualization and business intelligence specialist with years of experience in working with Apache Superset. They have written several guides on utilizing open-source tools for enterprise needs. Their technical expertise and approachable writing style make this guide practical and engaging. Who is it for? This book is geared towards data analysts, business intelligence professionals, and developers. Beginners to Superset can quickly grasp the fundamentals, while those with prior experience in data visualization will appreciate the advanced techniques. It's perfect for anyone looking to enhance their data storytelling and dashboard design skills.

Vertically Integrated Architectures: Versioned Data Models, Implicit Services, and Persistence-Aware Programming

2018-12-18 O'Reilly Amazon

book

Jos Jong

data data-engineering data-models Computer Science

Understand how and why the separation between layers and tiers in service-oriented architectures holds software developers back from being truly productive, and how you can remedy that problem. Strong processes and development tools can help developers write more complex software, but large amounts of code can still be directly deduced from the underlying database model, hampering developer productivity. In a world with a shortage of developers, this is bad news. More code also increases maintenance costs and the risk of bugs, meaning less time is spent improving the quality of systems. You will learn that by making relationships first-class citizens within an item/relationship model, you can develop an extremely compact query language, inspired by natural language. You will also learn how this model can serve as both a database schema and an object model upon which to build business logic. Implicit services free you from writing code for standard read/write operations, while still supporting fine-grained authorization. Vertically Integrated Architectures explains how functional schema mappings can solve database migrations and service versioning at the same time, and how all this can support any client, from free-format to fully vertically integrated types. Unleash the potential and use VIA to drastically increase developer productivity and quality. 
What You'll Learn
See how the separation between application server and database in a SOA-based architecture might be justifiable from a historical perspective, but can also hold us back
Examine how the vertical integration of application logic and database functionality can drastically increase developer productivity and quality
Review why application developers only need to write pure business logic if an architecture takes care of basic read/write client-server communication and data persistence
Understand why a set-oriented and persistence-aware programming language would not only make it easier to build applications, but would also enable the fully optimized execution of incoming service requests
Who This Book Is For
Software architects, senior software developers, computer science professionals and students, and the open source community.

IBM Tape Library Guide for Open Systems

2018-12-14 O'Reilly Amazon

book

Michael Engelbrecht, Larry Coyne, Simon Browne, Illarion Borisevich

data data-engineering IBM

Abstract This IBM® Redbooks® publication presents a general introduction to the latest IBM tape and tape library technologies. Featured tape technologies include the IBM LTO Ultrium and Enterprise 3592 tape drives, and their implementation in IBM tape libraries. This 16th edition introduces the new TS1160 tape drive with up to 20 TB capacity on JE media and the latest updates to the IBM TS4500 and TS4300 tape libraries. It includes generalized sections about Small Computer System Interface (SCSI) and Fibre Channel connections, and multipath architecture configurations. This book also covers tools and techniques for library management. It is intended for anyone who wants to understand more about IBM tape products and their implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists. If you do not have a background in computer tape storage products, you might need to read other sources of information. In the interest of being concise, topics that are generally understood are not covered in detail.

Machine Learning with PySpark: With Natural Language Processing and Recommender Systems

2018-12-14 O'Reilly Amazon

book

Pramod Singh

data data-engineering apache-spark PySpark AI/ML Data Science

Build machine learning models, natural language processing applications, and recommender systems with PySpark to solve various business challenges. This book starts with the fundamentals of Spark and its evolution and then covers the entire spectrum of traditional machine learning algorithms along with natural language processing and recommender systems using PySpark. Machine Learning with PySpark shows you how to build supervised machine learning models such as linear regression, logistic regression, decision trees, and random forest. You’ll also see unsupervised machine learning models such as K-means and hierarchical clustering. A major portion of the book focuses on feature engineering to create useful features with PySpark to train the machine learning models. The natural language processing section covers text processing, text mining, and embedding for classification. After reading this book, you will understand how to use PySpark’s machine learning library to build and train various machine learning models. Additionally, you’ll become comfortable with related PySpark components, such as data ingestion, data processing, and data analysis, that you can use to develop data-driven intelligent applications.
What You Will Learn
Build a spectrum of supervised and unsupervised machine learning algorithms
Implement machine learning algorithms with Spark MLlib libraries
Develop a recommender system with Spark MLlib libraries
Handle issues related to feature engineering, class balance, bias and variance, and cross validation for building an optimal fit model
Who This Book Is For
Data science and machine learning professionals.

Practical Apache Spark: Using the Scala API

2018-12-12 O'Reilly Amazon

book

Dharanitharan Ganesan, Subhashini Chellappan

data data-engineering apache-spark AI/ML API Hive

Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark with the help of practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka with examples. You’ll follow a learn-to-do-by-yourself approach to learning: learn the concepts, practice the code snippets in Scala, and complete the assignments given to get an overall exposure. On completion, you’ll have knowledge of the functional programming aspects of Scala, and hands-on expertise in various Spark components. You’ll also become familiar with machine learning algorithms with real-time usage.
What You Will Learn
Discover the functional programming features of Scala
Understand the complete architecture of Spark and its components
Integrate Apache Spark with Hive and Kafka
Use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries
Work with different machine learning concepts and libraries using Spark's MLlib packages
Who This Book Is For
Developers and professionals who deal with batch and stream data processing.

IBM Power Systems RAID Solutions Introduction and Technical Overview

2018-12-11 O'Reilly Amazon

book

Scott Vetter, Harihara Balakrishnan, Swarna Narendra Babu

data data-engineering IBM SAS

This IBM® Redpaper™ publication gives an overview and technical introduction to IBM Power Systems™ RAID solutions. The book is organized to start with an introduction to Redundant Array of Independent Disks (RAID), and various RAID levels with their benefits. A brief comparison of Direct Attached Storage (DAS) and networked storage systems such as SAN / NAS is provided with a focus on emerging applications that typically use the DAS model over networked storage models. The book focuses on IBM Power Systems I/O architecture and various SAS RAID adapters that are supported in IBM POWER8™ processor-based systems. A detailed description of the SAS adapters, along with their feature comparison tables, is included in Chapter 3, "RAID adapters for IBM Power Systems" on page 45. The book is aimed at readers who have the responsibility of configuring IBM Power Systems for individual solution requirements. This audience includes IT Architects, IBM Technical Sales Teams, IBM Business Partner Solution Architects and Technical Sales teams, and systems administrators who need to understand the SAS RAID hardware and RAID software solutions supported in POWER8 processor-based systems.

Dynamic Oracle Performance Analytics: Using Normalized Metrics to Improve Database Speed

2018-12-06 O'Reilly Amazon

book

Roger Cornejo

data data-engineering oracle-database-solutions Analytics Big Data Oracle

Use an innovative approach that relies on big data and advanced analytical techniques to analyze and improve Oracle Database performance. The approach used in this book represents a step-change paradigm shift away from traditional methods. Instead of relying on a few hand-picked, favorite metrics, or wading through multiple specialized tables of information such as those found in an automatic workload repository (AWR) report, you will draw on all available data, applying big data methods and analytical techniques to help the performance tuner draw impactful, focused performance improvement conclusions. This book briefly reviews past and present practices, along with available tools, to help you recognize areas where improvements can be made. The book then guides you through a step-by-step method that can be used to take advantage of all available metrics to identify problem areas and work toward improving them. The method presented simplifies the tuning process and solves the problem of metric overload. You will learn how to: collect and normalize data, generate deltas that are useful in performing statistical analysis, create and use a taxonomy to enhance your understanding of problem performance areas in your database and its applications, and create a root cause analysis report that enables understanding of a specific performance problem and its likely solutions. 
What You'll Learn
Collect and prepare metrics for analysis from a wide array of sources
Apply statistical techniques to select relevant metrics
Create a taxonomy to provide additional insight into problem areas
Provide a metrics-based root cause analysis regarding the performance issue
Generate an actionable tuning plan prioritized according to problem areas
Monitor performance using database-specific normal ranges
Who This Book Is For
Professional tuners: responsible for maintaining the efficient operation of large-scale databases who wish to focus on analysis, who want to expand their repertoire to include a big data methodology and use metrics without being overwhelmed, who desire to provide accurate root cause analysis and avoid the cyclical fix-test cycles that are inevitable when speculation is used

IBM TS4500 R5 Tape Library Guide

2018-12-06 O'Reilly Amazon

book

Michael Engelbrecht, Larry Coyne, Simon Browne, Illarion Borisevich, Robert Beiderbeck

data data-engineering IBM Cloud Computing ELK Cyber Security

Abstract The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today’s and tomorrow’s data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 11 petabytes (PB) of uncompressed data in a single frame library or scale up to 2 PB per square foot to over 350 PB. The TS4500 offers the following benefits: High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to another 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for IBM TS1160 while also supporting TS1155, TS1150, and TS1140 tape drive: The TS1160 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. 
The TS1160 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The new TS1160 Tape Drive Model 60E delivers a dual 10 Gb or 25 Gb Ethernet host attachment interface that is optimized for cloud-based and hyperscale environments. The TS1160 Tape Drive Model 60F delivers a native data rate of 400 MBps, the same load/ready, locate speeds, and access times as the TS1155, and includes dual-port 16 Gb Fibre Channel support. Support of the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology. Support of LTO 8 Type M cartridge (M8): The LTO Program is introducing a new capability with LTO-8 drives: the ability of the LTO-8 drive to write 9 TB on a brand new LTO-7 cartridge instead of the 6 TB specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), and command-line interface (CLI). You learn how to accomplish the following specific tasks: Improve storage density with increased expansion frame capacity up to 2.4 times and support 33% more tape drives per frame. Manage storage by using the ALMS feature. Improve business continuity and disaster recovery with dual active accessor, automatic control path failover, and data path failover. Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media. 
Support IBM LTO Ultrium 8, 7, 6, and 5, IBM TS1160, TS1155, TS1150, and TS1140 tape drives. Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow. Reduce the storage footprint and simplify cabling with 10 U of rack space on top of the library. This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Introducing the IBM DS8882F Rack Mounted Storage System

2018-12-06 O'Reilly Amazon

book

Sherry Brunson, Stephen Manthorpe, Bert Dufrasne

data data-engineering IBM

This IBM® Redpaper™ presents and positions the DS8882F. The DS8882F adds a modular rack-mountable enterprise storage system to the DS8880 family of all-flash enterprise storage systems. The modular system can be integrated into 16U contiguous space of an existing IBM z14™ Model ZR1 (z14 Model ZR1), IBM LinuxONE™ Rockhopper II (z14 Model LR1), or other standard 19-inch wide rack. The DS8882F allows you to take advantage of the performance boost of DS8880 all-flash enterprise systems and advanced features while limiting datacenter footprint and power infrastructure requirements.

Migrating to MariaDB: Toward an Open Source Database Solution

2018-12-06 O'Reilly Amazon

book

William Wood

data data-engineering relational-databases MySQL MariaDB Cyber Security

Mitigate the risks involved in migrating away from a proprietary database platform toward MariaDB’s open source database engine. This book will help you assess the risks and the work involved, and ensure a successful migration. Migrating to MariaDB describes the process and lessons learned during a migration from a proprietary database management engine to the MariaDB open source solution. The book discusses the drivers for making the decision and change, walking you through all aspects of the process from evaluating the licensing, navigating the pitfalls and hurdles of a migration, through to final implementation on the new platform. The book highlights the cost-effectiveness of MariaDB and how the licensing worries are simplified in comparison to running on a proprietary platform. You’ll learn to do your own risk assessment, to identify database and application code that may need to be modified or re-implemented, and to identify MariaDB features to provide the security and failover protection needed by corporate customers. Let the author’s experience in migrating a financial firm to MariaDB inform your own efforts, helping you to develop a road map for both technical and political success within your own organization as you migrate away from proprietary lock-in toward MariaDB’s open source solution.
What You'll Learn
Evaluate and compare licensing costs between proprietary databases and MariaDB
Perform a proper risk assessment to inform your planning and execution of the migration
Build a migration road map from the book’s example that is specific to your situation
Make needed application changes and migrate data to the MariaDB open source database engine
Who This Book Is For
Technical professionals (including database administrators, programmers, and technical management) who are interested in migrating away from a proprietary database platform toward MariaDB’s open source database engine and need to assess the risks and the work involved

IBM DS8880 High-Performance Flash Enclosure Gen2

2018-12-04 O'Reilly Amazon

book

Tamas Toser, Axel Westphal, Stephen Manthorpe

data data-engineering IBM

This IBM® Redpaper™ publication describes the IBM DS8880 High-Performance Flash Enclosure (HPFE) Gen2 architecture and configuration, as of DS8880 Release 8.51. The DS8880 HPFE Gen2 is a 2U Redundant Array of Independent Disks (RAID) flash enclosure with associated Flash RAID adapters that can be used exclusively with DS8880 models. The flash enclosure and Flash RAID adapters are installed in pairs. Each storage enclosure pair can support 16, 32, or 48 encryption-capable flash drives (2.5-inch, 63.5 mm form factor).

IBM Storage Networking SAN768C-6 Product Guide

2018-12-04 O'Reilly Amazon

book

Jon Tate

data data-engineering IBM Fabric Cyber Security

This IBM® Redbooks® Product Guide describes the IBM Storage Networking SAN768C-6. IBM Storage Networking SAN768C-6 has the industry's highest port density for a storage area network (SAN) director and features 768 line-rate 32 gigabits per second (Gbps) or 16 Gbps Fibre Channel ports. Designed to support multiprotocol workloads, IBM Storage Networking SAN768C-6 enables SAN consolidation and collapsed-core solutions for large enterprises, which reduces the number of managed switches and leads to easy-to-manage deployments. IBM Storage Networking SAN768C-6 supports the 48-Port 32 Gbps Fibre Channel Switching Module, the 48-Port 16 Gbps Fibre Channel Switching Module, the 48-port 10 Gbps FCoE Switching Module, the 24-port 40 Gbps FCoE switching module, and the 24/10-port SAN Extension Module. By reducing the number of front-panel ports that are used on inter-switch links (ISLs), it also offers room for future growth. IBM Storage Networking SAN768C-6 addresses the mounting storage requirements of today's large virtualized data centers. As a director-class SAN switch, IBM Storage Networking SAN768C-6 uses the same operating system and management interface as other IBM data center switches. It brings intelligent capabilities to a high-performance, protocol-independent switch fabric, and delivers uncompromising availability, security, scalability, simplified management, and the flexibility to integrate new technologies. You can use IBM Storage Networking SAN768C-6 to transparently deploy unified fabrics with Fibre Channel and Fibre Channel over Ethernet (FCoE) connectivity to achieve low total cost of ownership (TCO). 
For mission-critical enterprise storage networks that require secure, robust, cost-effective business-continuance services, the FCIP extension module is designed to deliver outstanding SAN extension performance, reducing latency for disk and tape operations with FCIP acceleration features, including FCIP write acceleration and FCIP tape write and read acceleration.

Hands-On Big Data Modeling

2018-11-30 O'Reilly Amazon

book

Tao Wei, Suresh Kumar Mukhiya, James Lee

data data-engineering data-models BI Big Data Data Management

This book, Hands-On Big Data Modeling, provides you with practical guidance on data modeling techniques, focusing particularly on the challenges of big data. You will learn the concepts behind various data models, explore tools and platforms for efficient data management, and gain hands-on experience with structured and unstructured data. What this Book will help me do Master the fundamental concepts of big data and its challenges. Explore advanced data modeling techniques using SQL, Python, and R. Design effective models for structured, semi-structured, and unstructured data types. Apply data modeling to real-world datasets like social media and sensor data. Optimize data models for performance and scalability in various big data platforms. Author(s) The authors of this book are experienced data architects and engineers with a strong background in developing scalable data solutions. They bring their collective expertise to simplify complex concepts in big data modeling, ensuring readers can effectively apply these techniques in their projects. Who is it for? This book is intended for data architects, business intelligence professionals, and any programmer interested in understanding and applying big data modeling concepts. If you are already familiar with basic data management principles and want to enhance your skills, this book is perfect for you. You will learn to tackle real-world datasets and create scalable models. Additionally, it is suitable for professionals transitioning to working with big data frameworks.

Hands-On Geospatial Analysis with R and QGIS

2018-11-30 O'Reilly Amazon

book

Shammunul Islam

data data-engineering location-data geographic-information-system-gis geographic information system (gis) AI/ML

Dive into the intricate world of geospatial data with "Hands-On Geospatial Analysis with R and QGIS". This book guides readers through managing, analyzing, and visualizing spatial data using the popular tools R and QGIS. Packed with practical examples, it empowers you to effectively handle GIS and remote sensing data in your projects. What this Book will help me do Understand how to install and set up R and QGIS environments for geospatial tasks. Learn the fundamentals of spatial data processing, including management, visualization, and analysis. Create compelling geospatial visualizations using R packages like ggplot2 and tools in QGIS. Master raster data handling and leverage the QGIS graphical modeler for automating geoprocessing tasks. Apply machine learning techniques to geospatial problems such as landslide susceptibility mapping using real-world data. Author(s) Hamson and Shammunul Islam are experts in the field of geospatial analysis and provide practical, actionable insights throughout this book. With extensive experience in GIS and remote sensing technologies, they focus on guiding readers from basic principles to advanced applications. Their collaborative teaching style ensures clarity and accessibility for learners at different skill levels. Who is it for? This book is ideal for geographers, environmental scientists, and other professionals working with spatial data. Beginner to intermediate-level readers will find it approachable, with step-by-step instructions to build their expertise. While prior familiarity with R or QGIS can be helpful, it is not required. The book is tailored for those eager to expand their skills in geospatial data analysis and visualization.

Hands-On Data Science with SQL Server 2017

2018-11-29 O'Reilly Amazon

book

Vladimír Mužný, Marek Chmel

data data-engineering SQL Analytics Azure BI

In "Hands-On Data Science with SQL Server 2017," you will discover how to implement end-to-end data analysis workflows, leveraging SQL Server's robust capabilities. This book guides you through collecting, cleaning, and transforming data, querying for insights, creating compelling visualizations, and even constructing predictive models for sophisticated analytics. What this Book will help me do Grasp the essential data science processes and how SQL Server supports them. Conduct data analysis and create interactive visualizations using Power BI. Build, train, and assess predictive models using SQL Server tools. Integrate SQL Server with R, Python, and Azure for enhanced functionality. Apply best practices for managing and transforming big data with SQL Server. Author(s) Marek Chmel and Vladimír Mužný bring their extensive experience in data science and database management to this book. Marek is a seasoned database specialist with a strong background in SQL, while Vladimír is known for his instructional expertise in analytics and data manipulation. Together, they focus on providing actionable insights and practical examples tailored for data professionals. Who is it for? This book is an ideal resource for aspiring and seasoned data scientists, data analysts, and database professionals aiming to deepen their expertise in SQL Server for data science workflows. Beginners with fundamental SQL knowledge will find it a guided entry into data science applications. It is especially suited for those who aim to implement data-driven solutions in their roles while leveraging SQL's capabilities.

PostgreSQL 11 Server Side Programming Quick Start Guide

2018-11-29 O'Reilly Amazon

book

Luca Ferrari

data data-engineering relational-databases postgresql Data Management Java

PostgreSQL 11 Server Side Programming Quick Start Guide introduces you to the world of database programming directly at the database level. This book delves into the concepts of server-side programming, providing you with the necessary tools to author stored procedures, triggers, and extensions for your PostgreSQL instance. What this Book will help me do Learn how to create stored procedures and functions for efficient database logic. Understand how to use triggers and rules to maintain data integrity. Gain expertise in developing extensions to extend PostgreSQL functionality. Master techniques for handling inter-process communication and background workers. Explore custom data types and integration with programming languages like Java and Perl. Author(s) Luca Ferrari, a seasoned database administrator and developer, specializes in delivering insightful PostgreSQL training. With extensive experience in both database management and software development, Ferrari brings practical knowledge and real-world examples to guide readers through mastering PostgreSQL server-side programming. Who is it for? This book is tailored for database administrators, developers, and engineers who have a basic understanding of PostgreSQL and are looking to expand their knowledge into server-side programming. If you're aiming to implement advanced database functionality or streamline data management tasks in PostgreSQL, this book is for you. It is ideal for those who wish to apply database programming techniques to enterprise-grade challenges. Beginner-friendly but designed to empower professionals with actionable insights.

Learn QGIS - Fourth Edition

2018-11-27 O'Reilly Amazon

book

Anita Graser, Andrew Cutts

data data-engineering location-data geographic-information-system-gis geographic information system (gis) DataViz

Unlock the world of geospatial analysis and mapping with 'Learn QGIS.' This comprehensive guide takes you through the capabilities of QGIS 3.4, covering everything from data loading and styling to spatial analysis and plugin development. Geared towards beginners and seasoned GIS users alike, you'll gain hands-on expertise to master QGIS effectively and confidently. What this Book will help me do Load, edit, and manage geospatial data efficiently in QGIS 3.4 for impactful analysis. Create professional-grade maps with custom styling and data visualization techniques. Delve into the QGIS 3.4 processing toolbox, enhancing analysis workflows. Build bespoke QGIS plugins using Python and Qt Designer for tailored solutions. Use QGIS 3.4's advanced features like 3D views and GeoPackage efficiently. Author(s) Andrew Cutts and Anita Graser bring their extensive technical expertise to 'Learn QGIS.' Andrew Cutts has a background in geospatial technologies and a focus on practical GIS applications. Anita Graser is a recognized QGIS expert, experienced in both software development and geospatial analysis. Together, they share their knowledge in an accessible style, ensuring readers of different levels can benefit. Who is it for? This book is ideal for developers, consultants, or GIS enthusiasts who want to expand their skills in using QGIS 3.4 for geospatial data analysis and mapping. Beginners looking to understand core QGIS capabilities will also find value. If you're aiming to develop professional maps and customize QGIS, this is the resource for you.

Introduction and Implementation of Data Reduction Pools and Deduplication

2018-11-26 O'Reilly Amazon

book

Dionysios Kalofonos, Jon Tate, Carsten Larsen, Atif Syed, Kendall Williams

data data-engineering IBM

Abstract Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces Data Reduction Pools (DRP) and Deduplication powered by IBM Spectrum™ Virtualize, which are innovative storage features that deliver essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a proven design. This book discusses Data Reduction Pools (DRP) and Deduplication and is intended for experienced storage administrators who are fully familiar with IBM Spectrum Virtualize, SAN Volume Controller, and the Storwize family of products.

IBM Power Systems E870C and E880C Technical Overview and Introduction

2018-11-14 O'Reilly Amazon

book

Scott Vetter , Volker Haug , Alexandre Bicas Caldeira

data data-engineering IBM Cloud Computing Linux Marketing

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power® System E870C (9080-MME) and IBM Power System E880C (9080-MHE) servers that support IBM AIX®, IBM i, and Linux operating systems. The objective of this paper is to introduce the major innovative Power E870C and Power E880C offerings and their relevant functions. The new Power E870C and Power E880C servers with OpenStack-based cloud management and open source automation enable clients to accelerate the transformation of their IT infrastructure for cloud while providing tremendous flexibility during the transition. In addition, the Power E870C and Power E880C models provide clients increased security, high availability, rapid scalability, and simplified maintenance and management, all while enabling business growth and dramatically reducing costs. The systems management capability of the Power E870C and Power E880C servers speeds up and simplifies cloud deployment by providing fast and automated VM deployments, prebuilt image templates, and self-service capabilities, all with an intuitive interface. Enterprise servers provide the highest levels of reliability, availability, flexibility, and performance to bring you a world-class enterprise private and hybrid cloud infrastructure. Through enterprise-class security, efficient built-in virtualization that drives industry-leading workload density, and dynamic resource allocation and management, the server consistently delivers the highest levels of service across hundreds of virtual workloads on a single system. The Power E870C and Power E880C servers include the cloud management software and services to assist with clients' move to the cloud, both private and hybrid.
The following capabilities are included: private cloud management with IBM Cloud PowerVC Manager, Cloud-based HMC Apps as a service, and open source cloud automation and configuration tooling for AIX; hybrid cloud support; hybrid infrastructure management tools; secure connection of system of record workloads and data to cloud native applications; IBM Cloud Starter Pack; flexible capacity on demand; and Power to Cloud Services. This paper expands the current set of IBM Power Systems™ documentation by providing a desktop reference that offers a detailed technical description of the Power E870C and Power E880C systems. This paper does not replace the latest marketing materials and configuration tools. It is intended as another source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Securing SQL Server: DBAs Defending the Database

2018-11-14 O'Reilly Amazon

book

Peter A Carter

data data-engineering relational-databases microsoft-sql-server GDPR/CCPA Cyber Security

Protect your data from attack by using SQL Server technologies to implement a defense-in-depth strategy for your database enterprise. This new edition covers threat analysis, common attacks and countermeasures, and provides an introduction to compliance that is useful for meeting regulatory requirements such as the GDPR. The multi-layered approach in this book helps ensure that a single breach does not lead to loss or compromise of confidential or business-sensitive data. Database professionals in today’s world deal increasingly with repeated data attacks against high-profile organizations and sensitive data. It is more important than ever to keep your company’s data secure. Securing SQL Server demonstrates how developers, administrators and architects can all play their part in the protection of their company’s SQL Server enterprise. This book not only provides a comprehensive guide to implementing the security model in SQL Server, including coverage of technologies such as Always Encrypted, Dynamic Data Masking, and Row Level Security, but also looks at common forms of attack against databases, such as SQL Injection and backup theft, with clear, concise examples of how to implement countermeasures against these specific scenarios. Most importantly, this book gives practical advice and engaging examples of how to defend your data, and ultimately your job, against attack and compromise. What You'll Learn: perform threat analysis; implement access level control and data encryption; ensure non-repudiation by implementing comprehensive auditing; use security metadata to ensure your security policies are enforced; mitigate the risk of credentials being stolen; put countermeasures in place against common forms of attack. Who This Book Is For: database administrators who need to understand and counteract the threat of attacks against their company’s data. It is also useful for SQL developers and architects.

IBM DS8880 Thin Provisioning (Updated for Release 8.5)

2018-11-08 O'Reilly Amazon

book

Connie Riggins , Peter Kimmel , Andre Coelho , Bert Dufrasne

data data-engineering IBM

Ever-increasing storage demands have a negative effect on an organization's IT budget and complicate the overall storage infrastructure and management. Companies are looking at ways to use their storage resources more efficiently. Thin provisioning can help by reducing the amount of unused storage that is typically allocated to applications or users. Now available for the IBM® DS8880 for Fixed Block (FB) and Count Key Data (CKD) volumes, thin provisioning defers the allocation of actual space on the storage system until the time that the data must effectively be written to disk. This IBM Redpaper™ publication provides an overall understanding of how thin provisioning works on the IBM DS8880. It also provides insights into the functional design and its implementation on the DS8880 and includes illustrations for the configuration of thin-provisioned volumes from the DS GUI or the DS CLI. This edition applies to DS8880 Release 8.5 or later.

Business Models

2018-11-01 O'Reilly Amazon

book

Maurizio Massaro , John Dumay , Marco Montemari , Francesco Paolone , Christian Nielsen , Morten Lund

data data-engineering data-models

The growing body of research on business models draws upon a range of sub-disciplines, including strategic management, entrepreneurship, organization studies and management accounting. Business Models: A Research Overview provides a research map for business scholars, incorporating theoretical and applied perspectives.

Apache Hadoop 3 Quick Start Guide

2018-10-31 O'Reilly Amazon

book

Hrishikesh Vijay Karambelkar

data data-engineering Hadoop Analytics Big Data Data Analytics

Dive into the world of distributed data processing with the 'Apache Hadoop 3 Quick Start Guide.' This comprehensive resource equips you with the knowledge needed to handle large datasets effectively using Apache Hadoop. Learn how to set up and configure Hadoop, work with its core components, and explore its powerful ecosystem tools. What this Book will help me do Understand the fundamental concepts of Apache Hadoop, including HDFS, MapReduce, and YARN, and use them to store and process large datasets. Set up and configure Hadoop 3 in both developer and production environments to suit various deployment needs. Gain hands-on experience with Hadoop ecosystem tools like Hive, Kafka, and Spark to enhance your big data processing capabilities. Learn to manage, monitor, and troubleshoot Hadoop clusters efficiently to ensure smooth operations. Analyze real-time streaming data with tools like Apache Storm and perform advanced data analytics using Apache Spark. Author(s) The author of this guide, Vijay Karambelkar, brings years of experience working with big data technologies and Apache Hadoop in real-world applications. With a passion for teaching and simplifying complex topics, Vijay has compiled his expertise to help learners confidently approach Hadoop 3. His detailed, example-driven approach makes this book a practical resource for aspiring data professionals. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who aspire to dive into the field of big data. If you're new to Apache Hadoop or looking to upgrade your skills to include version 3, this guide is for you. A basic understanding of Java programming is recommended to make the most of the topics covered. Embark on this journey to enhance your career in data-intensive industries.

Mastering Apache Cassandra 3.x - Third Edition

2018-10-31 O'Reilly Amazon

book

Tejaswi Malepati , Aaron Ploetz

data data-engineering nosql-databases Cassandra Analytics Big Data

This expert guide, "Mastering Apache Cassandra 3.x," is designed for individuals looking to achieve scalable and fault-tolerant database deployment using Apache Cassandra. From mastering the foundational components of Cassandra architecture to advanced topics like clustering and analytics integration with Apache Spark, this book equips readers with practical, actionable skills. What this Book will help me do Understand and deploy Apache Cassandra clusters for fault-tolerant and scalable databases. Use advanced features of CQL3 to streamline database queries and operations. Optimize and configure Cassandra nodes to improve performance for demanding applications. Monitor and manage Cassandra clusters effectively using best practices. Combine Cassandra with Apache Spark to build robust data analytics pipelines. Author(s) Aaron Ploetz and Tejaswi Malepati are experienced technologists and software professionals with extensive expertise in distributed database systems and big data algorithms. They've combined their industry knowledge and teaching backgrounds to create accessible and practical guides for learners worldwide. Their collaborative work is focused on demystifying complex systems for maximum learning impact. Who is it for? This book is ideal for database administrators, software developers, and big data specialists seeking to expand their skill set into scalable data storage using Cassandra. Readers should have a basic understanding of database concepts and some programming experience. If you're looking to design robust databases optimized for modern big data use-cases, this book will serve as a valuable resource.

An Introduction to Cyber Modeling and Simulation

2018-10-30 O'Reilly Amazon

book

Jerry M. Couretas

data data-engineering data-models

Introduces readers to the field of cyber modeling and simulation and examines current developments in the US and internationally. This book provides an overview of cyber modeling and simulation (M&S) developments. Using scenarios, courses of action (COAs), and current M&S and simulation environments, the author presents the overall information assurance process, incorporating the people, policies, processes, and technologies currently available in the field. The author ties up the various threads that currently compose cyber M&S into a coherent view of what is measurable, simulative, and usable in order to evaluate systems for assured operation. An Introduction to Cyber Modeling and Simulation provides the reader with examples of tools and technologies currently available for performing cyber modeling and simulation. It examines how decision-making processes may benefit from M&S in cyber defense. It also examines example emulators, simulators and their potential combination. The book also takes a look at corresponding verification and validation (V&V) processes, which provide the operational community with confidence in knowing that cyber models represent the real world. This book: explores the role of cyber M&S in decision making; provides a method for contextualizing and understanding cyber risk; shows how concepts such as the Risk Management Framework (RMF) leverage multiple processes and policies into a coherent whole; evaluates standards for pure IT operations ("cyber for cyber") and operational/mission cyber evaluations ("cyber for others"); develops a method for estimating the vulnerability of the system (i.e., time to exploit) and provides an approach for mitigating risk via policy, training, and technology alternatives; and uses a model-based approach. An Introduction to Cyber Modeling and Simulation is a must read for all technical professionals and students wishing to expand their knowledge of cyber M&S for future professional work.

talk-data.com
