data-engineering

Hands-On Graph Analytics with Neo4j

2020-08-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Estelle Scifo (Neo4j)

AI/ML Analytics Data Management Data Science Neo4j data graph-databases

This book is your gateway into the world of graph analytics with Neo4j, empowering you to reveal insights hidden in connected data. By diving into real-world examples, you'll learn how to implement algorithms to uncover relationships and patterns critical for applications such as fraud detection, recommendation systems, and more.

What this Book will help me do:
Understand fundamental concepts of the Neo4j graph database, including nodes, relationships, and Cypher querying.
Effectively explore and visualize data relationships, enhancing applications like search engines and recommendations.
Gain proficiency in graph algorithms such as pathfinding and spatial search to solve key business challenges.
Leverage Neo4j's Graph Data Science library for machine learning and predictive analysis tasks.
Implement web applications that utilize Neo4j for scalable, production-ready graph data management.

Author(s): Estelle Scifo is an experienced author in graph technologies who has worked extensively with Neo4j. Scifo brings practical knowledge and a hands-on approach to the forefront, making complex topics accessible to learners of all levels. Through this work, Scifo continues to inspire readers to harness the power of connected data effectively.

Who is it for? This book is perfect for professionals like data analysts, business analysts, graph analysts, and database developers aiming to delve into graph data. It caters to those seeking to solve problems through graph analytics, whether in fraud detection, recommendation systems, or other fields. Some prior experience with Neo4j is recommended for maximal benefit.

Microservices in SAP HANA XSA: A Guide to REST APIs Using Node.js

2020-08-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sergio Guerrero

API Cloud Computing JavaScript JSON SAP Cyber Security SQL data

Build enterprise-grade microservices in the SAP HANA Advanced Model (XSA). This book explains building scalable APIs in XSA and the benefits of building microservices with SAP HANA XSA. It covers the Cloud Foundry (CF) architecture and how SAP HANA XSA follows the model. It begins with the details of the different architectural layers of applications hosted in XSA (specifically, microservices). Everything you need to know is presented, including analyzing requests, modularization, database ingestion, building JSON responses, and scaling your microservices.

You will learn to use development tools such as the SAP Web IDE, Postman, and the SAP HANA Cockpit for XSA, including debugging examples on SAP HANA XSA with code snippets showing how microservices can be developed, debugged, scaled, and deployed on SAP HANA XSA. Microservices are divided into security and authentication, request handling, modularization of Node.js, and interaction with the SAP HANA database containers and response formatting. An end-to-end scenario is presented of a Node.js REST API that uses HTTP methods, concluding with deploying an SAP HANA XSA project to a production environment.

This book is simple enough to help you implement a Node.js module in order to understand the development of microservices, and complex enough for architects to design their next business-ready solution integrating UAA security, application modularization, and an end-to-end REST API on SAP HANA XSA.

What You Will Learn:
Know the definition and architecture of Cloud Foundry and its application on SAP HANA XSA
Understand REST principles and different HTTP methods
Explore microservices (Node.js) development
Database interaction from Node (executing SQL statements and stored procedures)

Who This Book Is For: Architects designing business-ready solutions that integrate UAA security, application modularization, and an end-to-end REST API on SAP HANA XSA

RabbitMQ Essentials - Second Edition

2020-08-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by David Dossot, Lovisa Johansson

data rabbitmq streaming-messaging

Discover how to power your distributed and scalable applications using RabbitMQ in "RabbitMQ Essentials". This book provides a detailed journey into understanding and implementing message queuing architectures, guiding you from the basics through advanced techniques. Through a realistic case study, you'll gain the skills necessary to succeed with RabbitMQ.

What this Book will help me do:
Understand the core concepts and architecture of RabbitMQ and message queuing.
Learn how to configure and use RabbitMQ, including installation and plugin management.
Master the use of channels, routing strategies, and exchange types for optimized message delivery.
Apply strategies for ensuring message queue scalability and robust fault-tolerance.
Gain insights and best practices directly from RabbitMQ experts for production-level deployment.

Author(s): Lovisa Johansson and David Dossot bring a wealth of experience managing and deploying systems based on RabbitMQ. As part of CloudAMQP, they oversee the largest RabbitMQ installations globally. This book reflects their dedication to helping developers succeed with message queuing technology.

Who is it for? This book is perfectly suited for developers and software engineers interested in designing scalable and distributed applications. Whether you're new to RabbitMQ or already familiar with microservices and message queuing, "RabbitMQ Essentials" provides clear guidance and real-world insights. Beginners will appreciate its accessible approach, while advanced developers will value its comprehensive coverage and best practices.

IBM z15 Configuration Setup

2020-08-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nelson Oliveira, Ryotaroh Sawada, Bill White, Octavian Lascu, Martin Söllig, Franco Pinto

IBM data

This IBM® Redbooks® publication helps you install, configure, and maintain the IBM z15™ (machine types 8561 and 8562) systems. The z15 systems offer new functions that require a comprehensive understanding of the available configuration options. This book presents configuration setup scenarios and describes implementation examples in detail. This publication is intended for systems engineers, hardware planners, and anyone who needs to understand IBM Z® configuration and implementation. Readers should be familiar with IBM Z technology and terminology. For more information about the functions of the z15 systems, see IBM z15 Technical Introduction, SG24-8850, IBM z15 (8561) Technical Guide, SG24-8851, and IBM z15 (8562) Technical Guide, SG24-8852.

Ready-to-use Virtual Appliance for Hands-on IBM Spectrum Archive Evaluation

2020-08-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hiroshi Araki, Hiroyuki Miyoshi, Takeshi Ishimoto

IBM VirtualBox Virtual Machine data

IBM® Spectrum Archive Enterprise Edition for the IBM TS4500, IBM TS3500, IBM TS4300, and IBM TS3310 tape libraries provides seamless integration of IBM Linear Tape File System (LTFS) with IBM Spectrum® Scale by creating an LTFS tape tier. You can run any application that is designed for disk files on tape by using IBM Spectrum Archive. IBM Spectrum Archive can play an important role in reducing the cost of storage for data that does not need the access performance of primary disk. The IBM Spectrum Archive Virtual Appliance can be deployed in minutes, and key features can be tried by following this user guide. The virtual machine (VM) has a pre-configured IBM Spectrum Scale and a virtual tape library that let you quickly test the IBM Spectrum Archive features without connecting to a physical tape library. The virtual appliance is provided as a VirtualBox .ova file.

Red Hat OpenShift on Public Cloud with IBM Block Storage

2020-08-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

AWS Cloud Computing IBM cloud-storage data storage-repositories

The purpose of this document is to show how to install Red Hat OpenShift Container Platform (OCP) on the Amazon Web Services (AWS) public cloud with the OpenShift installer, a method that is known as installer-provisioned infrastructure (IPI). We also describe how to validate the installation of the IBM container storage interface (CSI) driver on OCP 4.2 that is installed on AWS. This document also describes the installation of OCP 4.x on AWS with customization and OCP 4.x installation on IBM Cloud. This document discusses how to provision internet small computer system interface (iSCSI) storage that is made available by IBM Spectrum® Virtualize for Public Cloud (SVPC) that is deployed on AWS. Finally, the document discusses the use of the Red Hat OpenShift command line interface (CLI), the OCP web console graphical user interface (GUI), and the AWS console.

Data Management at Scale

2020-07-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Piethein Strengholt

Analytics Data Governance Data Management DWH Master Data Management Cyber Security data data-warehouse storage-repositories

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you’ll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

Examine data management trends, including technological developments, regulatory requirements, and privacy concerns
Go deep into the Scaled Architecture and learn how the pieces fit together
Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Learning ArcGIS Pro 2 - Second Edition

2020-07-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tripp Corbin, GISP

GIS Python arcgis data geographic-information-system-gis location-data

Learning ArcGIS Pro 2 is your comprehensive guide to mastering the capabilities of ArcGIS Pro for geospatial analysis and cartography. You'll learn to create both 2D and 3D maps, edit and visualize geospatial data, and automate workflows using Python and ModelBuilder. This book provides the foundational skills you need to effectively work with GIS data and projects.

What this Book will help me do:
Navigate the ArcGIS Pro interface to create, analyze, and share GIS projects efficiently.
Visualize and interpret geographic data using 2D and 3D mapping techniques.
Use Arcade language to customize labels and symbology for better map clarity.
Automate GIS workflows through Python scripts and ModelBuilder for increased efficiency.
Create and share professional-quality map layouts and series with ease.

Author(s): Tripp Corbin, GISP, is a GIS Professional with extensive experience in geographic data analysis and ArcGIS software. As a seasoned instructor and author, Tripp aims to make GIS accessible by breaking down complex topics into manageable concepts. His hands-on teaching approach is reflected throughout this book, providing clear guidance and practical knowledge.

Who is it for? This book is ideal for beginner GIS enthusiasts or professionals looking to transition to ArcGIS Pro. It is well-suited for those with minimal exposure to GIS or no prior experience with ArcGIS software. Whether you aim to explore geospatial concepts or acquire skills for professional applications, this book provides a solid foundation.

IBM Storage Solutions for SAP Applications Version 1.4

2020-07-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

IBM Linux SAP data

This IBM® Redpaper™ publication is intended as an architecture and configuration guide to set up the IBM System Storage™ for the SAP HANA tailored data center integration (SAP HANA TDI) within a storage area network (SAN) environment. SAP HANA TDI allows the SAP customer to attach external storage to the SAP HANA server. The paper also describes the setup and configuration of SAP Landscape Management for SAP HANA systems on IBM infrastructure components: IBM Power Systems and IBM Storage based on IBM Spectrum® Virtualize. This document is written for IT technical specialists and architects with advanced skill levels on SUSE Linux Enterprise Server or Red Hat Enterprise Linux (RHEL) and IBM System Storage. This document provides the necessary information to select, verify, and connect IBM System Storage to the SAP HANA server through a Fibre Channel-based SAN. The recommendations in this Blueprint apply to single-node and scale-out configurations, and Intel and IBM Power based SAP HANA systems.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 4

2020-07-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

Cloud Computing IBM data

This IBM® Blueprint is intended to facilitate the deployment of IBM Storage for Red Hat OpenShift Container Platform by using detailed hardware specifications to build a system. It describes the associated parameters for configuring persistent storage within a Red Hat OpenShift Container Platform environment. To complete the tasks, you should understand Red Hat OpenShift, IBM Storage, the IBM block storage Container Storage Interface (CSI) driver and the IBM Spectrum Scale CSI driver. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize® or IBM FlashSystem® storage devices, ESS and Spectrum Scale are supported and entitled, and where the issues are not specific to a blueprint implementation. IBM Storage Suite for IBM Cloud® Paks is an offering bundle that includes software-defined storage from both IBM and Red Hat. Use this document for details on how to deploy IBM Storage product licenses obtained through Storage Suite for Cloud Paks (IBM Spectrum® Virtualize and IBM Spectrum Scale).

Learning Spark, 2nd Edition

2020-07-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denny Lee (Databricks), Brooke Wenig, Jules S. Damji (Anyscale Inc), Tathagata Das (Databricks)

AI/ML Analytics API Avro CSV Data Analytics Delta Hive Java JSON Kafka ORC +9 more

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:

Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow

Learning RSLogix 5000 Programming - Second Edition

2020-07-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Austin Scott

data log-data

Learning RSLogix 5000 Programming helps you master the features of Studio 5000 and the Logix platform for developing advanced PLC-based automation solutions. You will learn how to apply efficient industrial automation programming techniques and discover how to implement cybersecurity best practices on Rockwell Automation systems.

What this Book will help me do:
Gain comprehensive knowledge of Rockwell Automation's Logix platform, including ControlLogix and CompactLogix systems.
Learn to program using Ladder Diagram, Function Block Diagram, Structured Text, and Sequential Function Chart in Studio 5000.
Understand and configure Rockwell Automation industrial networking and communication protocols.
Design and implement secure automation projects following cybersecurity best practices.
Develop practical skills by creating advanced projects like a robot bartender control system.

Author(s): Austin Scott is an experienced automation engineer with a passion for teaching advanced PLC programming. With years of experience working on Rockwell Automation technologies, Austin provides clear and thorough instructions to help readers develop robust PLC solutions efficiently. He brings practical insights and real-world applications to his expertly crafted guides.

Who is it for? This book is ideal for PLC programmers, electricians, and automation professionals seeking to learn or enhance their skills using RSLogix 5000 and Studio 5000. If you have basic PLC knowledge but are new to Rockwell Automation software, this book will guide you step-by-step in mastering these tools. It is also valuable for those seeking to quickly gain expertise in troubleshooting and secure programming within industrial automation.

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

2020-06-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Robert Ilijason

AI/ML Analytics AWS Azure Big Data Cloud Computing Confluence Data Analytics Databricks Hadoop Hive Microsoft +5 more

Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large amounts of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn:
Discover the value of big data analytics that leverage the power of the cloud
Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
See how these tools are used in the real world
Run basic analytics, including machine learning, on billions of rows at a fraction of the cost or free

Who This Book Is For: Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

Spark in Action, Second Edition

2020-06-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jean-Georges Perrin (Actian)

AI/ML Analytics API Big Data ELK GitHub Hadoop IBM Java Python Scala Spark +4 more

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.

About the Technology: Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.

About the Book: Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

What's Inside:
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL

About the Reader: This book does not assume previous experience with Spark, Scala, or Hadoop.

About the Author: Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.

Quotes:
This book reveals the tools and secrets you need to drive innovation in your company or community. - Rob Thomas, IBM
An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing. - Anupam Sengupta, GuardHat Inc.
This book will help spark a love affair with distributed processing. - Conor Redmond, InComm Product Control
Currently the best book on the subject! - Markus Breuer, Materna IPS

Converting Adabas to IBM DB2 for z/OS with ConsistADS

2020-06-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Paolo Bruni

IBM data ibm-db2 relational-databases

Consist Advanced Development Solution (ConsistADS) is an end-to-end conversion solution that combines conversion and transparency methods for migrating to IBM® DB2® for z/OS® software. The solution includes DB2 for z/OS and several DB2 tools as part of the package. This IBM Redpaper™ publication explains the Natural and Adabas conversion to DB2 for z/OS by using ConsistADS. It includes prerequisite technical assessment requirements and conversion challenges. It also describes a real customer conversion scenario that was provided by the IBM Business Partners that facilitated these conversions for customers. Originally published in 2015, this paper was updated in 2020 to include additional information about ConsistADS.

SQL Server on Azure Virtual Machines

2020-06-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Joey D'Antoni, Tim Radney, Randolph West, Anthony Nocentino (Pure Storage), Allan Hirt, John Martin, Louis Davidson

Azure Cloud Computing Linux Microsoft SQL azure-sql-database data relational-databases

Would you like to master deploying SQL Server in the cloud using Microsoft's Azure platform? With the hands-on guidance in this book, you'll explore how to set up and configure SQL Server on Azure Virtual Machines effectively. By the end, you'll have the knowledge to optimize, manage, and deploy your solutions.

What this Book will help me do:
Understand platform availability for SQL Server in Azure
Explore SQL Server IaaS and optimize its configuration
Master deploying SQL Server on Linux and Windows in Azure
Configure high-performance storage options tailored to SQL Server
Learn disaster recovery strategies for SQL Server in Azure

Author(s): Joey D'Antoni, Louis Davidson, Allan Hirt, and their co-authors bring years of experience in database management, cloud architecture, and technical writing. They aim to provide clear and actionable advice for working efficiently with SQL Server on Azure. Their insights come from real-world projects.

Who is it for? This book is for developers, database administrators, and cloud architects who are looking to learn how to deploy SQL Server solutions on Azure Virtual Machines. If you are transitioning workloads to the cloud or need to manage or optimize such environments, this book will equip you with the skills you need. Basic SQL Server knowledge is helpful.

Data Lakes

2020-06-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dominique Laurent, Anne Laurent, Cédrine Madera

Data Lake data data-lake storage-repositories

The concept of a data lake is less than 10 years old, but data lakes are already widely implemented within large companies. Their goal is to efficiently deal with ever-growing volumes of heterogeneous data, while also meeting various sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata – supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, thus providing the reader with practical examples of data lake management.

Best practices and Getting Started Guide for Oracle on IBM LinuxONE

2020-06-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by David J Simpson, Paul Novak, Sam Amsavelu

IBM Linux Oracle data oracle-database-solutions

IBM® is a Platinum level Partner in the Oracle Partner Network, which delivers the proven combination of industry insight, extensive real-world Oracle applications experience, deep technical skills, and high-performance servers and storage to create a complete business solution with a defined return on investment. From application selection, purchase, and implementation to upgrade and maintenance, we help organizations reduce the total cost of ownership and the complexity of managing their current and future applications environment while building a solid base for business growth. Oracle Database running on Linux is available for deployment on IBM LinuxONE by using Red Hat Enterprise Linux (RHEL) or SUSE Linux Enterprise Server (SLES). This enterprise-grade solution is designed to add value to Oracle Database solutions. This IBM Redpaper® publication focuses on accepted good practices for installing and getting started with Oracle Database, which provides you with an environment that is optimized for performance, scalability, flexibility, and ease of management.

IBM GDPS Family: An Introduction to Concepts and Capabilities

2020-05-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lydia Parziale

IBM data

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

SQL Server Big Data Clusters: Data Virtualization, Data Lake, and AI Platform

2020-05-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Benjamin Weissman, Enrico van de Laar

AI/ML Analytics BI Big Data Cloud Computing Data Analytics Data Lake HDFS Kubernetes Linux Spark SQL +4 more

Use this guide to one of SQL Server 2019’s most impactful features—Big Data Clusters. You will learn about data virtualization and data lakes for this complete artificial intelligence (AI) and machine learning (ML) platform within the SQL Server database engine. You will know how to use Big Data Clusters to combine large volumes of streaming data for analysis along with data stored in a traditional database. For example, you can stream large volumes of data from Apache Spark in real time while executing Transact-SQL queries to bring in relevant additional data from your corporate, SQL Server database. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019. You will learn about the architectural foundations that are made up from Kubernetes, Spark, HDFS, and SQL Server on Linux. You then are shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL—taking advantage of skills you have honed for years—and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis. 

What You Will Learn:
Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments
Analyze large volumes of data directly from SQL Server and/or Apache Spark
Manage data stored in HDFS from SQL Server as if it were relational data
Implement advanced analytics solutions through machine learning and AI
Expose different data sources as a single logical source using data virtualization

Who This Book Is For: Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environments

