talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 O'Reilly

Activities tracked

3377

Collection of O'Reilly books on Data Engineering.

Filtering by: data-engineering

Sessions & talks

Showing 701–725 of 3377 · Newest first

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge, with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on everything that is new in one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project team's goal of integrating the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019, using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux, along with a comprehensive look at running SQL Server with containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with PolyBase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to apply your knowledge of containers, Kubernetes, and PolyBase to build a comprehensive solution called Big Data Clusters, a marquee feature of SQL Server 2019. You will also learn how to access Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications.
What You Will Learn:
• Implement Big Data Clusters with SQL Server, Spark, and HDFS
• Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources
• Combine SQL and Spark to build a machine learning platform for AI applications
• Boost your performance with no application changes using Intelligent Performance
• Increase the security of your SQL Server through Secure Enclaves and Data Classification
• Maximize database uptime through online indexing and Accelerated Database Recovery
• Build new modern applications with Graph, ML Services, and T-SQL Extensibility with Java
• Improve your ability to deploy SQL Server on Linux
• Gain in-depth knowledge to run SQL Server with containers and Kubernetes
• Know all the new database engine features for performance, usability, and diagnostics
• Use the latest tools and methods to migrate your database to SQL Server 2019
• Apply your knowledge of SQL Server 2019 to Azure
Who This Book Is For:
IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.

Cognitive Computing Featuring the IBM Power System AC922

This IBM® Redpaper publication describes the advantages of using the IBM Power System AC922 for cognitive solutions and how it can enhance clients' businesses. To optimize the hardware and software, IBM partners with NVIDIA, Mellanox, H2O.ai, SQream, Kinetica, and other prominent companies to design the Power AC922 server, which is specifically enhanced for the cognitive era. Most of its outstanding hardware features, such as NVIDIA NVLink 2.0 and PCIe 4.0, are described in this publication to illustrate the advantages that clients can realize compared with IBM's competitors. We also include a brief description of what cognitive computing is and how to use IBM Watson® Machine Learning cognitive solutions to bring more value to your business ecosystem. Additionally, we present performance charts that show the advantages of the Power AC922 over x86 competitors. In the last chapter, we describe the most remarkable use cases in which IBM solves real problems using cognitive solutions. This IBM Redpaper publication is aimed at IT technical audiences, especially decision-makers who need a full view of the benefits and improvements that an IBM Cognitive Solution can offer. It also provides valuable information to data science professionals, enabling them to plan their modeling needs. Finally, it offers information to the infrastructure support group in charge of maintaining the solution.

IBM Spectrum Scale Erasure Code Edition: Planning and Implementation Guide

This IBM® Redpaper publication introduces IBM Spectrum® Scale Erasure Code Edition (ECE) as a scalable, high-performance data and file management solution. ECE is designed to run on any commodity server that meets the ECE minimum hardware requirements. ECE provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale, with the added benefit of network-dispersed IBM Spectrum Scale RAID, which provides data protection, storage efficiency, and the ability to manage storage in hyperscale environments composed of commodity hardware. In this publication, we explain the benefits of ECE and the use cases where we believe it fits best. We also provide a technical introduction to IBM Spectrum Scale RAID. Next, we explain the key aspects of planning an installation, provide an example installation scenario, and describe the key aspects of day-to-day management and a process for problem determination. We conclude with an overview of possible enhancements that are being considered for future versions of IBM Spectrum Scale Erasure Code Edition. A thorough understanding of IBM Spectrum Scale Erasure Code Edition is critical to planning a successful storage system deployment. This paper is targeted toward technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective storage solutions. The goal of this paper is to describe the benefits of using IBM Spectrum Scale Erasure Code Edition to create high-performance storage systems.

Mastering Spark with R

If you’re like most R users, you have deep knowledge of, and love for, statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems. Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
• Analyze, explore, transform, and visualize data in Apache Spark with R
• Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
• Perform analysis and modeling across many machines using distributed computing techniques
• Use large-scale data from multiple sources and different formats with ease from within Spark
• Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
• Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

IBM z15 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™ (machine type 8561). It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities that are vital to any digital transformation. The z15 is designed for enhanced modularity in an industry-standard footprint. The z15 system excels at the following tasks:
• Using multicloud integration services
• Securing data with pervasive encryption
• Providing resilience that is key to zero downtime
• Transforming a transactional platform into a data powerhouse
• Getting more out of the platform with IT Operational Analytics
• Accelerating digital transformation with agile service delivery
• Revolutionizing business processes
• Blending open source and Z technologies
This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 1

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to:
• Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today!
• Build a 24 by 7 by 365 enterprise-class private cloud with Red Hat OpenShift Container Platform, using new open source Container Storage Interface (CSI) drivers
• Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security
IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment, with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

IBM Storage Solutions for SAP Applications Version 1.3

This paper is intended as an architecture and configuration guide for setting up IBM® System Storage® for SAP HANA tailored data center integration (SAP HANA TDI) within a storage area network (SAN) environment. SAP HANA TDI allows SAP customers to attach external storage to the SAP HANA server. The paper also describes the setup and configuration of SAP Landscape Management for SAP HANA systems on IBM infrastructure components: IBM Power Systems™ and IBM Storage based on IBM Spectrum™ Virtualize. This document is written for IT technical specialists and architects with advanced skill levels in SUSE Linux Enterprise Server (SLES) or Red Hat Enterprise Linux (RHEL) and IBM System Storage. It provides the necessary information to select, verify, and connect IBM System Storage to the SAP HANA server through a Fibre Channel-based SAN. The recommendations in this blueprint apply to single-node and scale-out configurations, and to both Intel-based and IBM Power-based SAP HANA systems.

Database Internals

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. In this practical guide, Alex Petrov walks developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines:
• Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable log-structured storage engines, with the differences and use cases for each
• Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as the page cache, buffer pool, and write-ahead log
• Distributed systems: Learn step by step how nodes and processes connect and build complex communication patterns
• Database clusters: Discover which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency

Query Store for SQL Server 2019: Identify and Fix Poorly Performing Queries

Apply the new Query Store feature to identify and fix poorly performing queries in SQL Server. Query Store is an important, relatively recent feature in SQL Server that provides insight into the details of query execution and how that execution has changed over time. Query Store helps to identify queries that aren’t performing well or whose performance has regressed. It provides detailed information, such as wait stats, that you need to resolve root causes, and it allows you to force the use of a known good execution plan. With SQL Server 2017 and later, you can automate the correction of performance regressions. Query Store for SQL Server 2019 helps you protect your database’s performance during upgrades of applications or of SQL Server versions. The book provides fundamental information on how Query Store works and best practices for implementation and use. You will learn to run and interpret built-in reports, configure automatic plan correction, and troubleshoot queries using Query Store when needed. Query Store for SQL Server 2019 helps you master Query Store and bring value to your organization through consistent query execution times and automated correction of regressions.
What You'll Learn:
• Apply best practices in implementing Query Store on production servers
• Detect and correct regressions in query performance
• Lower the risk of performance degradation following an upgrade
• Use tools and techniques to get the most from Query Store
• Automate regression correction and other uses of Query Store
Who This Book Is For:
SQL Server developers and administrators responsible for query performance on SQL Server. Anyone responsible for identifying poorly performing queries will be able to use Query Store to find these queries and resolve the underlying issues.

IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage

This IBM® Redpaper publication provides a comprehensive overview of the IBM Spectrum® Discover metadata management software platform. We give a detailed explanation of how the product creates, collects, and analyzes metadata. Several in-depth use cases show examples of analytics, governance, and optimization. We also provide step-by-step information to install and set up the IBM Spectrum Discover trial environment. More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges in managing this deluge of unstructured data, such as:
• Pinpointing and activating relevant data for large-scale analytics
• Lacking the fine-grained visibility that is needed to map data to business priorities
• Removing redundant, obsolete, and trivial (ROT) data
• Identifying and classifying sensitive data
IBM Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, on premises and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.

IBM Storage for Red Hat OpenShift Container Platform V3.11 Blueprint Version 1 Release 1

IBM Storage for Red Hat OpenShift Container Platform is a comprehensive container-ready solution that includes all the hardware and software components necessary to set up or expand your Red Hat OpenShift Container Platform V3.11 environment. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to:
• Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today!
• Build a 24 by 7 by 365 enterprise-class private cloud with Red Hat OpenShift Container Platform
• Leverage enterprise-class services such as NVMe-based flash performance, high data availability, and advanced container security
IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment, with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

SAP ABAP Objects: A Practical Guide to the Basics and Beyond

Understand ABAP objects, the object-oriented extension of SAP's ABAP language, as of SAP NetWeaver 7.5, including its newest advancements. This book begins with the programming of objects in general and the basics of the ABAP language that a developer needs to know to get started. It covers the most important topics needed to perform daily support jobs and ensure successful projects. ABAP has a vast community, with developers working in a variety of functional areas, and you will be able to apply the concepts in this book to your own area. SAP ABAP Objects is goal directed rather than a collection of theoretical topics. It doesn't just skim the surface of ABAP objects; it goes in depth, from building the basic foundation (e.g., classes and objects created locally and globally) to intermediate areas (e.g., ALV programming, method chaining, polymorphism, simple and nested interfaces), and finally to advanced topics (e.g., shared memory, persistent objects). You will learn how to apply best practices to write better programs with ABAP objects.
What You’ll Learn:
• Know the latest advancements in ABAP objects with the new SAP NetWeaver system
• Understand object-oriented ABAP classes and their components
• Use object creation and instance-method calls
• Become familiar with the functions of the global class builder
• Be exposed to advanced topics
• Incorporate best practices for making object-oriented ABAP programs
Who This Book Is For:
ABAP developers, ABAP programming analysts, and junior ABAP developers. Included are: ABAP developers for all modules of SAP, both new learners and developers with some experience, as well as those with little programming experience in general; students studying ABAP at the college or university level; senior non-ABAP programmers with considerable experience who are willing to switch to SAP/ABAP; and functional consultants who want to switch, or have recently switched, to technical ABAP work.

IBM Power Systems Enterprise AI Solutions

This IBM® Redpaper publication helps line of business (LOB), data science, and information technology (IT) teams develop an information architecture (IA) for their enterprise artificial intelligence (AI) environment. It describes the challenges that these three roles face when creating and deploying enterprise AI solutions, and how they can collaborate for best results. This publication also highlights the capabilities of the IBM Cognitive Systems and AI solutions:
• IBM Watson® Machine Learning Community Edition
• IBM Watson Machine Learning Accelerator (WMLA)
• IBM PowerAI Vision
• IBM Watson Machine Learning
• IBM Watson Studio Local
• IBM Video Analytics
• H2O Driverless AI
• IBM Spectrum® Scale
• IBM Spectrum Discover
This publication examines the challenges through five different use case examples:
• Artificial vision
• Natural language processing (NLP)
• Planning for the future
• Machine learning (ML)
• AI teaming and collaboration
This publication targets readers from LOBs, data science teams, and IT departments, and anyone who is interested in understanding how to build an IA to support enterprise AI development and deployment.

IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Storage System: Host Attachment and Interoperability

This IBM® Redbooks® publication provides information for attaching the IBM FlashSystem® A9000, IBM FlashSystem A9000R, and IBM XIV® Storage System to various host operating system platforms, such as IBM AIX® and Microsoft Windows. This publication was last updated in May 2019 to cover the VLAN tagging and port trunking support available with software version 12.3.2 (see in particular section 2.4, "VLAN tagging," on page 67). The goal is to give an overview of the versatility and compatibility of the IBM Spectrum™ Accelerate family of storage systems with various platforms and environments. The information presented here is not meant as a substitute for the IBM Storage Host Attachment Kit publications or other product publications. It is meant to complement them and to provide usage guidance and practical illustrations. This publication does not address attachments to a secondary system used for Remote Mirroring or data migration. These topics are covered in IBM FlashSystem A9000 and A9000R Business Continuity Solutions, REDP-5401.

Enhanced Cyber Security with IBM Spectrum Scale and IBM QRadar

Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management software for deep inspection, detection, and prioritization of threats has become a necessity for any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, combined with the log analysis, deep inspection, and threat detection provided by IBM QRadar®, help reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale file audit logging can be integrated with IBM QRadar. Using QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data stored on IBM Spectrum Scale. When threats are identified, you can act on them quickly to mitigate or reduce the impact of incidents. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. NOTE: This paper assumes a basic understanding of IBM Spectrum Scale, IBM QRadar, and their administration.

Practical Data Science with SAP

Learn how to fuse today's data science tools and techniques with your SAP enterprise resource planning (ERP) system. With this practical guide, SAP veterans Greg Foss and Paul Modderman demonstrate how to use several data analysis tools to solve interesting problems with your SAP data. Data engineers and scientists will explore ways to add SAP data to their analysis processes, while SAP business analysts will learn practical methods for answering questions about the business. By focusing on grounded explanations of both SAP processes and data science tools, this book gives data scientists and business analysts powerful methods for discovering deep data truths. You'll explore:
• Examples of how data analysis can help you solve several SAP challenges
• Natural language processing for unlocking the secrets in text
• Data science techniques for data clustering and segmentation
• Methods for detecting anomalies in your SAP data
• Data visualization techniques for making your data come to life

Cyber Resiliency Solution for IBM Spectrum Scale

This document is intended to facilitate the deployment of the Cyber Resiliency solution for IBM® Spectrum Scale. This solution is designed to protect the data on IBM Spectrum™ Scale from external cyberattacks or insider attacks using its integration with IBM Spectrum Protect™ and IBM Tape Storage. To complete the tasks that it describes, you must understand IBM Spectrum Scale™, IBM Spectrum Protect, and IBM Tape Storage architecture, concepts, and configuration. The information in this document is distributed on an as-is basis without any warranty, either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Protect is supported and entitled, and where the issues are specific to a blueprint implementation.

IBM PowerHA SystemMirror V7.2.3 for IBM AIX and V7.22 for Linux

This IBM® Redbooks® publication helps strengthen the position of the IBM PowerHA® SystemMirror® for Linux solution with well-defined and documented deployment models within an IBM Power Systems™ environment, providing customers with a planned foundation for business resilience and disaster recovery (DR) for their IBM Power Systems infrastructure solutions. This book addresses topics that help answer customers' complex high availability (HA) and DR requirements for IBM AIX® and Linux on IBM Power Systems servers, helping to maximize system availability and resources, and provides technical documentation to transfer how-to skills to users and support teams. This publication is targeted at technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for providing HA and DR solutions and support for IBM PowerHA SystemMirror for AIX and Linux Standard and Enterprise Editions on IBM Power Systems servers.

Implementing SAP S/4HANA: A Framework for Planning and Executing SAP S/4HANA Projects

Gain a better understanding of implementing SAP S/4HANA-based digital transformations. This book helps you understand the various components involved in planning and executing successful SAP S/4HANA projects. Learn how to ensure success by building a solid business case for SAP S/4HANA up front and tracking the business value generated throughout the implementation. Implementing SAP S/4HANA provides a framework for planning and executing SAP S/4HANA projects by articulating the implementation approach used for the different components of SAP S/4HANA implementations. Whether you are midway through an SAP S/4HANA program or about to embark on one, this book will help you throughout the journey. If you are looking for answers on why SAP S/4HANA requires special considerations compared with a traditional SAP implementation, this book is for you.
What You Will Learn:
• Understand the various components of your SAP S/4HANA project
• Forecast and track your success throughout the SAP S/4HANA implementation
• Build a solid business case for your SAP S/4HANA program
• Discover how the implementation approach varies across these components
Who This Book Is For:
SAP S/4HANA clients (line managers and consultants).

Analytic SQL in SQL Server 2014/2016

Business Intelligence (BI) has emerged as a field that seeks to support managers in decision-making. It encompasses the techniques, methods, and tools for building analytically based IT solutions, referred to as OLAP (OnLine Analytical Processing). Within this field, SQL plays a leading role and continues to evolve to cover both transactional and analytical data management. This book discusses the business intelligence functions provided by Microsoft® SQL Server 2014/2016. The analytic functions can be considered an enrichment of the SQL language. They provide a series of practical functions for answering complex analysis requests with all the simplicity, elegance, and established performance of the SQL language. Drawing on the author's wide experience in teaching and research, as well as insights from contacts in industry, this book focuses on the issues and difficulties faced by academics (students and teachers) and professionals engaged in data analysis with the SQL Server 2014/2016 database management system.

IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences

This IBM® Redpaper publication provides an update to the original description of IBM Reference Architecture for Genomics. This paper expands the reference architecture to cover all of the major vertical areas of healthcare and life sciences industries, such as genomics, imaging, and clinical and translational research. The architecture was renamed IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences to reflect the fact that it incorporates key building blocks for high-performance computing (HPC) and software-defined storage, and that it supports an expanding infrastructure of leading industry partners, platforms, and frameworks. The reference architecture defines a highly flexible, scalable, and cost-effective platform for accessing, managing, storing, sharing, integrating, and analyzing big data, which can be deployed on-premises, in the cloud, or as a hybrid of the two. IT organizations can use the reference architecture as a high-level guide for overcoming data management challenges and processing bottlenecks that are frequently encountered in personalized healthcare initiatives, and in compute-intensive and data-intensive biomedical workloads. This reference architecture also provides a framework and context for modern healthcare and life sciences institutions to adopt cutting-edge technologies, such as cognitive life sciences solutions, machine learning and deep learning, Spark for analytics, and cloud computing. To illustrate these points, this paper includes case studies describing how clients and IBM Business Partners alike used the reference architecture in the deployments of demanding infrastructures for precision medicine. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing life sciences solutions and support.

Simplify Management of IT Security and Compliance with IBM PowerSC in Cloud and Virtualized Environments

This IBM® Redbooks® publication provides a security and compliance solution that is optimized for virtualized environments on IBM Power Systems™ servers running IBM PowerVM® and IBM AIX®. Security control and compliance are among the key components needed to defend the virtualized data center and cloud infrastructure against ever-evolving threats. The IBM business-driven approach to enterprise security that is used with solutions such as IBM PowerSC™ makes IBM the premier security vendor in the market today. The book explores, tests, and documents scenarios that use IBM PowerSC together with the IBM Power Systems server architecture and IBM software solutions to help defend the virtualized data center and cloud infrastructure against these threats. This publication helps IT and security managers, architects, and consultants strengthen their security and compliance posture in a virtualized environment running IBM PowerVM.

Learn PySpark: Build Python-based Machine Learning and Deep Learning Models

Leverage machine and deep learning models to build applications on real-time data using PySpark. This book is perfect for those who want to learn to use PySpark to perform exploratory data analysis and solve an array of business challenges. You'll start by reviewing PySpark fundamentals, such as Spark’s core architecture, and see how to use PySpark for big data processing tasks such as data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms. The book then shows how to schedule Spark jobs using Airflow with PySpark and examines tuning machine and deep learning models for real-time predictions. It concludes with a discussion of GraphFrames and performing network analysis using graph algorithms in PySpark. All the code presented in the book is available as Python scripts on GitHub.
What You'll Learn:
• Develop pipelines for streaming data processing using PySpark
• Build machine learning and deep learning models using PySpark's latest offerings
• Perform graph analytics using PySpark
• Create sequence embeddings from text data
Who This Book Is For:
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis on streaming data.

Introducing MySQL Shell: Administration Made Easy with Python

Use MySQL Shell, the first modern and advanced client for connecting to and interacting with MySQL. It supports SQL, Python, and JavaScript. That’s right! You can write Python scripts and execute them within the shell, either interactively or in batch mode. The level of automation available from Python combined with batch mode is especially helpful to those practicing DevOps methods in their database environments. Introducing MySQL Shell covers everything you need to know about MySQL Shell. You will learn how to use the shell for SQL, how to use the new application programming interfaces for working with a document store, and even how to automate the management of your MySQL servers using Python. The book includes a look at the supporting technologies and concepts, such as JSON, schema-less documents, NoSQL, MySQL Replication, Group Replication, InnoDB Cluster, and more. MySQL Shell is the client that developers and database administrators have been waiting for. Far more powerful than the legacy client, MySQL Shell enables levels of automation that are useful not only for MySQL but in the broader context of your career as well. Automate your work and build skills in one of the most in-demand languages. With MySQL Shell, you can do both!
What You'll Learn:
• Use MySQL Shell with the newest features in MySQL 8
• Discover what a Document Store is and how to manage it with MySQL Shell
• Configure Group Replication and InnoDB Cluster from MySQL Shell
• Understand the new MySQL Python application programming interfaces
• Write Python scripts for managing your data and the MySQL high availability features
Who This Book Is For:
Developers and database professionals who want to automate their work and remain on the cutting edge of what MySQL has to offer. Anyone unhappy with the limited automation capabilities of the legacy command-line client will find much to like in this book on MySQL Shell, which supports powerful automation through the Python scripting language.