talk-data.com talk-data.com

Topic

data-engineering

3395

tagged

Activity Trend

1 peak/qtr
2020-Q1 2026-Q1

Activities

3395 activities · Newest first

Elasticsearch 7 Quick Start Guide

Elasticsearch 7 Quick Start Guide introduces the core capabilities of Elasticsearch, one of the most powerful distributed search and analytics tools available. Through this concise and practical guide, you will learn how to install, configure, and effectively utilize Elasticsearch while exploring its powerful features, including real-time search and data aggregation. What this Book will help me do Install and configure Elasticsearch to create secure and scalable deployments. Understand and utilize analyzers, filters, and mappings to optimize search results. Perform data aggregations using advanced techniques in metric and bucket operations. Identify and troubleshoot common Elasticsearch performance issues for smooth operation. Leverage best practices to ensure effective deployment in production environments. Author(s) None Srivastava and None Miller are experienced writers and technologists who bring real-world expertise in search systems and analytics. With practical backgrounds in distributed systems and data management, the authors deliver a straightforward and hands-on approach in their writing. They aim to make Elasticsearch concepts approachable and practical for developers and administrators alike. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who are seeking to implement Elasticsearch within their projects. It is particularly suited for those with basic to intermediate technical experience and a need for robust search and analytics solutions. If you're aiming to learn the fundamentals and acquire practical skills in Elasticsearch 7, this book will serve as an excellent resource for you.

Expert T-SQL Window Functions in SQL Server 2019: The Hidden Secret to Fast Analytic and Reporting Queries

Become an expert who can use window functions to solve T-SQL query problems. Replace slow cursors and self-joins with queries that are easy to write and perform better. This new edition provides expanded examples, including a chapter from the world of sports, and covers the latest performance enhancements through SQL Server 2019. Window functions are useful in analytics and business intelligence reporting. They came into full blossom with SQL Server 2012, yet they are not as well known and used as often as they ought to be. This group of functions is one of the most notable developments in SQL, and this book shows how every developer and DBA can benefit from their expressive power in solving day-to-day business problems. Once you begin using window functions, such as ROW_NUMBER and LAG, you will discover many ways to use them. You will approach SQL Server queries in a different way, thinking about sets of data instead of individual rows. Your querieswill run faster, be easier to write, and easier to deconstruct, maintain, and enhance in the future. Just knowing and using these functions is not enough. You also need to understand how to tune the queries. Expert T-SQL Window Functions in SQL Server clearly explains how to get the best performance. The book also covers the rare cases when older techniques are the best bet. What You Will Learn Solve complex query problems without cumbersome self-joins that run slowly and are difficult to read Create sliding windows in a result set for computing such as running totals and moving averages Return aggregate and detail data simultaneously from the same SELECT statement Compute lag and lead and other values that access data from multiple rows in a result set Understand the OVER clause syntax and how to control the window Avoid framing errors that can lead to unexpected results Who This Book Is For Anyone who writes T-SQL queries, including database administrators, developers, business analysts, and data scientists. Before reading this book, you should understand how to join tables, write WHERE clauses, and build aggregate queries.

Pro SQL Server 2019 Administration: A Guide for the Modern DBA

Use this comprehensive guide for the SQL Server DBA, covering all that practicing database administrators need to know to get their daily work done. Updated for SQL Server 2019, this edition includes coverage of new features such as Memory-optimized TempDB Metadata, and Always Encrypted with Secure Enclaves. Other new content includes coverage of Query Store, resumable index operations, installation on Linux, and containerized SQL. Pro SQL Server 2019 Administration takes DBAs on a journey that begins with planning their SQL Server deployment and runs through installing and configuring the instance, administering and optimizing database objects, and ensuring that data is secure and highly available. Finally, readers will learn how to perform advanced maintenance and tuning techniques. This book teaches you to make the most of new SQL Server 2019 functionality, including Data Discovery and Classification. The bookpromotes best-practice installation, shows how to configure for scalability and high workloads, and demonstrates the gamut of database-level maintenance tasks such as index maintenance, database consistency checks, and table optimizations. What You Will Learn Install and configure SQL Server on Windows through the GUI and with PowerShell Install and configure SQL Server on Linux and in Containers Optimize tables through in-memory OLTP, table partitioning, and the creation of indexes Secure and encrypt data to protect against embarrassing data breaches Ensure 24x7x365 access through high-availability and disaster recovery features Back up your data to ensure against loss, and recover data when needed Perform routine maintenance tasks such as database consistency checks Troubleshoot and solve performance problems inSQL queries and in the database engine Who This Book Is For SQL Server DBAs who manage on-premise installations of SQL Server. This book is also useful for DBAs who wish to learn advanced features such as Query Store, Extended Events, Distributed Replay, and Policy-Based Management, or those who need to install SQL Server in a variety of environments.

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project's team goal to integrate the world of big data with SQL Server into a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-criticalavailability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux and gain a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with Polybase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and Polybase to build a comprehensive solution called Big Data Clusters, which is a marquee feature of 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications. What You Will Learn Implement Big Data Clusters with SQL Server, Spark, and HDFS Create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources Combine SQL and Spark to build a machine learning platform for AI applications Boost your performance with no application changes using Intelligent Performance Increase security of your SQL Server through Secure Enclaves and Data Classification Maximize database uptime through online indexing and Accelerated Database Recovery Build new modern applications with Graph, ML Services, and T-SQL Extensibility with Java Improve your ability to deploy SQL Server on Linux Gain in-depth knowledge to run SQL Server with containers and Kubernetes Know all the new database engine features for performance, usability, and diagnostics Use the latest tools and methods to migrate your database to SQL Server 2019 Apply your knowledge of SQL Server 2019 to Azure Who This Book Is For IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.

Cognitive Computing Featuring the IBM Power System AC922

This IBM® Redpaper publication describes the advantages of using IBM Power System AC922 for cognitive solutions, and how it can enhance clients' businesses. In order to optimize the hardware and software, IBM partners with NVIDIA, Mellanox, H2O.ai, SQream, Kinetica, and other prominent companies to design the Power AC922 server, specifically enhanced for the cognitive era. Most of its outstanding hardware features, such as NVIDIA NVLink 2.0 and PCIe 4.0, are described in this publication to illustrate the advantages that clients can realize in comparison with IBM competitors. We also include a brief description about what cognitive computing is, and how to use IBM Watson® Machine Learning cognitive solutions to bring more value to your business ecosystem. Additionally, we show performance charts that show the advantages of using Power AC922 versus x86 competitors. In the last chapter, we describe the most remarkable use cases in which IBM solves real problems using cognitive solutions. This IBM Redpaper publication is aimed at IT technical audiences, especially decision-making levels that need a full look at the benefits and improvements that an IBM Cognitive Solution can offer. It also provides valuable information to data science professionals, enabling them to plan their modeling needs. Finally, it offers information to the infrastructure support group in charge of maintaining the solution.

IBM Spectrum Scale Erasure Code Edition: Planning and Implementation Guide

This IBM® Redpaper introduces the IBM Spectrum® Scale Erasure Code Edition (ECE) as a scalable, high-performance data and file management solution. ECE is designed to run on any commodity server that meets the ECE minimum hardware requirements. ECE provides all the functionality, reliability, scalability, and performance of IBM Spectrum Scale with the added benefit of network-dispersed IBM Spectrum Scale RAID, which provides data protection, storage efficiency, and the ability to manage storage in hyperscale environments that are composed from commodity hardware. In this publication, we explain the benefits of ECE and the use cases where we believe it fits best. We also provide a technical introduction to IBM Spectrum Scale RAID. Next, we explain the key aspects of planning an installation, provide an example of an installation scenario, and describe the key aspects of day-to-day management and a process for problem determination. We conclude with an overview of possible enhancements that are being considered for future versions of IBM Spectrum Scale Erasure Code Edition. Overall knowledge of IBM Spectrum Scale Erasure Code Edition is critical to planning a successful storage system deployment. This paper is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost effective storage solutions. The goal of this paper is to describe the benefits of using IBM Spectrum Scale Erasure Code Edition for the creation of high performing storage systems.

Mastering Spark with R

If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems. Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users. Analyze, explore, transform, and visualize data in Apache Spark with R Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows Perform analysis and modeling across many machines using distributed computing techniques Use large-scale data from multiple sources and different formats with ease from within Spark Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

IBM z15 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™ (machine type 8561). It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity, which is in an industry-standard footprint. The z15 system excels at the following tasks: Using multicloud integration services Securing data with pervasive encryption Providing resilience with key to zero downtime Transforming a transactional platform into a data powerhouse Getting more out of the platform with IT Operational Analytics Accelerating digital transformation with agile service delivery Revolutionizing business processes Blending open source and Z technologies This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 1

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware & software components necessary to setup and/or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to: · Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today! · Build a 24 by 7 by 365 enterprise class private cloud with Red Hat OpenShift Container Platform utilizing new open source Container Storage interface (CSI) drivers · Leverage enterprise class services such as NVMe based flash performance, high data availability, and advanced container security IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

IBM Storage Solutions for SAP Applications Version 1.3

This paper is intended as an architecture and configuration guide to set up the IBM® System Storage® for the SAP HANA tailored data center integration (SAP HANA TDI) within a storage area network (SAN) environment. SAP HANA TDI allows the SAP customer to attach external storage to the SAP HANA server. The paper also describes the setup and configuration of SAP Landscape Management for SAP HANA systems on IBM infrastructure components: IBM Power Systems™ and IBM Storage based on IBM Spectrum™ Virtualize. This document is written for IT technical specialists and architects with advanced skill levels on SUSE Linux Enterprise Server (SLES) or Red Hat Enterprise Linux (RHEL) and IBM System Storage. This document provides the necessary information to select, verify, and connect IBM System Storage to the SAP HANA server through a Fibre Channel-based SAN. The recommendations in this Blueprint apply to single-node and scale-out configurations, and Intel and IBM Power based SAP HANA systems.

Database Internals

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines: Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency

Query Store for SQL Server 2019: Identify and Fix Poorly Performing Queries

Apply the new Query Store feature to identify and fix poorly performing queries in SQL Server. Query Store is an important and recent feature in SQL Server that provides insight into the details of query execution and how that execution has changed over time. Query Store helps to identify queries that aren’t performing well, or that have regressed in their performance. Query Store provides detailed information such as wait stats that you need to resolve root causes, and it allows you to force the use of a known good execution plan. With SQL Server 2017 and later you can automate the correction of regressions in performance. Query Store for SQL Server 2019 helps you protect your database’s performance during upgrades of applications or version of SQL Server. The book provides fundamental information on how Query Store works and best practices for implementation and use. You will learn to run and interpret built-in reports, configure automatic plan correction, and troubleshoot queries using Query Store when needed. Query Store for SQL Server 2019 helps you master Query Store and bring value to your organization through consistent query execution times and automate correction of regressions. What You'll Learn Apply best practices in implementing Query Store on production servers Detect and correct regressions in query performance Lower the risk of performance degradation following an upgrade Use tools and techniques to get the most from Query Store Automate regression correction and other uses of Query Store Who This Book Is For SQL Server developers and administrators responsible for query performance on SQL Server. Anyone responsible for identifying poorly performing queries will be able to use Query Store to find these queries and resolve the underlying issues.

IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage

This IBM® Redpaper publication provides a comprehensive overview of the IBM Spectrum® Discover metadata management software platform. We give a detailed explanation of how the product creates, collects, and analyzes metadata. Several in-depth use cases are used that show examples of analytics, governance, and optimization. We also provide step-by-step information to install and set up the IBM Spectrum Discover trial environment. More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges to manage this deluge of unstructured data such as: Pinpointing and activating relevant data for large-scale analytics Lacking the fine-grained visibility that is needed to map data to business priorities Removing redundant, obsolete, and trivial (ROT) data Identifying and classifying sensitive data IBM Spectrum Discover is a modern metadata management software that provides data insight for petabyte-scale file and Object Storage, storage on premises, and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.

IBM Storage for Red Hat OpenShift Container Platform V3.11 Blueprint Version 1 Release 1

IBM Storage for Red Hat OpenShift Container Platform is a comprehensive container-ready solution that includes all the hardware & software components necessary to setup and/or expand your Red Hat OpenShift Container Platform V3.11 environment. IBM Storage, bringing enterprise data services to containers. In this blueprint, learn how to: • Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today! • Build a 24 by 7 by 365 enterprise class private cloud with Red Hat OpenShift Container Platform • Leverage enterprise class services such as NVMe based flash performance, high data availability, and advanced container security IBM Storage for Red Hat OpenShift Container Platform: designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

SAP ABAP Objects: A Practical Guide to the Basics and Beyond

Understand ABAP objects—the object-oriented extension of the SAP language ABAP—in the latest release of SAP NetWeaver 7.5, and its newest advancements. This book begins with the programming of objects in general and the basics of the ABAP language that a developer needs to know to get started. The most important topics needed to perform daily support jobs and ensure successful projects are covered. ABAP is a vast community with developers working in a variety of functional areas. You will be able to apply the concepts in this book to your area. SAP ABAP Objects is goal directed, rather than a collection of theoretical topics. It doesn't just touch on the surface of ABAP objects, but goes in depth from building the basic foundation (e.g., classes and objects created locally and globally) to the intermediary areas (e.g., ALV programming, method chaining, polymorphism, simple and nested interfaces), and then finally into the advanced topics (e.g., shared memory, persistent objects). You will know how to use best practices to make better programs via ABAP objects. What You’ll Learn Know the latest advancements in ABAP objects with the new SAP Netweaver system Understand object-oriented ABAP classes and their components Use object creation and instance-methods calls Be familiar with the functions of the global class builder Be exposed to advanced topics Incorporate best practices for making object-oriented ABAP programs Who This Book Is For ABAP developers, ABAP programming analysts, and junior ABAP developers. Included are: ABAP developers for all modules of SAP, both new learners and developers with some experience or little programming experience in general; students studying ABAP at the college/university level; senior non-ABAP programmers with considerable experience who are willing to switch to SAP/ABAP; and any functional consultants who want or have recently switched to ABAP technical.

IBM Power Systems Enterprise AI Solutions

This IBM® Redpaper publication helps the line of business (LOB), data science, and information technology (IT) teams develop an information architecture (IA) for their enterprise artificial intelligence (AI) environment. It describes the challenges that are faced by the three roles when creating and deploying enterprise AI solutions, and how they can collaborate for best results. This publication also highlights the capabilities of the IBM Cognitive Systems and AI solutions: IBM Watson® Machine Learning Community Edition IBM Watson Machine Learning Accelerator (WMLA) IBM PowerAI Vision IBM Watson Machine Learning IBM Watson Studio Local IBM Video Analytics H2O Driverless AI IBM Spectrum® Scale IBM Spectrum Discover This publication examines the challenges through five different use case examples: Artificial vision Natural language processing (NLP) Planning for the future Machine learning (ML) AI teaming and collaboration This publication targets readers from LOBs, data science teams, and IT departments, and anyone that is interested in understanding how to build an IA to support enterprise AI development and deployment.

IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Storage System: Host Attachment and Interoperability

This IBM® Redbooks® publication provides information for attaching the IBM FlashSystem® A9000, IBM FlashSystem A9000R, and IBM XIV® Storage System to various host operating system platforms, such as IBM AIX® and Microsoft Windows. This publication was last updated in May 2019 to cover the VLAN tagging and port trunking support available with software version 12.3.2 (see in particular section 2.4, "VLAN tagging" on page 67. The goal is to give an overview of the versatility and compatibility of the IBM Spectrum™ Accelerate family of storage systems with various platforms and environments. The information that is presented here is not meant as a replacement or substitute for the IBM Storage Host Attachment Kit publications or other product publications. It is meant as a complement and to provide usage guidance and practical illustrations. This publication does not address attachments to a secondary system used for Remote Mirroring or data migration. These topics are covered in IBM FlashSystem A9000 and IBM FlashSystem A9000 and A9000R Business Continuity Solutions, REDP-5401.

Enhanced Cyber Security with IBM Spectrum Scale and IBM QRadar

Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management software for deep inspection, detection, and prioritization of threats has become a necessity of any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, combined with the log analysis, deep inspection, and detection of threats provided by IBM QRadar®, helps reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale file audit logging can be integrated with IBM QRadar. Using QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data stored on IBM Spectrum Scale. When the threats are identified, you can quickly act on them to mitigate or reduce the impact of incidents. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. NOTE: This paper assumes a basic understanding of IBM Spectrum Scale, IBM QRadar, and their administration.

Practical Data Science with SAP

Learn how to fuse today's data science tools and techniques with your SAP enterprise resource planning (ERP) system. With this practical guide, SAP veterans Greg Foss and Paul Modderman demonstrate how to use several data analysis tools to solve interesting problems with your SAP data. Data engineers and scientists will explore ways to add SAP data to their analysis processes, while SAP business analysts will learn practical methods for answering questions about the business. By focusing on grounded explanations of both SAP processes and data science tools, this book gives data scientists and business analysts powerful methods for discovering deep data truths. You'll explore: Examples of how data analysis can help you solve several SAP challenges Natural language processing for unlocking the secrets in text Data science techniques for data clustering and segmentation Methods for detecting anomalies in your SAP data Data visualization techniques for making your data come to life