talk-data.com

Topic: data-engineering (3377 tagged)

Activities

Filtering by: O'Reilly Data Engineering Books

Professional Azure SQL Database Administration - Second Edition

Professional Azure SQL Database Administration is a comprehensive guide to managing and optimizing cloud-based Azure SQL Database solutions. It explains how Azure SQL Database differs from on-premises SQL Server and offers a clear roadmap for migrating, securing, scaling, and maintaining these databases in the cloud.

What this book will help me do:
- Understand the differences between Azure SQL Database and on-premises SQL Server and their practical implications.
- Learn techniques to migrate existing SQL Server databases to Azure SQL Database seamlessly.
- Discover advanced ways to optimize database performance and scalability by leveraging cloud capabilities.
- Master security strategies for Azure SQL databases, including backup, disaster recovery, and automated tasks.
- Develop proficiency in using tools such as PowerShell to automate and manage routine database administration tasks.

Author(s):
Ahmad Osama is an experienced database professional and author specializing in SQL Server and Azure SQL Database administration. With a robust background in database migration, maintenance, and performance tuning, Ahmad bridges the gap between theory and practice. His approachable writing style makes complex database topics accessible to professionals seeking to expand their expertise.

Who is it for?
Professional Azure SQL Database Administration is a resource for database administrators, developers, and IT professionals who want to develop their knowledge of Azure SQL Database administration and cloud database solutions. Whether you're transitioning from traditional SQL Server environments or looking to optimize your database strategies in the cloud, this book caters to professionals with intermediate to advanced experience in database management and programming with SQL.

IBM Spectrum Virtualize: Hot-Spare Node and NPIV Target Ports

The use of N_Port ID Virtualization (NPIV) to provide host-only ports (NPIV target ports) and spare nodes improves the host failover characteristics by separating out host communications from communication tasks on the same port and providing standby hardware, which can be automatically introduced into the cluster to reintroduce redundancy. Because the host ports are not used for internode communications, they can freely move between nodes, and this includes spare nodes that are added to the cluster automatically. This IBM® Redpaper™ publication describes the use of the IBM Spectrum™ Virtualize Hot-Spare Node function to provide a high availability storage infrastructure. This paper focuses on the functional behavior of hot-spare node when subjected to various failure conditions. This paper does not provide the details necessary to implement the reference architectures (although some implementation detail is provided).

IBM Spectrum Scale: Big Data and Analytics Solution Brief

This IBM® Redguide™ publication describes big data and analytics deployments that are built on IBM Spectrum Scale™. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance that is required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of the use of IBM Spectrum Scale as an alternative to HDFS.

SAP HANA on IBM Power Systems: High Availability and Disaster Recovery Implementation Updates

This IBM® Redbooks® publication updates Implementing High Availability and Disaster Recovery Solutions with SAP HANA on IBM Power Systems, REDP-5443, with the latest technical content that describes how to implement an SAP HANA on IBM Power Systems™ high availability (HA) and disaster recovery (DR) solution by using theoretical knowledge and sample scenarios. This book describes how all the pieces of the reference architecture work together (IBM Power Systems servers, IBM Storage servers, IBM Spectrum™ Scale, IBM PowerHA® SystemMirror® for Linux, IBM VM Recovery Manager DR for Power Systems, and Linux distributions) and demonstrates the resilience of SAP HANA with IBM Power Systems servers. This publication is for architects, brand specialists, distributors, resellers, and anyone developing and implementing SAP HANA on IBM Power Systems integration, automation, HA, and DR solutions. It provides documentation that transfers how-to skills to technical teams, as well as documentation for the sales team.

IBM Personal Communications and IBM z/OS TTLS Enablement: Technical Enablement Series

This document introduces Transport Layer Security (TLS) to z/OS® so that IBM Personal Communications (PCOMM) uses TLS security. It walks you through enabling Tunneled Transport Layer Security (TTLS) on your IBM z/OS for use with a PCOMM TN3270 connection. After you complete this task, a certificate is required to access your TN3270 PCOMM session.

You work with the following products and components:
- TN3270
- TCPIP
- PAGENT
- INET (maybe)
- IBM RACF®

This document assumes that the reader has extensive knowledge of z/OS security administration and these products and components. This document is part of the Technical Enablement Series that was created at the IBM Client Experience Centers.

IBM FlashSystem 900 Model AE3 Product Guide

Today's global organizations depend on the ability to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3, they can make faster decisions based on real-time insights. Thus, they unleash the power of demanding applications, including these:
- Online transaction processing (OLTP) and analytical databases
- Virtual desktop infrastructures (VDIs)
- Technical computing applications
- Cloud environments

Easy to deploy and manage, IBM FlashSystem® 900 Model AE3 is designed to accelerate the applications that drive your business. Powered by IBM FlashCore® Technology, IBM FlashSystem Model AE3 provides the following characteristics:
- Accelerate business-critical workloads, real-time analytics, and cognitive applications with the consistent microsecond latency and extreme reliability of IBM FlashCore technology
- Improve performance and help lower cost with new inline data compression
- Help reduce capital and operational expenses with IBM enhanced 3D triple-level cell (3D TLC) flash
- Protect critical data assets with patented IBM Variable Stripe RAID™
- Power faster insights with IBM FlashCore, including hardware-accelerated nonvolatile memory (NVM) architecture, purpose-engineered IBM MicroLatency® modules, and advanced flash management

FlashSystem 900 Model AE3 can be configured in capacity points from 14.4 TB to 180 TB usable, and up to 360 TB effective capacity after RAID 5 protection and compression. You can couple this product with 16 Gbps or 8 Gbps Fibre Channel, 16 Gbps NVMe over Fibre Channel, or 40 Gbps InfiniBand connectivity. Thus, the IBM FlashSystem 900 Model AE3 provides extreme performance to existing and next generation infrastructure.

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V8.2.1

This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller (SVC), which is powered by IBM Spectrum™ Virtualize V8.2.1. IBM SAN Volume Controller is a virtualization appliance solution that maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the block level in a network, which enables applications and servers to share storage devices on a network.
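
The virtual-to-physical mapping described above can be illustrated with a toy model. This is only a sketch of the general idea, not how SAN Volume Controller is implemented: the `VirtualVolume` class, the device names, and the block offsets are all hypothetical, chosen to show that a host keeps using the same virtual address while the backing location changes.

```python
class VirtualVolume:
    """Toy model: hosts see stable virtual block addresses, while the
    physical location behind each address can change."""

    def __init__(self, name, size_blocks, device):
        self.name = name
        # Map each virtual block to a (device, physical_block) pair.
        self.mapping = {vb: (device, vb) for vb in range(size_blocks)}

    def resolve(self, virtual_block):
        """Return the physical location currently backing a virtual block."""
        return self.mapping[virtual_block]

    def migrate(self, new_device, offset=0):
        """Re-point every virtual block at a new device.

        The virtual block numbers (the dictionary keys) are left untouched,
        so a host that addresses block 2 keeps addressing block 2.
        """
        for vb in self.mapping:
            self.mapping[vb] = (new_device, vb + offset)


# Hypothetical devices "array_a" and "array_b".
vol = VirtualVolume("vol0", size_blocks=4, device="array_a")
before = vol.resolve(2)
vol.migrate("array_b", offset=100)
after = vol.resolve(2)
print(before, after)
```

In this sketch the host-facing address (block 2) is the same before and after `migrate`, and only the tuple returned by `resolve` changes.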

IBM Hybrid Solution for Scalable Data Solutions using IBM Spectrum Scale

This document is intended to facilitate the deployment of the scalable hybrid cloud solution for data agility and collaboration using IBM® Spectrum Scale across multiple public clouds. To complete the tasks it describes, you must understand IBM Spectrum Scale and IBM Spectrum Scale Active File Management (AFM). The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Scale Active File Management are supported and entitled, and where the issues are specific to a blueprint implementation.

Big Data Simplified

Big Data Simplified blends technology with strategy and delves into applications of big data in specialized areas, such as recommendation engines, data science and Internet of Things (IoT), and enables a practitioner to make the right technology choice. The steps to strategize a big data implementation are also discussed in detail. This book presents a holistic approach to the topic, covering a wide landscape of big data technologies like Hadoop 2.0 and package implementations, such as Cloudera. In-depth discussion of associated technologies, such as MapReduce, Hive, Pig, Oozie, Apache ZooKeeper, Flume, Kafka, Spark, Python and NoSQL databases like Cassandra, MongoDB, GraphDB, etc., is also included.

Multicloud Storage as a Service using vRealize Automation and IBM Spectrum Storage

This document is intended to facilitate the deployment of the Multicloud Solution for Business Continuity and Storage as a Service by using IBM Spectrum Virtualize for Public Cloud on Amazon Web Services (AWS). To complete the tasks it describes, you must understand IBM FlashSystem 9100, IBM Spectrum Virtualize for Public Cloud, IBM Spectrum Connect, VMware vRealize Orchestrator, vRealize Automation, and AWS Cloud. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize or IBM FlashSystem storage devices are supported and entitled and where the issues are specific to a blueprint implementation.

Streaming Data

Managers and staff responsible for planning, hiring, and allocating resources need to understand how streaming data can fundamentally change their organizations. Companies everywhere are disrupting business, government, and society by using data and analytics to shape their business. Even if you don’t have deep knowledge of programming or digital technology, this high-level introduction brings data streaming into focus. You won’t find math or programming details here, or recommendations for particular tools in this rapidly evolving space. But you will explore the decision-making technologies and practices that organizations need to process streaming data and respond to fast-changing events. By describing the principles and activities behind this new phenomenon, author Andy Oram shows you how streaming data provides hidden gems of information that can transform the way your business works.

- Learn where streaming data comes from and how companies put it to work
- Follow a simple data processing project from ingesting and analyzing data to presenting results
- Explore how (and why) big data processing tools have evolved from MapReduce to Kubernetes
- Understand why streaming data is particularly useful for machine learning projects
- Learn how containers, microservices, and cloud computing led to continuous integration and DevOps

Deep Learning for Search

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on!

About the Technology
Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how.

About the Book
Deep Learning for Search teaches you to improve your search results with neural networks. You’ll review how DL relates to search basics like indexing and ranking. Then, you’ll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you’ll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn!

What's Inside
- Accurate and relevant rankings
- Searching across languages
- Content-based image search
- Search with recommendations

About the Reader
For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed.

About the Author
Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from topics like information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components, and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including BerlinBuzzwords, International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili.

Quotes
- A practical approach that shows you the state of the art in using neural networks, AI, and deep learning in the development of search engines. - From the Foreword by Chris Mattmann, NASA JPL
- A thorough and thoughtful synthesis of traditional search and the latest advancements in deep learning. - Greg Zanotti, Marquette Partners
- A well-laid-out deep dive into the latest technologies that will take your search engine to the next level. - Andrew Wyllie, Thynk Health
- Hands-on exercises teach you how to master deep learning for search-based products. - Antonio Magnaghi, System1

IBM Storage Solutions for Blockchain Platform Version 1.2

This Blueprint is intended to define the infrastructure that is required for a blockchain remote peer and to facilitate the deployment of IBM Blockchain Platform on IBM Cloud Private using that infrastructure. This infrastructure includes the necessary document handler components, such as IBM Blockchain Document Store, and covers the required storage for on-chain and off-chain blockchain data. To complete these tasks, you must have a basic understanding of each of the components used, or have access to the correct educational material to gain that knowledge.

IBM FlashSystem A9000 Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000 Model 425, with IBM FlashSystem A9000 Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking. IBM FlashSystem A9000 storage system uses the IBM FlashCore® technology to help realize higher capacity and improved response times over disk-based systems and other competing flash and solid-state drive (SSD)-based storage. The extreme performance of IBM FlashCore technology with a grid architecture and comprehensive data reduction creates one powerful solution. Whether you are a service provider who requires highly efficient management or an enterprise that is implementing cloud on a budget, FlashSystem A9000 provides consistent and predictable microsecond response times and the simplicity that you need. The A9000 features always-on data reduction and now offers intelligent capacity management for deduplication. As a cloud-optimized solution, FlashSystem A9000 suits the requirements of public and private cloud providers who require features such as inline data deduplication, multi-tenancy, and quality of service. It also uses powerful software-defined storage capabilities from IBM Spectrum™ Accelerate, such as Hyper-Scale technology, VMware, and storage container integration.

IBM FlashSystem A9000R Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000R Model 415 and Model 425, with IBM FlashSystem A9000R Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking. IBM FlashSystem A9000R is a grid-scale, all-flash storage platform designed for industry leaders with rapidly growing cloud storage and mixed workload environments to help drive your business into the cognitive era. FlashSystem A9000R provides consistent, extreme performance for dynamic data at scale, integrating the microsecond latency and high availability of IBM FlashCore® technology. The rack-based offering comes integrated with the world-class software features that are built with IBM Spectrum™ Accelerate. For example, comprehensive data reduction, including inline pattern removal, data deduplication, and compression, helps lower total cost of ownership (TCO) while the grid architecture and IBM Hyper-Scale framework simplify and automate storage administration. The A9000R features always-on data reduction and now offers intelligent capacity management for deduplication. Ready for the cloud and well-suited for large deployments, FlashSystem A9000R delivers predictable high performance and ultra-low latency, even under heavy workloads with full data reduction enabled. As a result, the grid-scale architecture maintains this performance by automatically self-optimizing workloads across all storage resources without manual intervention.

Managing Your Data Science Projects: Learn Salesmanship, Presentation, and Maintenance of Completed Models

At first glance, the skills required to work in the data science field appear to be self-explanatory. Do not be fooled. Impactful data science demands an interdisciplinary knowledge of business philosophy, project management, salesmanship, presentation, and more. In Managing Your Data Science Projects, author Robert de Graaf explores important concepts that are frequently overlooked in much of the instructional literature that is available to data scientists new to the field. If your completed models are to be used and maintained most effectively, you must be able to present and sell them within your organization in a compelling way. The value of data science within an organization cannot be overstated. Thus, it is vital that strategies and communication between teams are dexterously managed. Three main ways that data science strategy is used in a company are to research its customers, assess risk analytics, and log operational measurements. These all require different managerial instincts, backgrounds, and experiences, and de Graaf cogently breaks down the unique reasons behind each. They must align seamlessly to eventually be adopted as dynamic models. Data science is a relatively new discipline, and as such, internal processes for it are not as well-developed within an operational business as others. With Managing Your Data Science Projects, you will learn how to create products that solve important problems for your customers and ensure that the initial success is sustained throughout the product’s intended life. Your users will trust you and your models, and most importantly, you will be a more well-rounded and effectual data scientist throughout your career.

Who This Book Is For
Early-career data scientists, managers of data scientists, and those interested in entering the field of data science

Stream Processing with Apache Spark

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API.

- Learn fundamental stream processing concepts and examine different streaming architectures
- Explore Structured Streaming through practical examples; learn different aspects of stream processing in detail
- Create and operate streaming jobs and applications with Spark Streaming; integrate Spark Streaming with other Spark APIs
- Learn advanced Spark Streaming techniques, including approximation algorithms and machine learning algorithms
- Compare Apache Spark to other stream processing projects, including Apache Storm, Apache Flink, and Apache Kafka Streams

Obtaining Value from Big Data for Service Systems, Volume II, 2nd Edition

Volume II of this series discusses the technology used to implement a big data analysis capability within a service-oriented organization. It discusses the technical architecture necessary to implement a big data analysis capability, some issues and challenges in big data analysis and utilization that an organization will face, and how to capture value from it. It will help readers understand what technology is required for a basic capability and what the expected benefits are from establishing a big data capability within their organization.

Pro SQL Server 2019 Wait Statistics: A Practical Guide to Analyzing Performance in SQL Server

Here is a practical guide for analyzing and troubleshooting SQL Server performance using wait statistics. Learn to identify precisely why your queries are running slowly. Measure the amount of time consumed by each bottleneck so that you can focus attention on making the largest improvements first. This edition is updated to cover analysis of wait statistics inside Query Store and the CXCONSUMER wait event, and to be current with SQL Server 2019. Whether you are new to wait statistics or already familiar with them, this book provides a deeper understanding of how wait statistics are generated and what they can mean for your SQL Server instance’s performance. Pro SQL Server 2019 Wait Statistics goes beyond the most common wait types into the more complex and performance-threatening wait types. You’ll learn about per-query wait statistics and session-based wait statistics, and the types of problems they each can help you solve. The different wait types are categorized by their area of impact, including CPU, IO, Lock, and many more. The book presents clear examples to help you gain practical knowledge of why and how specific wait times increase or decrease, and how they impact your SQL Server’s performance. After reading this book you won’t want to be without the valuable information that wait statistics provide regarding where you should be spending your limited tuning time to maximize performance and value to your business.

What You'll Learn
- Identify resource bottlenecks in a running SQL Server instance
- Locate wait statistics information inside DMVs and Query Store
- Analyze the root cause of sub-optimal performance
- Diagnose I/O contention and locking contention
- Benchmark SQL Server performance
- Lower the wait time of the most popular wait types

Who This Book Is For
Database administrators who want to identify and resolve performance bottlenecks, those who want to learn more about how the SQL Server engine accesses and uses resources inside SQL Server, and administrators concerned with achieving, and knowing they have achieved, optimal performance
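
The idea of measuring the time consumed by each bottleneck and tackling the largest first can be sketched in a few lines. This is a minimal illustration and is not taken from the book: the `rank_waits` helper is hypothetical, and the millisecond figures are made-up sample values. In practice, cumulative figures of this kind would typically come from a DMV such as `sys.dm_os_wait_stats`.

```python
def rank_waits(wait_ms_by_type):
    """Rank wait types by their share of total wait time, largest first.

    wait_ms_by_type: dict mapping a wait type name to cumulative wait
    time in milliseconds. Returns a list of (name, ms, percent) tuples.
    """
    total = sum(wait_ms_by_type.values())
    if total == 0:
        return []
    ranked = sorted(wait_ms_by_type.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, ms, round(100.0 * ms / total, 1)) for name, ms in ranked]


# Made-up sample values for illustration only.
sample = {"LCK_M_X": 3000, "PAGEIOLATCH_SH": 6000, "CXPACKET": 1000}
ranked = rank_waits(sample)
for name, ms, pct in ranked:
    print(f"{name:<16} {ms:>6} ms  {pct:>5}%")
```

With these sample numbers, the I/O-related wait accounts for the largest share, so it would be the first candidate for tuning.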