talk-data.com talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 Oreilly Visit website ↗

Activities tracked

499

Collection of O'Reilly books on Data Engineering.

Filtering by: Cloud Computing ×

Sessions & talks

Showing 176–200 of 499 · Newest first

Search within this event →
Automating the Modern Data Warehouse

The opportunity to modernize and improve the enterprise data warehouse is one of the best reasons for moving your application to the cloud. A data warehouse can access a greater diversity of use cases and practices than is possible in an existing environment. In this report, researcher and analyst Stephen Swoyer offers a comprehensive overview of the benefits and challenges of implementing a cloud-based data warehouse. Senior IT decision makers, chief data officers, and data professionals will learn about the shifts and new trends in the data management landscape. Explore ways to improve data management, build a data warehouse strategy, and learn how to modernize a data warehouse effectively. Understand how AI, machine learning, self-service data integration, and built-in developer-oriented services have transformed the data warehouse role Use data warehouses to work with cloud-based data lakes for end-to-end data management and data governance Explore how data warehouse platforms as a service (PaaS) pave the way to automation Migrate, manage, and secure a data warehouse in a hybrid or multicloud environment

IBM TS7700 Series DS8000 Object Store User's Guide Version 2.0

The IBM® TS7700 features a functional enhancement that allows for the TS7700 to act as an object store for transparent cloud tiering with IBM DS8000® (DS8K), DFSMShsm (HSM), and native DFSMSdss (DSS). This function can be used to move data sets directly from DS8000 to TS7700. This IBM Redpaper publication describes the client value, and how DFSMS, DS8000, and TS7700 are set up to enable and use the function.

Professional Azure SQL Managed Database Administration - Third Edition

Professional Azure SQL Managed Database Administration is a comprehensive guide to mastering data management with Azure's managed database services. Packed with real-world exercises and updated to cover the latest Azure features, this book provides actionable insights into migration, performance tuning, scaling, and securing Azure SQL databases. What this Book will help me do Master the configuration and pricing options for Azure SQL databases to make cost-effective choices. Learn the processes to provision new SQL databases or migrate existing on-premises SQL databases to Azure. Acquire skills in implementing high availability and disaster recovery for ensuring data resilience. Understand the strategies for monitoring, tuning, and optimizing the performance of Azure SQL databases. Discover techniques for scaling uses through elastic pools and securing databases comprehensively. Author(s) Ahmad Osama and Shashikant Shakya are experienced professionals in SQL Server and Azure SQL technologies. With decades of combined experience in database administration and cloud computing, they bring a depth of understanding to the content of this book. Their hands-on teaching approach is evident in the practical exercises and real-world scenarios included. Who is it for? This book is specifically tailored for database administrators, developers, and application developers looking to leverage Azure SQL databases. If you are tasked with migrating applications to the cloud or ensuring top performance and resilience for cloud databases, you will find this book highly valuable. Prior experience with on-premises SQL services will help contextualize the content, making it suitable for professionals with intermediate SQL experience. Readers aiming to deepen their Azure SQL expertise will also greatly benefit.

Enhanced Cyber Resilience Solution by Threat Detection using IBM Cloud Object Storage System and IBM QRadar SIEM

This Solution Redpaper™ publication explains how the features of IBM Cloud® Object Storage System reduces the effect of incidents on business data when combined with log analysis, deep inspection, and detection of threats that IBM QRadar SIEM provides. This paper also demonstrates how to integrate IBM Cloud Object Storage's access logs with IBM QRadar SIEM. An administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data that is stored on IBM Cloud Object Storage. Also, IBM QRadar SIEM can proactively trigger cyber resiliency workflow in IBM Cloud Object Storage remotely to protect the data based on threat detection. This publication is intended for chief technology officers, solution and security architects, and systems administrators.

Securing Your Critical Workloads with IBM Hyper Protect Services

Many organizations must protect their mission-critical applications in production, but security threats can also surface during the development and pre-production phases. Also, during deployment and production, insiders who manage the infrastructure that hosts critical applications can pose a threat given their super-user credentials and level of access to secrets or encryption keys. Organizations must incorporate secure design practices in their development operations and embrace DevSecOps to protect their applications from the vulnerabilities and threat vectors that can compromise their data and potentially threaten their business. IBM® Cloud Hyper Protect Services provide built-in data-at-rest and data-in-flight protection to help developers easily build secure cloud applications by using a portfolio of cloud services that are powered by IBM LinuxONE. The LinuxONE platform ensures that client data is always encrypted, whether at rest or in transit. This feature gives customers complete authority over sensitive data and associated workloads (which restricts access, even for cloud admins) and helps them meet regulatory compliance requirements. LinuxONE also allows customers to build mission-critical applications that require quick time to market and dependable rapid expansion. The purpose of this IBM Redbooks® publication is to: Introduce the IBM Hyper Protect Services that are running on IBM LinuxONE on the IBM Cloud™ and on-premises Provide high-level design architectures Describe deployment best practices Provide guides to getting started and examples of the use of the Hyper Protect Services The target audience for this book is IBM Hyper Protect Virtual Services technical specialists, IT architects, and system administrators.

Snowflake Cookbook

The "Snowflake Cookbook" is your guide to mastering Snowflake's unique cloud-centric architecture. This book provides detailed recipes for building modern data pipelines, configuring efficient virtual warehouses, ensuring robust data protection, and optimizing cost-performance-all while leveraging Snowflake's distinctive features such as data sharing and time travel. What this Book will help me do Set up and configure Snowflake's architecture for optimized performance and cost efficiency. Design and implement robust data pipelines using SQL and Snowflake's specialized features. Secure, manage, and share data efficiently with built-in Snowflake capabilities. Apply performance tuning techniques to enhance your Snowflake implementations. Extend Snowflake's functionality with tools like Spark Connector for advanced workflows. Author(s) Hamid Mahmood Qureshi and Hammad Sharif are both seasoned experts in data warehousing and cloud computing technologies. With extensive experience implementing analytics solutions, they bring a hands-on approach to teaching Snowflake. They are ardent proponents of empowering readers towards creating effective and scalable data solutions. Who is it for? This book is perfect for data warehouse developers, data analysts, cloud architects, and anyone managing cloud data solutions. If you're familiar with basic database concepts or just stepping into Snowflake, you'll find practical guidance here to deepen your understanding and functional expertise in cloud data warehousing.

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

Build custom SQL Server Integration Services (SSIS) tasks using Visual Studio Community Edition and C#. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes, and for no added cost over what you’ve already spent on licensing SQL Server. New in this edition is a demonstration deploying a custom SSIS task to the Azure Data Factory (ADF) Azure-SSIS Integration Runtime (IR). All examples in this new edition are implemented in C#. Custom task developers are shown how to implement custom tasks using the widely accepted and default language for .NET development. Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the available functionality. One such gap is a constraint of the built-in SSIS Execute Package Task, which does not allow SSIS developers to select SSIS packages from other projects in the SSIS Catalog. Examples in this bookshow how to create a custom Execute Catalog Package task that allows SSIS developers to execute tasks from other projects in the SSIS Catalog. Building on the examples and patterns in this book, SSIS developers may create any task to which they aspire, custom tailored to their specific data integration and ETL needs. What You Will Learn Configure and execute Visual Studio in the way that best supports SSIS task development Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies Properly sign assemblies that you create in order to invoke them from your task Implement source code control via Azure DevOps, or your own favorite tool set Troubleshoot and execute custom tasks as part of your own projects Create deployment projects (MSIs) for distributing code-complete tasks Deploy custom tasks to Azure Data Factory Azure-SSIS IRs in the cloud Create advanced editors for custom task parameters Who This Book Is For For database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers do not need a background in software development with C#. Most important is a desire to optimize ETL efforts by creating custom-tailored tasks for execution in SSIS packages, on-premises or in ADF Azure-SSIS IRs.

Data Pipelines Pocket Reference

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn: What a data pipeline is and how it works How data is moved and processed on modern data infrastructure, including cloud platforms Common tools and products used by data engineers to build pipelines How pipelines support analytics and reporting needs Considerations for pipeline maintenance, testing, and alerting

IBM Integrated Synchronization: Incremental Updates Unleashed

The IBM® Db2® Analytics Accelerator (Accelerator) is a logical extension of Db2 for IBM z/OS® that provides a high-speed query engine that efficiently and cost-effectively runs analytics workloads. The Accelerator is an integrated back-end component of Db2 for z/OS. Together, they provide a hybrid workload-optimized database management system that seamlessly manages queries that are found in transactional workloads to Db2 for z/OS and queries that are found in analytics applications to Accelerator. Each query runs in its optimal environment for maximum speed and cost efficiency. The incremental update function of Db2 Analytics Accelerator for z/OS updates Accelerator-shadow tables continually. Changes to the data in original Db2 for z/OS tables are propagated to the corresponding target tables with a high frequency and a brief delay. Query results from Accelerator are always extracted from recent, close-to-real-time data. An incremental update capability that is called IBM InfoSphere® Change Data Capture (InfoSphere CDC) is provided by IBM InfoSphere Data Replication for z/OS up to Db2 Analytics Accelerator V7.5. Since then, an extra new replication protocol between Db2 for z/OS and Accelerator that is called IBM Integrated Synchronization was introduced. With Db2 Analytics Accelerator V7.5, customers can choose which one to use. IBM Integrated Synchronization is a built-in product feature that you use to set up incremental updates. It does not require InfoSphere CDC, which is bundled with IBM Db2 Analytics Accelerator. In addition, IBM Integrated Synchronization has more advantages: Simplified administration, packaging, upgrades, and support. These items are managed as part of the Db2 for z/OS maintenance stream. Updates are processed quickly. Reduced CPU consumption on the mainframe due to a streamlined, optimized design where most of the processing is done on the Accelerator. This situation provides reduced latency. Uses IBM Z® Integrated Information Processor (zIIP) on Db2 for z/OS, which leads to reduced CPU costs on IBM Z and better overall performance data, such as throughput and synchronized rows per second. On z/OS, the workload to capture the table changes was reduced, and the remainder can be handled by zIIPs. With the introduction of an enterprise-grade Hybrid Transactional Analytics Processing (HTAP) enabler that is also known as the Wait for Data protocol, the integrated low latency protocol is now enabled to support more analytical queries running against the latest committed data. IBM Db2 for z/OS Data Gate simplifies delivering data from IBM Db2 for z/OS to IBM Cloud® Pak® for Data for direct access by new applications. It uses the special-purpose integrated synchronization protocol to maintain data currency with low latency between Db2 for z/OS and dedicated target databases on IBM Cloud Pak for Data.

IBM Power Systems H922 and H924 Technical Overview and Introduction

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System H922 (9223-22S), and IBM Power System H924 (9223-42S) servers that support memory-intensive workloads, such as SAP HANA, and deliver superior price and performance for mission-critical applications in IBM AIX®, IBM i, and Linux® operating systems. The goal of this paper is to provide a hardware architecture analysis and highlight the changes, new technologies, and major features that are being introduced in these systems' 2020 release, such as the following examples: Availability of new IBM POWER9™ processor configurations for the number of cores per socket. More performance by using industry-leading IBM Peripheral Component Interconnect® Express (PCIe) Gen4 slots. Enhanced internal disk configuration options, with up to 14 NVMe adapters (four U.2 NVMe plus up to 10 PCIe add-in cards). Twice as fast back-end I/O enables seamless maximum speed and throughput between on-premises and multiple public cloud infrastructures with high availability (HA). This publication is for professionals who want to acquire a better understanding of IBM Power Systems products. The intended audience includes the following roles: Clients Sales and marketing professionals Technical support professionals IBM Business Partners Independent software vendors (ISVs) This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power H922 and Power H924 systems.

Modernizing Applications with IBM CICS

IBM® CICS® is a mixed language application server that runs on IBM Z®. Over the 50 years since CICS was introduced in 1969, enterprises have used the qualities of service (QoSs) that CICS provides to allow them to create high throughput and secure transactional applications that have powered their business. As the IT landscape has evolved, so has CICS to allow these applications to integrate with new platforms and still provide value to the rest of the business. Because of this capability, many businesses still rely on CICS to power their core applications. This IBM Redpaper publication focuses on modernizing these CICS applications, allowing them to integrate with cloud-native applications. This modernization can be achieved either by constructing application programming interfaces (APIs) that allow new cloud-native applications to connect to your existing assets, rewriting parts of your application in newer languages and hosting them back on CICS, or by using CICS capabilities to extend your applications to provide new capabilities and functions. The paper takes a traditional example application and shows you how it works. Then, the paper extends the example, rewrites portions of its functions, and enables its APIs. It also explains how CICS applications can use continuous integration (CI) and continuous delivery (CD) to deliver, test, and deploy code into CICS easily and with quality.

MongoDB Fundamentals

This book, "MongoDB Fundamentals", is the ideal hands-on guide to learning MongoDB. By starting from the basics of NoSQL databases and progressing to cloud integration using MongoDB Atlas, you will gain practical experience managing, querying, and visualizing data effectively for real-world applications. What this Book will help me do Set up and manage a MongoDB database with both local and cloud environments. Master querying and modifying data using the aggregation framework for complex operations. Implement effective database architecture with replication and sharding techniques. Ensure data security and resilience through user management and efficient backup/restore methods. Visualize data insights through dynamic reports and charts using MongoDB Charts. Author(s) Amit Phaltankar, Juned Ahsan, Michael Harrison, and Liviu Nedov are seasoned professionals in the field of database management systems, each bringing extensive experience working with MongoDB and cloud technologies. They excel at translating technical concepts into accessible, actionable insights, and have a passion for enabling IT professionals to create high-performance database solutions. Who is it for? "MongoDB Fundamentals" is tailored for developers, database administrators, system administrators, and cloud architects who are new to MongoDB but are looking to integrate it into their data processing workflows. It's perfect for those who aim to enhance their skills in handling data within cloud computing environments and have some basic programming or database experience.

IBM Storage for Red Hat OpenShift Blueprint

This IBM® Blueprint is intended to facilitate the deployment of IBM Storage for Red Hat OpenShift Container Platform by using detailed hardware specifications to build a system. It describes the associated parameters for configuring persistent storage within a Red Hat OpenShift Container Platform environment. To complete the tasks, you must understand Red Hat OpenShift, IBM Storage, the IBM block storage Container Storage Interface (CSI) driver, and the IBM Spectrum Scale CSI driver. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize® or IBM FlashSystem® storage devices, Enterprise Storage Server®, and IBM Spectrum® Scale are supported and entitled, and where the issues are not specific to a blueprint implementation. IBM Storage Suite for IBM Cloud® Paks is an offering bundle that includes software-defined storage from IBM and Red Hat. Use this document for more information about how to deploy IBM Storage product licenses that are obtained through Storage Suite for Cloud Paks (IBM Spectrum Virtualize and IBM Spectrum Scale).

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance

Learn effective and scalable database design techniques in SQL Server 2019 and other recent SQL Server versions. This book is revised to cover additions to SQL Server that include SQL graph enhancements, in-memory online transaction processing, temporal data storage, row-level security, and other design-related features. This book will help you design OLTP databases that are high-quality, protect the integrity of your data, and perform fast on-premises, in the cloud, or in hybrid configurations. Designing an effective and scalable database using SQL Server is a task requiring skills that have been around for well over 30 years, using technology that is constantly changing. This book covers everything from design logic that business users will understand to the physical implementation of design in a SQL Server database. Grounded in best practices and a solid understanding of the underlying theory, author Louis Davidson shows you how to "getit right" in SQL Server database design and lay a solid groundwork for the future use of valuable business data. What You Will Learn Develop conceptual models of client data using interviews and client documentation Implement designs that work on premises, in the cloud, or in a hybrid approach Recognize and apply common database design patterns Normalize data models to enhance integrity and scalability of your databases for the long-term use of valuable data Translate conceptual models into high-performing SQL Server databases Secure and protect data integrity as part of meeting regulatory requirements Create effective indexing to speed query performance Understand the concepts of concurrency Who This Book Is For Programmers and database administrators of all types who want to use SQL Server to store transactional data. The book is especially useful to those wanting to learn the latest database design features in SQL Server 2019 (features that include graph objects, in-memory OLTP, temporal data support, and more). Chapters on fundamental concepts, the language of database modeling, SQL implementation, and the normalization process lay a solid groundwork for readers who are just entering the field of database design. More advanced chapters serve the seasoned veteran by tackling the latest in physical implementation features that SQL Server has to offer. The book has been carefully revised to cover all the design-related features that are new in SQL Server 2019.

IBM FlashSystem 9100 Architecture, Performance, and Implementation

IBM® FlashSystem 9100 combines the performance of flash and Non-Volatile Memory Express (NVMe) with the reliability and innovation of IBM FlashCore® technology and the rich features of IBM Spectrum™ Virtualize — all in a powerful 2U storage system. Providing intensive data driven multi-cloud storage capacity, FlashSystem 9100 is deeply integrated with the software-defined capabilities of IBM Spectrum Storage™, which allows you to easily add the multi-cloud solutions that best support your business. In this IBM Redbooks® publication, we discuss the product's features and planning steps, architecture, installation, configuration, and hints and tips.

What Is a Data Lake?

A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for your organization. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship. You'll explore: The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud

Cyber Resilience Solution Across Hybrid Cloud Using IBM Storage Solutions

In today's data driven world, the information and data of an organization is considered as the most important asset to its business. It can serve as key asset for growth of an organization. As more data are collected by organizations, it is growing at a staggering pace. With this exponential data growth, there is an increase need to protect the data from the various cyberattacks in the form of malware and ransomware that is trying to steal precious data and information. These cyberattacks can have catastrophic impact on the organization and result in devastating financial losses and affect the organization's reputation for years. This document is intended to facilitate the deployment of the Hybrid Cloud Cyber Resilience solution for storage system data that it backed up in IBM Spectrum Protect Plus from external cyberattacks or insider attacks by using its integration with IBM Cloud Object Storage. You must understand IBM FlashSystem, IBM Spectrum Protect Plus, and IBM Cloud Object Storage architecture concepts and its configuration across hybrid cloud. The information in this document is distributed on an as-is basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM FlashSystem, IBM Spectrum Protect Plus or IBM Cloud Object Storage are supported and entitled, and where the issues are specific to a solution technical paper implementation.

Practical Azure SQL Database for Modern Developers: Building Applications in the Microsoft Cloud

Here is the expert-level, insider guidance you need on using Azure SQL Database as your back-end data store. This book highlights best practices in everything ranging from full-stack projects to mobile applications to critical, back-end APIs. The book provides instruction on accessing your data from any language and platform. And you learn how to push processing-intensive work into the database engine to be near the data and avoid undue networking traffic. Azure SQL is explained from a developer's point of view, helping you master its feature set and create applications that perform well and delight users. Core to the book is showing you how Azure SQL Database provides relational and post-relational support so that any workload can be managed with easy accessibility from any platform and any language. You will learn about features ranging from lock-free tables to columnstore indexes, and about support for data formats ranging from JSON and key-values to the nodes and edges in the graph database paradigm. Reading this book prepares you to deal with almost all data management challenges, allowing you to create lean and specialized solutions having the elasticity and scalability that are needed in the modern world. What You Will Learn Master Azure SQL Database in your development projects from design to the CI/CD pipeline Access your data from any programming language and platform Combine key-value, JSON, and relational data in the same database Push data-intensive compute work into the database for improved efficiency Delight your customers by detecting and improving poorly performing queries Enhance performance through features such as columnstore indexes and lock-free tables Build confidence in your mastery of Azure SQL Database's feature set Who This Book Is For Developers of applications and APIs that benefit from cloud database support, developers who wish to master their tools (including Azure SQL Database, and those who want their applications to be known for speedy performance and the elegance of their code

Azure SQL Revealed: A Guide to the Cloud for SQL Server Professionals

Access detailed content and examples on Azure SQL, a set of cloud services that allows for SQL Server to be deployed in the cloud. This book teaches the fundamentals of deployment, configuration, security, performance, and availability of Azure SQL from the perspective of these same tasks and capabilities in SQL Server. This distinct approach makes this book an ideal learning platform for readers familiar with SQL Server on-premises who want to migrate their skills toward providing cloud solutions to an enterprise market that is increasingly cloud-focused. If you know SQL Server, you will love this book. You will be able to take your existing knowledge of SQL Server and translate that knowledge into the world of cloud services from the Microsoft Azure platform, and in particular into Azure SQL. This book provides information never seen before about the history and architecture of Azure SQL. Author Bob Ward is a leading expert with access to and support fromthe Microsoft engineering team that built Azure SQL and related database cloud services. He presents powerful, behind-the-scenes insights into the workings of one of the most popular database cloud services in the industry. What You Will Learn Know the history of Azure SQL Deploy, configure, and connect to Azure SQL Choose the correct way to deploy SQL Server in Azure Migrate existing SQL Server instances to Azure SQL Monitor and tune Azure SQL’s performance to meet your needs Ensure your data and application are highly available Secure your data from attack and theft Who This Book Is For This book is designed to teach SQL Server in the Azure cloud to the SQL Server professional. Anyone who operates, manages, or develops applications for SQL Server will benefit from this book. Readers will be able to translate their current knowledge of SQL Server—especially of SQL Server 2019—directly to Azure. This book is ideal for database professionals looking to remain relevant as their customer base moves into the cloud.

Hybrid Multicloud Business Continuity for OpenShift Workloads with IBM Spectrum Virtualize in AWS

This publication is intended to facilitate the deployment of the hybrid cloud business continuity solution with Red Hat OpenShift Container Platform and IBM® block CSI (Container Storage Interface) driver plug-in for IBM Spectrum® Virtualize on Public Cloud AWS (Amazon Web Services). This solution is designed to protect the data by using IBM Storage-based Global Mirror replication. For demonstration purposes, MySQL containerized database is installed on the on-premises IBM FlashSystem® that is connected to the Red Hat OpenShift Container Platform (OCP) cluster in the vSphere environment through the IBM block CSI driver. The volume (LUN) on IBM FlashSystem storage system is replicated by using global mirror on IBM Spectrum Virtualize for Public Cloud on AWS. Red Hat OpenShift cluster (OCP cluster) and the IBM block CSI driver plug-in are installed on AWS by using Installer-Provisioned Infrastructure (IPI) methodology. The information in this document is distributed on an as-is basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Virtualize for Public Cloud is supported and entitled, and where the issues are specific to this Blueprint implementation.

Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges to manage this deluge of unstructured data, such as the following examples: Pinpointing and activating relevant data for large-scale analytics Lacking the fine-grained visibility that is needed to map data to business priorities Removing redundant, obsolete, and trivial (ROT) data Identifying and classifying sensitive data IBM® Spectrum Discover is a modern metadata management software that provides data insight for petabyte-scale file and Object Storage, storage on-premises, and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum® Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research. This IBM Redbooks® publication presents several use cases that are focused on artificial intelligence (AI) solutions with IBM Spectrum Discover. This book helps storage administrators and technical specialists plan and implement AI solutions by using IBM Spectrum Discover and several other IBM Storage products.

SQL Server Data Automation Through Frameworks: Building Metadata-Driven Frameworks with T-SQL, SSIS, and Azure Data Factory

Learn to automate SQL Server operations using frameworks built from metadata-driven stored procedures and SQL Server Integration Services (SSIS). Bring all the power of Transact-SQL (T-SQL) and Microsoft .NET to bear on your repetitive data, data integration, and ETL processes. Do this for no added cost over what you’ve already spent on licensing SQL Server. The tools and methods from this book may be applied to on-premises and Azure SQL Server instances. The SSIS framework from this book works in Azure Data Factory (ADF) and provides DevOps personnel the ability to execute child packages outside a project—functionality not natively available in SSIS. Frameworks not only reduce the time required to deliver enterprise functionality, but can also accelerate troubleshooting and problem resolution. You'll learn in this book how frameworks also improve code quality by using metadata to drive processes. Much of the work performed by data professionals can be classified as “drudge work”—tasks that are repetitive and template-based. The frameworks-based approach shown in this book helps you to avoid that drudgery by turning repetitive tasks into "one and done" operations. Frameworks as described in this book also support enterprise DevOps with built-in logging functionality. What You Will Learn Create a stored procedure framework to automate SQL process execution Base your framework on a working system of stored procedures and execution logging Create an SSIS framework to reduce the complexity of executing multiple SSIS packages Deploy stored procedure and SSIS frameworks to Azure Data Factory environments in the cloud Who This Book Is For Database administrators and developers who are involved in enterprise data projects built around stored procedures and SQL Server Integration Services (SSIS). Readersshould have a background in programming along with a desire to optimize their data efforts by implementing repeatable processes that support enterprise DevOps.

Security and Privacy Issues in IoT Devices and Sensor Networks

Security and Privacy Issues in IoT Devices and Sensor Networks investigates security breach issues in IoT and sensor networks, exploring various solutions. The book follows a two-fold approach, first focusing on the fundamentals and theory surrounding sensor networks and IoT security. It then explores practical solutions that can be implemented to develop security for these elements, providing case studies to enhance understanding. Machine learning techniques are covered, as well as other security paradigms, such as cloud security and cryptocurrency technologies. The book highlights how these techniques can be applied to identify attacks and vulnerabilities, preserve privacy, and enhance data security. This in-depth reference is ideal for industry professionals dealing with WSN and IoT systems who want to enhance the security of these systems. Additionally, researchers, material developers and technology specialists dealing with the multifarious aspects of data privacy and security enhancement will benefit from the book's comprehensive information. Provides insights into the latest research trends and theory in the field of sensor networks and IoT security Presents machine learning-based solutions for data security enhancement Discusses the challenges to implement various security techniques Informs on how analytics can be used in security and privacy

BigQuery for Data Warehousing: Managed Data Analysis in the Google Cloud

Create a data warehouse, complete with reporting and dashboards using Google’s BigQuery technology. This book takes you from the basic concepts of data warehousing through the design, build, load, and maintenance phases. You will build capabilities to capture data from the operational environment, and then mine and analyze that data for insight into making your business more successful. You will gain practical knowledge about how to use BigQuery to solve data challenges in your organization. BigQuery is a managed cloud platform from Google that provides enterprise data warehousing and reporting capabilities. Part I of this book shows you how to design and provision a data warehouse in the BigQuery platform. Part II teaches you how to load and stream your operational data into the warehouse to make it ready for analysis and reporting. Parts III and IV cover querying and maintaining, helping you keep your information relevant with other Google Cloud Platform services and advanced BigQuery. Part V takes reporting to the next level by showing you how to create dashboards to provide at-a-glance visual representations of your business situation. Part VI provides an introduction to data science with BigQuery, covering machine learning and Jupyter notebooks. What You Will Learn Design a data warehouse for your project or organization Load data from a variety of external and internal sources Integrate other Google Cloud Platform services for more complex workflows Maintain and scale your data warehouse as your organization grows Analyze, report, and create dashboards on the information in the warehouse Become familiar with machine learning techniques using BigQuery ML Who This Book Is For Developers who want to provide business users with fast, reliable, and insightful analysis from operational data, and data analysts interested in a cloud-based solution that avoids the pain of provisioning their own servers.

ETL with Azure Cookbook

ETL with Azure Cookbook is a comprehensive guide to building effective and scalable ETL solutions using the Azure cloud platform. Through hands-on recipes, this book explores the features and capabilities of Azure services for data integration and transformation, guiding you in creating efficient processes for moving and handling data. What this Book will help me do Master the basics and advanced techniques for building ETL processes on Azure. Learn practical skills in designing solutions that integrate multiple Azure services. Understand how to migrate existing on-premises ETL solutions to Azure successfully. Acquire knowledge of SQL Server and Azure Big Data Clusters for data integration. Gain experience in automating and optimizing data processes with BIML and Azure Databricks. Author(s) The authors of ETL with Azure Cookbook are experienced data engineers and Azure specialists with years of expertise in designing and implementing robust data solutions. Their professional journey includes hands-on work with SQL Server, Azure services, and scalable ETL frameworks. They aim to provide practical insights and actionable guidance to help readers achieve success in data engineering projects. Who is it for? This book is ideal for data architects, ETL developers, and IT professionals seeking to enhance their skills in data integration and transformation, particularly within the Azure ecosystem. It's suitable for individuals with some knowledge of data engineering principles, SQL, and familiarity with ETL processes who aim to adopt modern cloud-based approaches.