talk-data.com

Topic: data (5765 tagged)

Activities

5765 activities · Newest first

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Data Engineering with Apache Spark, Delta Lake, and Lakehouse is a comprehensive guide packed with practical knowledge for building robust and scalable data pipelines. Throughout this book, you will explore the core concepts and applications of Apache Spark and Delta Lake, and learn how to design and implement efficient data engineering workflows using real-world examples.
What this Book will help me do: Master the core concepts and components of Apache Spark and Delta Lake. Create scalable and secure data pipelines for efficient data processing. Learn best practices and patterns for building enterprise-grade data lakes. Discover how to operationalize data models into production-ready pipelines. Gain insights into deploying and monitoring data pipelines effectively.
Author(s): Kukreja is a seasoned data engineer with over a decade of experience working with big data platforms. He specializes in implementing efficient and scalable data solutions to meet the demands of modern analytics and data science. Writing with clarity and a practical approach, he aims to provide actionable insights that professionals can apply to their projects.
Who is it for? This book is tailored for aspiring data engineers and data analysts who wish to delve deeper into building scalable data platforms. It is suitable for those who have basic knowledge of Python, Spark, and SQL and want to learn Delta Lake and advanced data engineering concepts. Readers should be eager to develop practical skills for tackling real-world data engineering challenges.

Computational Intelligence and Healthcare Informatics

The book presents state-of-the-art innovation, research, and design, and implements methodological and algorithmic solutions to data processing problems, covering the design and analysis of evolving trends in health informatics, intelligent disease prediction, and computer-aided diagnosis. Computational intelligence (CI) refers to the ability of computers to accomplish tasks that are normally completed by intelligent beings such as humans and animals. With the rapid advance of technology, artificial intelligence (AI) techniques are being effectively used in the field of health to improve the efficiency of treatments, avoid the risk of false diagnoses, make therapeutic decisions, and predict the outcome in many clinical scenarios. Modern health treatments are faced with the challenge of acquiring, analyzing, and applying the large amount of knowledge necessary to solve complex problems. Computational intelligence in healthcare mainly uses computer techniques to perform clinical diagnoses and suggest treatments. In the present scenario of computing, CI tools present adaptive mechanisms that permit the understanding of data in difficult and changing environments. The results of CI technologies benefit medical fields by grouping patients with the same types of diseases or fitness problems so that healthcare facilities can provide effective treatments. This book starts with the fundamentals of computational intelligence and the techniques and procedures associated with it. Contained in this book are state-of-the-art methods of computational intelligence and other allied techniques used in the healthcare system, as well as advances in different CI methods that will confront the problem of effective data analysis and storage faced by healthcare institutions.
The objective of this book is to provide researchers with a platform encompassing state-of-the-art innovations; research and design; implementation of methodological and algorithmic solutions to data processing problems; and the design and analysis of evolving trends in health informatics, intelligent disease prediction, and computer-aided diagnosis.
Audience: The book is of interest to artificial intelligence and biomedical scientists, researchers, engineers, and students in various settings such as pharmaceutical and biotechnology companies, virtual-assistant development companies, medical imaging and diagnostics centers, wearable device designers, healthcare assistance robot manufacturers, precision medicine testers, hospital management, and researchers working in healthcare systems.

Computation in BioInformatics

Bioinformatics is a platform between biology and information technology, and this book provides readers with an understanding of the use of bioinformatics tools in new drug design. The discovery of new solutions to pandemics is facilitated through the use of promising bioinformatics techniques and integrated approaches. This book covers a broad spectrum of the bioinformatics field, starting with the basic principles, concepts, and application areas. Also covered is the role of bioinformatics in drug design and discovery, including aspects of molecular modeling. Some of the chapters provide detailed information on bioinformatics-related topics, such as silicon design, protein modeling, DNA microarray analysis, DNA-RNA barcoding, and gene sequencing, all of which are currently needed in the industry. Also included are specialized topics, such as bioinformatics in cancer detection, genomics, and proteomics. Moreover, a few chapters explain highly advanced topics, like machine learning and covalent approaches to drug design and discovery, all of which are significant in pharma and biotech research and development.
Audience: Researchers and engineers in computational biology, information technology, bioinformatics, drug design, biotechnology, and pharmaceutical sciences.

Enhanced Cyber Resilience Threat Detection with IBM FlashSystem Safeguarded Copy and IBM QRadar

The focus of this document is to demonstrate early threat detection by using IBM® QRadar® and the Safeguarded Copy feature that is available as part of IBM FlashSystem® and IBM SAN Volume Controller. Such early detection helps protect the data and recover it quickly if a cyberattack occurs. This document describes integrating IBM FlashSystem audit logs with IBM QRadar, and the configuration steps for IBM FlashSystem and IBM QRadar. It also explains how to use IBM QRadar's device support module (DSM) editor to normalize events and assign an IBM QRadar identifier (QID) map to the events. After the IBM QRadar configuration, we review configuring Safeguarded Copy on the application volumes by using volume groups and applying Safeguarded backup policies on the volume group. Finally, we demonstrate the use of the orchestration software IBM Copy Services Manager to start a recovery, restore operations for data restoration on online volumes, and start a backup of data volumes.

IBM Spectrum Protect Plus Protecting Database Applications

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and application backups in hybrid multicloud environments. This IBM Redpaper publication focuses on protecting database applications. IBM Spectrum® Protect Plus supports backup, restore, and data reuse for multiple databases, such as Oracle, IBM Db2®, MongoDB, Microsoft Exchange, and Microsoft SQL Server. Although other IBM Spectrum Protect Plus features focus on virtual environments, the database and application support of IBM Spectrum Protect Plus includes databases on virtual and physical servers.

Power Query Cookbook

The "Power Query Cookbook" is your comprehensive guide to mastering data preparation and transformation using Power Query. With this book, you'll learn to connect to data sources, reshape data to fit business requirements, and use both no-code transformations and custom M code solutions to unlock the full potential of your data. Step-by-step examples will guide you through optimizing dataflows in Power BI.
What this Book will help me do: Master connecting to various data sources and performing intuitive transformations using Power Query. Learn to reshape and enrich data to meet complex business requirements efficiently. Explore advanced capabilities of Power Query, including M code and online dataflows. Develop custom data transformations with a blend of GUI-based and M code techniques. Optimize the performance of Power BI Dataflows using best practices and diagnostics tools.
Author(s): Janicijevic is a seasoned expert in data analytics, specializing in Microsoft Power BI and Power Query. With years of experience in data engineering and a passion for teaching, Janicijevic brings a clear, actionable, and results-driven approach to demystifying complex technical concepts. Their work empowers professionals with the tools they need to excel in data-driven decision-making.
Who is it for? This book is designed for data analysts, business intelligence developers, and data engineers aiming to enhance their skills in data preparation using Power Query. If you have a basic understanding of Power BI and want to delve into integrating and optimizing data from multiple sources, this book is for you. It's ideal for professionals seeking practical insights and techniques to improve data transformations. Novices with some exposure to BI tools will also find the material accessible and rewarding.

2021 Data/AI Salary Survey

Curious about what technologies will have the biggest impact on salaries in the coming year? Want to determine whether a particular certification is worth going for? Looking for the most lucrative programming language to learn next? Are you hiring for a data team? Or do you just want to see how your skills and salary compare to others in the field? Get answers to your salary questions in the 2021 Data/AI Salary Survey.

IBM FlashSystem Best Practices and Performance Guidelines

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM FlashSystem products. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. It explains how you can optimize disk performance with the IBM System Storage® Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem, SAN Volume Controller (SVC), and IBM Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments.

Storage Systems

Storage Systems: Organization, Performance, Coding, Reliability and Their Data Processing was motivated by the 1988 Redundant Array of Inexpensive/Independent Disks (RAID) proposal to replace large form factor mainframe disks with an array of commodity disks. Disk loads are balanced by striping data into strips, with one strip per disk, and storage reliability is enhanced via replication or erasure coding, which at best dedicates k strips per stripe to tolerate k disk failures. Flash memories have resulted in a paradigm shift with Solid State Drives (SSDs) replacing Hard Disk Drives (HDDs) for high performance applications. RAID and Flash have resulted in the emergence of new storage companies, namely EMC, NetApp, SanDisk, and Pure Storage, and a multibillion-dollar storage market. Key new conferences and publications are reviewed in this book. The goal of the book is to expose students, researchers, and IT professionals to the more important developments in storage systems, while covering the evolution of storage technologies, traditional and novel databases, and novel sources of data. We describe several prototypes: FAWN at CMU, RAMCloud at Stanford, and Lightstore at MIT; Oracle's Exadata, AWS' Aurora, Alibaba's PolarDB, Fungible Data Center; and the author's paper designs for cloud storage, namely heterogeneous disk arrays and hierarchical RAID.
Surveys storage technologies and lists sources of data: measurements, text, audio, images, and video
Familiarizes with paradigms to improve performance: caching, prefetching, log-structured file systems, and merge-trees (LSMs)
Describes RAID organizations and analyzes their performance and reliability
Conserves storage via data compression, deduplication, compaction, and secures data via encryption
Specifies implications of storage technologies on performance and power consumption
Exemplifies database parallelism for big data, analytics, deep learning via multicore CPUs, GPUs, FPGAs, and ASICs, e.g., Google's Tensor Processing Units
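
As a rough illustration of the striping and erasure-coding idea mentioned in this description (not taken from the book), the following Python sketch covers only the simplest case, k = 1: data is split into one strip per disk, and a single XOR parity strip allows any one lost strip to be rebuilt. The function names `make_stripe` and `recover` are hypothetical, and real arrays use more elaborate layouts and codes.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data: bytes, n_data_disks: int, strip_size: int) -> List[bytes]:
    """Split data into one strip per data disk and append one XOR parity strip.

    Hypothetical helper: assumes the data fits in a single stripe and
    zero-pads the last strip if the data is shorter than the stripe.
    """
    capacity = n_data_disks * strip_size
    if len(data) > capacity:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(capacity, b"\x00")
    strips = [padded[i * strip_size:(i + 1) * strip_size]
              for i in range(n_data_disks)]
    parity = reduce(xor_bytes, strips)
    return strips + [parity]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing strip (marked None) by XOR-ing the survivors."""
    missing = [i for i, strip in enumerate(stripe) if strip is None]
    if len(missing) != 1:
        raise ValueError("this sketch rebuilds exactly one missing strip")
    survivors = [strip for strip in stripe if strip is not None]
    rebuilt = reduce(xor_bytes, survivors)
    return [rebuilt if strip is None else strip for strip in stripe]


if __name__ == "__main__":
    stripe = make_stripe(b"ABCDEFGHIJKL", n_data_disks=3, strip_size=4)
    damaged = list(stripe)
    damaged[1] = None  # simulate the failure of the second disk
    assert recover(damaged) == stripe
```

With one parity strip per stripe, the array survives one disk failure; tolerating k failures with k redundant strips requires a stronger code than plain XOR, which this sketch does not attempt.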

Pro Data Visualization Using R and JavaScript: Analyze and Visualize Key Data on the Web

Use R 4, RStudio, Tidyverse, and Shiny to interrogate and analyze your data, and then use the D3 JavaScript library to format and display that data in an elegant, informative, and interactive way. You will learn how to gather data effectively, and also how to understand the philosophy and implementation of each type of chart, so as to be able to represent the results visually. With the popularity of the R language, the art and practice of creating data visualizations is no longer the preserve of mathematicians, statisticians, or cartographers. As technology leaders, we can gather metrics around what we do and use data visualizations to communicate that information. Pro Data Visualization Using R and JavaScript combines the power of the R language with the simplicity and familiarity of JavaScript to display clear and informative data visualizations. Gathering and analyzing empirical data is the key to truly understanding anything. We can track operational metrics to quantify the health of our products in production. We can track quality metrics of our projects, and even use our data to identify bad code. Visualizing this data allows anyone to read our analysis and easily get a deep understanding of the story the data tells. This book makes the R language approachable, and promotes the idea of data gathering and analysis mostly using web interfaces.
What You Will Learn:
Carry out data visualization using R and JavaScript
Use RStudio for data visualization
Harness Tidyverse data pipelines
Apply D3 and R Notebooks towards your data
Work with the R Plumber API generator, Shiny, and more
Who This Book Is For: Programmers and data scientists/analysts who have some prior experience with R and JavaScript.

Snowflake Security: Securing Your Snowflake Data Cloud

This book is your complete guide to Snowflake security, covering account security, authentication, data access control, logging and monitoring, and more. It will help you make sure that you are using the security controls in the right way, are on top of access control, and are making the most of the security features in Snowflake. Snowflake is the fastest growing cloud data warehouse in the world, and having the right methodology to protect the data is important both to data engineers and security teams. It allows for faster data enablement for organizations, as well as reducing security risks, meeting compliance requirements, and solving data privacy challenges. There are currently tens of thousands of people who are either data engineers/data ops in Snowflake-using organizations, or security people in such organizations. This book provides guidance when you want to apply certain capabilities, such as data masking, row-level security, column-level security, tackling role hierarchy, building monitoring dashboards, etc., to your organizations.
What You Will Learn:
Implement security best practices for Snowflake
Set up user provisioning, MFA, OAuth, and SSO
Set up a Snowflake security model
Design roles architecture
Use advanced access control such as row-based security and dynamic masking
Audit and monitor your Snowflake Data Cloud
Who This Book Is For: Data engineers, data privacy professionals, and security teams either with security knowledge (preferably some data security knowledge) or with data engineering knowledge; in other words, either “Snowflake people” or “data people” who want to get security right, or “security people” who want to make sure that Snowflake gets handled right in terms of security.

Text as Data

Text as Data: Combining qualitative and quantitative algorithms within the SAS system for accurate, effective and understandable text analytics. The need for powerful, accurate and increasingly automatic text analysis software in modern information technology has dramatically increased. Fields as diverse as financial management, fraud and cybercrime prevention, pharmaceutical R&D, social media marketing, customer care, and health services are implementing more comprehensive text-inclusive analytics strategies. Text as Data: Computational Methods of Understanding Written Expression Using SAS presents an overview of text analytics and the critical role SAS software plays in combining linguistic and quantitative algorithms in the evolution of this dynamic field. Drawing on over two decades of experience in text analytics, authors Barry deVille and Gurpreet Singh Bawa examine the evolution of text mining and cloud-based solutions, and the development of SAS Visual Text Analytics. By integrating quantitative data and textual analysis with advanced computer learning principles, the authors demonstrate the combined advantages of SAS compared to standard approaches, and show how approaching text as qualitative data within a quantitative analytics framework produces more detailed, accurate, and explanatory results.
Understand the role of linguistics, machine learning, and multiple data sources in the text analytics workflow
Understand how a range of quantitative algorithms and data representations reflect contextual effects to shape meaning and understanding
Access online data and code repositories, videos, tutorials, and case studies
Learn how SAS extends quantitative algorithms to produce expanded text analytics capabilities
Redefine text in terms of data for more accurate analysis
This book offers a thorough introduction to the framework and dynamics of text analytics, and the underlying principles at work, and provides an in-depth examination of the interplay between qualitative-linguistic and quantitative, data-driven aspects of data analysis. The treatment begins with a discussion on expression parsing and detection and provides insight into the core principles and practices of text parsing, theme, and topic detection. It includes advanced topics such as contextual effects in numeric and textual data manipulation, fine-tuning text meaning and disambiguation. As the first resource to leverage the power of SAS for text analytics, Text as Data is an essential resource for SAS users and data scientists in any industry or academic application.

IBM FlashSystem 5000 and 5200 for Mid-Market

The IBM® FlashSystem 5015, 5035, and 5200 help you meet the challenges of rapid data growth while staying within limited IT budgets. These systems allow you to quickly consolidate, simplify, and optimize your IT infrastructure with an efficient, highly flexible, yet easy-to-use storage system with powerful virtualization features. This IBM Redpaper™ publication is intended for mid-market clients.

Fabric Resiliency and Best Practices for IBM c-type Products

This IBM Redpaper publication describes best practices for deploying and using advanced Cisco NX-OS features to identify, monitor, and protect Fibre Channel (FC) Storage Area Networks (SANs) from problematic devices and media behavior. The paper focuses on the IBM c-type SAN switches with firmware Cisco MDS NX-OS Release 8.4(2a).

Microsoft Power BI Cookbook - Second Edition

"Microsoft Power BI Cookbook" is an advanced reference for professionals working with Power BI. Featuring over 90 practical, hands-on recipes, this book allows you to master Power BI for data modeling, creating dashboards, and optimizing queries. You will learn practical tips and techniques, enabling you to create effective and customized Power BI solutions for various business needs.
What this Book will help me do: Master advanced data cleansing and integration techniques in Power BI's Power Query Editor. Develop intuitive, efficient dashboards and reports using best practices for data visualization. Optimize performance for large datasets using aggregation tables and efficient query techniques. Implement sophisticated analysis and business logic using the power of the DAX programming language. Deploy and manage Power BI solutions leveraging integration with Microsoft ecosystem tools.
Author(s): Greg Deckler and Powell are seasoned Power BI experts with extensive backgrounds in business intelligence and data solutions. Greg is a recognized Power BI consultant and author with a focus on delivering impactful BI solutions. Powell brings their experience in utilizing Power BI for diverse organizational needs. Together, they emphasize hands-on learning and actionable insights in their collaborative writing.
Who is it for? This book is aimed at business intelligence professionals who already have a basic understanding of Power BI. Ideal readers are those seeking to deepen their knowledge of advanced features and apply best practices in their projects. Whether you're enhancing your existing Power BI skills or managing complex datasets, this book will provide the techniques and insights to excel in your role.

Pandas in Action

Take the next steps in your data science career! This friendly and hands-on guide shows you how to start mastering Pandas with skills you already know from spreadsheet software.
In Pandas in Action you will learn how to:
Import datasets, identify issues with their data structures, and optimize them for efficiency
Sort, filter, pivot, and draw conclusions from a dataset and its subsets
Identify trends from text-based and time-based data
Organize, group, merge, and join separate datasets
Use a GroupBy object to store multiple DataFrames
Pandas has rapidly become one of Python's most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You’ll learn how easy Pandas makes it to efficiently sort, analyze, filter and munge almost any type of data.
About the Technology: Data analysis with Python doesn’t have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It’s a perfect way to up your data game.
About the Book: Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical, or impossible, in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world.
What's Inside:
Organize, group, merge, split, and join datasets
Find trends in text-based and time-based data
Sort, filter, pivot, optimize, and draw conclusions
Apply aggregate operations
About the Reader: For readers experienced with spreadsheets and basic Python programming.
About the Author: Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries.
Quotes:
Of all the introductory pandas books I’ve read (and I did read a few), this is the best, by a mile. - Erico Lendzian, idibu.com
This approachable guide will get you up and running quickly with all the basics you need to analyze your data. - Jonathan Sharley, SiriusXM Media
Understanding and putting in practice the concepts of this book will help you increase productivity and make you look like a pro. - Jose Apablaza, Steadfast Networks
Teaches both novice and expert Python users the essential concepts required for data analysis and data science. - Ben McNamara, DataGeek

Azure Databricks Cookbook

Azure Databricks is a robust analytics platform that leverages Apache Spark and seamlessly integrates with Azure services. In the Azure Databricks Cookbook, you'll find hands-on recipes to ingest data, build modern data pipelines, and perform real-time analytics while learning to optimize and secure your solutions.
What this Book will help me do: Design advanced data workflows integrating Azure Synapse, Cosmos DB, and streaming sources with Databricks. Gain proficiency in using Delta Tables and Spark for efficient data storage and analysis. Learn to create, deploy, and manage real-time dashboards with Databricks SQL. Master CI/CD pipelines for automating deployments of Databricks solutions. Understand security best practices for restricting access and monitoring Azure Databricks.
Author(s): Raj and Jaiswal are experienced professionals in the field of big data and analytics. They are well-versed in implementing Azure Databricks solutions for real-world problems. Their collaborative writing approach ensures clarity and practical focus.
Who is it for? This book is tailored for data engineers, scientists, and big data professionals who want to apply Azure Databricks and Apache Spark to their analytics workflows. A basic familiarity with Spark and Azure is recommended to make the best use of the recipes provided. If you're looking to scale and optimize your analytics pipelines, this book is for you.

Securing Data on Threat Detection by Using IBM Spectrum Scale and IBM QRadar: An Enhanced Cyber Resiliency Solution

Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management (SIEM) software for deep inspection, detection, and prioritization of threats has become a necessity for any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, when combined with the log analysis, deep inspection, and detection of threats that are provided by IBM QRadar®, help reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale File Audit Logging can be integrated with IBM QRadar. Using IBM QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data that is stored on IBM Spectrum Scale. When the threats are identified, you can quickly act on them to mitigate or reduce the impact of incidents. We further demonstrate how the threat detection by IBM QRadar can proactively trigger data snapshots or a cyber resiliency workflow in IBM Spectrum Scale to protect the data during a threat. This third edition has added the section "Ransomware threat detection", where we describe a ransomware attack scenario within an environment to leverage IBM Spectrum Scale File Audit logs integration with IBM QRadar. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. This paper assumes a basic understanding of IBM Spectrum Scale and IBM QRadar and their administration.

PostGIS in Action, Third Edition

In PostGIS in Action, Third Edition you will learn:
An introduction to spatial databases
Geometry, geography, raster, and topology spatial types, functions, and queries
Applying PostGIS to real-world problems
Extending PostGIS to web and desktop applications
Querying data from external sources using PostgreSQL Foreign Data Wrappers
Optimizing queries for maximum speed
Simplifying geometries for greater efficiency
PostGIS in Action, Third Edition teaches readers of all levels to write spatial queries for PostgreSQL. You’ll start by exploring vector-, raster-, and topology-based GIS before quickly progressing to analyzing, viewing, and mapping data. This fully updated third edition covers key changes in PostGIS 3.1 and PostgreSQL 13, including parallelization support, partitioned tables, and new JSON functions that help in creating web mapping applications.
About the Technology: PostGIS is a spatial database extender for PostgreSQL. It offers the features and firepower you need to take on nearly any geodata task. PostGIS lets you create location-aware queries with a few lines of SQL code, then build the backend for a mapping, raster analysis, or routing application with minimal effort.
About the Book: PostGIS in Action, Third Edition shows you how to solve real-world geodata problems. You’ll go beyond basic mapping, and explore custom functions for your applications. Inside this fully updated edition, you’ll find coverage of new PostGIS features such as PostGIS Window functions, parallelization of queries, and outputting data for applications using JSON and Vector Tile functions.
What's Inside:
Fully revised for PostGIS version 3.1 and PostgreSQL 13
Optimize queries for maximum speed
Simplify geometries for greater efficiency
Extend PostGIS to web and desktop applications
About the Reader: For readers familiar with relational databases and basic SQL. No prior geodata or GIS experience required.
About the Authors: Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee.
Quotes:
The best introduction I’ve seen for engineers who want to get ramped up quickly and build advanced GIS applications. - Ikechukwu Okonkwo, Orum.io
A wealth of information that showcases how powerful PostGIS is. - Luis Moux-Dominguez, EMO
An extraordinary book for the world of GIS. Truly learned a lot! - DeUndre’ Rushon, DigiDiscover LLC
Gives you insight into how best to provide map services for a wide audience. - Marcus Brown, Enel Green Power