talk-data.com

Topic: data (5765 tagged)

Activities

5765 activities · Newest first

Data structures based on non-linear relations and data processing methods

The systematic description starts with the basic theory and applications of different kinds of data structures, including storage structures and models. It also explores data processing methods such as sorting, indexing, and search technologies. With its numerous exercises, the book is a helpful reference for graduate students and lecturers.

Mathematical Foundations of Data Science Using R

To best exploit the incredible quantities of data being generated across the most diverse disciplines, the data sciences are gaining worldwide importance. The book gives the mathematical foundations needed to handle data properly. It introduces the basics and functionalities of the R programming language, which has become an indispensable tool for data science. It thus gives the reader the skills needed to build the tool kit of a modern data scientist.

SAS Stored Processes: A Practical Guide to Developing Web Applications

Customize the SAS Stored Process web application to create amazing tools for end users. This book shows you how to use stored processes: SAS programs stored on a server and executed as required by requesting applications. Never before have there been so many ways to turn data into information and build applications with SAS. This book teaches you how to use the web technologies that you frequently see used on impressive websites. By using SAS Stored Processes, you will be able to build applications that exploit CSS, JavaScript, and HTML libraries and enable you to build powerful and impressive web applications using SAS as the backend. While this approach is not common with SAS users, some have had amazing results. People who have SAS skills usually do not have web development skills, and those with web development skills usually do not have SAS skills. Some people have both skills but are unaware of how to connect them with the SAS Stored Process web application. This book shows you how to leverage your skills for success.
What You Will Learn
- Know the benefits of stored processes
- Write your own tools in SAS
- Make a stored process generate its own HTML menu
- Pass data between stored processes
- Use stored processes to generate pure JavaScript
- Utilize data generated by SAS
- Convert a SAS program into a stored process
Who This Book Is For
SAS programmers looking to improve their existing programming skills to develop web applications, and programming managers who want to make better use of the SAS software they already license.

Modern Data Mining Algorithms in C++ and CUDA C: Recent Developments in Feature Extraction and Selection Algorithms for Data Science

Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or extracting useful features from measured variables. As a serious data miner you will often be faced with thousands of candidate features for your prediction or classification application, with most of the features being of little or no value. You’ll know that many of these features may be useful only in combination with certain other features while being practically worthless alone or in combination with most others. Some features may have enormous predictive power, but only within a small, specialized area of the feature space. The problems that plague modern data miners are endless. This book helps you solve this problem by presenting modern feature selection techniques and the code to implement them. Some of these techniques are:
- Forward selection component analysis
- Local feature selection
- Linking features and a target with a hidden Markov model
- Improvements on traditional stepwise selection
- Nominal-to-ordinal conversion
All algorithms are intuitively justified and supported by the relevant equations and explanatory material. The author also presents and explains complete, highly commented source code. The example code is in C++ and CUDA C but Python or other code can be substituted; the algorithm is important, not the code that's used to write it.
What You Will Learn
- Combine principal component analysis with forward and backward stepwise selection to identify a compact subset of a large collection of variables that captures the maximum possible variation within the entire set.
- Identify features that may have predictive power over only a small subset of the feature domain. Such features can be profitably used by modern predictive models but may be missed by other feature selection methods.
- Find an underlying hidden Markov model that controls the distributions of feature variables and the target simultaneously. The memory inherent in this method is especially valuable in high-noise applications such as prediction of financial markets.
- Improve traditional stepwise selection in three ways: examine a collection of 'best-so-far' feature sets; test candidate features for inclusion with cross validation to automatically and effectively limit model complexity; and at each step estimate the probability that our results so far could be just the product of random good luck. We also estimate the probability that the improvement obtained by adding a new variable could have been just good luck.
- Take a potentially valuable nominal variable (a category or class membership) that is unsuitable for input to a prediction model, and assign to each category a sensible numeric value that can be used as a model input.
Who This Book Is For
Intermediate to advanced data science programmers and analysts.
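As an illustration of the nominal-to-ordinal idea mentioned in this description, here is a minimal Python sketch. It assumes one common approach, replacing each category with the mean of the target observed for that category; the book's own method may differ, and the function name `nominal_to_ordinal` and the toy data are hypothetical.

```python
from collections import defaultdict


def nominal_to_ordinal(categories, targets):
    """Map each category to the mean target value observed for it.

    Assumption: mean-target encoding stands in for "a sensible numeric
    value"; it is not taken from the book.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


# Hypothetical toy data: a nominal variable and a numeric target.
sectors = ["tech", "energy", "tech", "retail", "energy", "tech"]
returns = [0.5, -0.25, 1.5, -0.5, 0.25, 1.0]

mapping = nominal_to_ordinal(sectors, returns)
encoded = [mapping[s] for s in sectors]  # numeric column usable as a model input
```

In practice the mapping would be computed on training data only and then applied to new cases, so that the encoded value does not leak the target of the case being predicted.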

Spark in Action, Second Edition

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
About the Technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the Book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
What's Inside
- Writing Spark applications in Java
- Spark application architecture
- Ingestion through files, databases, streaming, and Elasticsearch
- Querying distributed datasets with Spark SQL
About the Reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
About the Author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
Quotes
This book reveals the tools and secrets you need to drive innovation in your company or community. - Rob Thomas, IBM
An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing. - Anupam Sengupta, GuardHat Inc.
This book will help spark a love affair with distributed processing. - Conor Redmond, InComm Product Control
Currently the best book on the subject! - Markus Breuer, Materna IPS

Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way

Understand and implement big data analysis solutions in pandas with an emphasis on performance. This book strengthens your intuition for working with pandas, the Python data analysis library, by exploring its underlying implementation and data structures. Thinking in Pandas introduces the topic of big data and demonstrates concepts by looking at exciting and impactful projects that pandas helped to solve. From there, you will learn to assess your own projects by size and type to see if pandas is the appropriate library for your needs. Author Hannah Stepanek explains how to load and normalize data in pandas efficiently, and reviews some of the most commonly used loaders and several of their most powerful options. You will then learn how to access and transform data efficiently, what methods to avoid, and when to employ more advanced performance techniques. You will also go over basic data access and munging in pandas and the intuitive dictionary syntax. Choosing the right DataFrame format, working with multi-level DataFrames, and how pandas might be improved upon in the future are also covered. By the end of the book, you will have a solid understanding of how the pandas library works under the hood. Get ready to make confident decisions in your own projects by utilizing pandas the right way.
What You Will Learn
- Understand the underlying data structure of pandas and why it performs the way it does under certain circumstances
- Discover how to use pandas to extract, transform, and load data correctly with an emphasis on performance
- Choose the right DataFrame so that the data analysis is simple and efficient
- Improve performance of pandas operations with other Python libraries
Who This Book Is For
Software engineers with basic programming skills in Python keen on using pandas for a big data analysis project. Python software developers interested in big data.

Converting Adabas to IBM DB2 for z/OS with ConsistADS

Consist Advanced Development Solution (ConsistADS) is an end-to-end conversion solution that provides conversion and transparency methods for migrating to IBM® DB2® for z/OS® software. The solution includes DB2 for z/OS and several DB2 tools as part of the package. This IBM Redpaper™ publication explains the Natural and Adabas conversion to DB2 for z/OS by using ConsistADS. It includes prerequisite technical assessment requirements and conversion challenges. It also describes a real customer conversion scenario that was provided by the IBM Business Partners that facilitated these conversions for customers. Originally published in 2015, this paper was updated in 2020 to include additional information about ConsistADS.

SQL Server on Azure Virtual Machines

Would you like to master deploying SQL Server in the cloud using Microsoft's Azure platform? With the hands-on guidance in this book, you'll explore how to set up and configure SQL Server on Azure Virtual Machines effectively. By the end, you'll have the knowledge to optimize, manage, and deploy your solutions.
What this Book will help me do
- Understand platform availability for SQL Server in Azure
- Explore SQL Server IaaS and optimize its configuration
- Master deploying SQL Server on Linux and Windows in Azure
- Configure high-performance storage options tailored to SQL Server
- Learn disaster recovery strategies for SQL Server in Azure
Author(s)
Joey D'Antoni, Louis Davidson, Allan Hirt, and their co-authors bring years of experience in database management, cloud architecture, and technical writing. They aim to provide clear and actionable advice for working efficiently with SQL Server on Azure. Their insights come from real-world projects.
Who is it for?
This book is for developers, database administrators, and cloud architects who are looking to learn how to deploy SQL Server solutions on Azure Virtual Machines. If you are transitioning workloads to the cloud or need to manage or optimize such environments, this book will equip you with the skills you need. Basic SQL Server knowledge is helpful.

Data Lakes

The concept of a data lake is less than 10 years old, but data lakes are already widely implemented within large companies. Their goal is to deal efficiently with ever-growing volumes of heterogeneous data while also meeting a variety of sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata – supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, thus providing the reader with practical examples of data lake management.

Smarter Data Science

Organizations can make data science a repeatable, predictable tool, which business professionals use to get more value from their data. Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments. When an organization manages its data effectively, its data science program becomes a fully scalable function that’s both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise. By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements:
- Improving time-to-value with infused AI models for common use cases
- Optimizing knowledge work and business processes
- Utilizing AI-based business intelligence and data visualization
- Establishing a data topology to support general or highly specialized needs
- Successfully completing AI projects in a predictable manner
- Coordinating the use of AI from any compute node. From inner edges to outer edges: cloud, fog, and mist computing
When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.

Best practices and Getting Started Guide for Oracle on IBM LinuxONE

IBM® is a Platinum level Partner in the Oracle Partner Network, which delivers the proven combination of industry insight, extensive real-world Oracle applications experience, deep technical skills, and high-performance servers and storage to create a complete business solution with a defined return on investment. From application selection, purchase, and implementation to upgrade and maintenance, we help organizations reduce the total cost of ownership and the complexity of managing their current and future applications environment while building a solid base for business growth. Oracle Database running on Linux is available for deployment on IBM LinuxONE by using Red Hat Enterprise Linux (RHEL) or SUSE Linux Enterprise Server (SLES). This enterprise-grade solution is designed to add value to Oracle Database solutions. This IBM Redpaper® publication focuses on accepted good practices for installing and getting started with Oracle Database, which provides you with an environment that is optimized for performance, scalability, flexibility, and ease of management.

Machine Learning with SAS Viya

Master machine learning with SAS Viya! Machine learning can feel intimidating for new practitioners. Machine Learning with SAS Viya provides everything you need to know to get started with machine learning in SAS Viya, including decision trees, neural networks, and support vector machines. The analytics life cycle is covered from data preparation and discovery to deployment. Working with open-source code? Machine Learning with SAS Viya has you covered – step-by-step instructions are given on how to use SAS Model Manager tools with open source. SAS Model Studio features are highlighted to show how to carry out machine learning in SAS Viya. Demonstrations, practice tasks, and quizzes are included to help sharpen your skills. In this book, you will learn about:
- Supervised and unsupervised machine learning
- Data preparation and dealing with missing and unstructured data
- Model building and selection
- Improving and optimizing models
- Model deployment and monitoring performance

IBM GDPS Family: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex® (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

SQL Server Big Data Clusters: Data Virtualization, Data Lake, and AI Platform

Use this guide to one of SQL Server 2019’s most impactful features: Big Data Clusters. You will learn about data virtualization and data lakes for this complete artificial intelligence (AI) and machine learning (ML) platform within the SQL Server database engine. You will know how to use Big Data Clusters to combine large volumes of streaming data for analysis along with data stored in a traditional database. For example, you can stream large volumes of data from Apache Spark in real time while executing Transact-SQL queries to bring in relevant additional data from your corporate SQL Server database. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019. You will learn about the architectural foundations, which are made up of Kubernetes, Spark, HDFS, and SQL Server on Linux. You are then shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL, taking advantage of skills you have honed for years, and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis.
What You Will Learn
- Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments
- Analyze large volumes of data directly from SQL Server and/or Apache Spark
- Manage data stored in HDFS from SQL Server as if it were relational data
- Implement advanced analytics solutions through machine learning and AI
- Expose different data sources as a single logical source using data virtualization
Who This Book Is For
Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environments.

CCBA® and CBAP® Certifications Study Guide

This comprehensive study guide is your companion to passing the CCBA® and CBAP® certification exams on your first attempt. Covering all knowledge areas from the BABOK Guide v3 in depth, it uses real-world scenarios to make concepts relatable and practical. You'll gain the skills and confidence needed to excel in business analysis and advance your career.
What this Book will help me do
- Understand and apply the core topics of the BABOK® Guide v3 effectively.
- Acquire skills for planning, monitoring, and managing business analysis tasks.
- Learn techniques to handle elicitation, collaboration, and stakeholder engagement.
- Gain practical experience through case studies and mock exam questions.
- Prepare for the IIBA certification exams with guidance tailored to ensure your success.
Author(s)
Esta Lessing is a seasoned business analysis trainer and practitioner with over 18 years of experience in the field. As a licensed CBAP® trainer, she has helped numerous professionals achieve their certification goals. Her teaching approach integrates clear explanations, practical examples, and actionable advice to ensure a deep understanding of business analysis principles.
Who is it for?
This book is perfect for business analysts, consultants, and professionals aspiring to earn their IIBA certifications. It caters to those with foundation-level business analysis experience seeking structured guidance to enhance their skills and career opportunities. If you're preparing for the CCBA® or CBAP® certification exams, this guide is tailored for you.

Analytical Skills for AI and Data Science

While several market-leading companies have successfully transformed their business models by following data- and AI-driven paths, the vast majority have yet to reap the benefits. How can your business and analytics units gain a competitive advantage by capturing the full potential of this predictive revolution? This practical guide presents a battle-tested end-to-end method to help you translate business decisions into tractable prescriptive solutions using data and AI as fundamental inputs. Author Daniel Vaughan shows data scientists, analytics practitioners, and others interested in using AI to transform their businesses not only how to ask the right questions but also how to generate value using modern AI technologies and decision-making principles. You’ll explore several use cases common to many enterprises, complete with examples you can apply when working to solve your own issues.
- Break business decisions into stages that can be tackled using different skills from the analytical toolbox
- Identify and embrace uncertainty in decision making and protect against common human biases
- Customize optimal decisions to different customers using predictive and prescriptive methods and technologies
- Ask business questions that create high value through AI- and data-driven technologies

IBM FlashSystem 9100 Product Guide

This IBM® Redbooks® Product Guide publication describes the IBM FlashSystem® 9100 solution, which is a comprehensive, all-flash, and NVMe-enabled enterprise storage solution that delivers the full capabilities of IBM FlashCore® technology. In addition, it provides a rich set of software-defined storage (SDS) features, including data reduction and de-duplication, dynamic tiering, thin-provisioning, snapshots, cloning, replication, data copy services, and IBM HyperSwap® for high availability (HA). Scale-out and scale-up configurations further enhance capacity and throughput for better availability.

Evolutionary Computation in Scheduling

Presents current developments in the field of evolutionary scheduling and demonstrates the applicability of evolutionary computational techniques to solving scheduling problems. This book provides insight into the use of evolutionary computations (EC) in real-world scheduling, showing readers how to choose a specific evolutionary computation and how to validate the results using metrics and statistics. It offers a spectrum of real-world optimization problems, including applications of EC in industry and service organizations such as healthcare scheduling, aircraft industry, school timetabling, manufacturing systems, and transportation scheduling in the supply chain. It also features problems with different degrees of complexity, practical requirements, user constraints, and MOEC solution approaches. Evolutionary Computation in Scheduling starts with a chapter on scientometric analysis to analyze scientific literature in evolutionary computation in scheduling. It then examines the role and impacts of ant colony optimization (ACO) in job shop scheduling problems, before presenting the application of the ACO algorithm in healthcare scheduling. Other chapters explore task scheduling in heterogeneous computing systems and truck scheduling using swarm intelligence, application of sub-population scheduling algorithm in multi-population evolutionary dynamic optimization, task scheduling in cloud environments, scheduling of robotic disassembly in remanufacturing using the bees algorithm, and more.
This book:
- Provides a representative sampling of real-world problems currently being tackled by practitioners
- Examines a variety of single-, multi-, and many-objective problems that have been solved using evolutionary computations, including evolutionary algorithms and swarm intelligence
- Consists of four main parts: Introduction to Scheduling Problems, Computational Issues in Scheduling Problems, Evolutionary Computation, and Evolutionary Computations for Scheduling Problems
Evolutionary Computation in Scheduling is ideal for engineers in industry, research scholars, advanced undergraduates and graduate students, and faculty teaching and conducting research in Operations Research and Industrial Engineering.

IBM DS8900F Architecture and Implementation

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8900F family. This book provides reference information to assist readers who need to plan for, install, and configure the DS8900F systems. This edition applies to DS8900F systems running microcode Release 9.0 (Bundle 89.0 / Licensed Machine Code (LMC) 7.9.0.xxx). The DS8900F family offers two new classes:
- IBM DS8910F: Flexibility Class all-flash: The Flexibility Class is designed to reduce complexity while addressing various workloads at the lowest DS8900F family entry cost.
- IBM DS8950F: Agility Class all-flash: The Agility Class is designed to consolidate all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution.
The DS8900F architecture relies on powerful IBM POWER9™ processor-based servers that manage the cache to streamline disk input/output (I/O), which maximizes performance and throughput. These capabilities are further enhanced by High-Performance Flash Enclosures (HPFE) Gen2. Like its predecessors, the DS8900F supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. The IBM DS8910F Rack-Mounted model 993 is described in a separate publication, IBM DS8910F Model 993 Rack-Mounted Storage System, REDP-5566.