talk-data.com

Topic: data (5765 tagged)

Activity Trend: 3 peak/qtr, 2020-Q1 to 2026-Q1

Activities: 5765 activities · Newest first

R in Action, Second Edition

R in Action, Second Edition presents both the R language and the examples that make it so useful for business developers. Focusing on practical solutions, the book offers a crash course in statistics and covers elegant methods for dealing with messy and incomplete data that are difficult to analyze using traditional methods. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. And this expanded second edition includes new chapters on time series analysis, cluster analysis, and classification methodologies, including decision trees, random forests, and support vector machines.

About the Technology
Business pros and researchers thrive on data, and R speaks the language of data analysis. R is a powerful programming language for statistical computing. Unlike general-purpose tools, R provides thousands of modules for solving just about any data-crunching or presentation challenge you're likely to face. R runs on all important platforms and is used by thousands of major corporations and institutions worldwide.

About the Book
R in Action, Second Edition teaches you how to use the R language by presenting examples relevant to scientific, technical, and business developers. Focusing on practical solutions, the book offers a crash course in statistics, including elegant methods for dealing with messy and incomplete data. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. And this expanded second edition includes new chapters on forecasting, data mining, and dynamic report writing.

What's Inside
- Complete R language tutorial
- Using R to manage, analyze, and visualize data
- Techniques for debugging programs and creating packages
- OOP in R
- Over 160 graphs

About the Reader
This book is designed for readers who need to solve practical data analysis problems using the R language and tools. Some background in mathematics and statistics is helpful, but no prior experience with R or computer programming is required.

About the Author
Dr. Rob Kabacoff is a seasoned researcher who specializes in data analysis. He has taught graduate courses in statistical programming and manages the Quick-R website at statmethods.net.

Quotes
"Essential to anyone doing data analysis with R, whether in industry or academia." - Cristofer Weber, NeoGrid
"A go-to reference for general R and many statistics questions." - George Gaines, KYOS Systems Inc.
"Accessible language, realistic examples, and clear code." - Samuel D. McQuillin, University of Houston
"Offers a gentle learning curve to those starting out with R for the first time." - Indrajit Sen Gupta, Mu Sigma Business Solutions

Simple Statistical Methods for Software Engineering

Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. This book fills that void. Instead of delving into overly complex statistics, it focuses on simpler solutions that are just as effective. The authors not only explain the required statistical methods, but also supply detailed examples, stories, and case studies that facilitate the understanding required to apply those methods in real-world software engineering applications.

Navigating the Health Data Ecosystem

Data-driven technologies are now being adopted, developed, funded, and deployed throughout the health care market at an unprecedented scale. But, as this O'Reilly report reveals, health care innovation contains more hurdles and requires more finesse than many tech startups expect. By paying attention to the lessons from the report's findings, innovation teams can better anticipate what they'll face, and plan accordingly. Simply put, teams looking to apply collective intelligence and "big data" platforms to health and health care problems often don't appreciate the messy details of using and making sense of data in the heavily regulated hospital IT environment. Download this report today and learn how it helps prepare startups in six areas:
- Complexity: An enormous domain with noisy data not designed for machine consumption
- Computing: Lack of standard, interoperable schema for documenting human health in a digital format
- Context: Lack of critical contextual metadata for interpreting health data
- Culture: Startup difficulties in hospital ecosystems: why innovation can be a two-edged sword
- Contracts: Navigating the IRB, HIPAA, and EULA frameworks
- Commerce: The problem of how digital health startups get paid
This report represents the initial findings of a study funded by a grant from the Robert Wood Johnson Foundation. Subsequent reports will explore the results of three deep-dive projects the team pursued during the study.

PROC TABULATE by Example, Second Edition

An abundance of real-world examples highlights Lauren Haworth Lake and Julie McKnight's PROC TABULATE by Example, Second Edition. Beginning and intermediate SAS® users will find this step-by-step guide to producing tables and reports using the TABULATE procedure both convenient and inviting. Topics are presented in order of increasing complexity, making this an excellent training manual or self-tutorial. The concise format also makes this a quick reference guide for specific applications for more advanced users. A very handy section on common problems and their solutions is also included. With this book, you will quickly learn how to generate tables using macros, handle percentages and missing data, modify row and column headings, and produce one-, two-, and three-dimensional tables using PROC TABULATE. Also provided are more advanced tips on complex formatting with the Output Delivery System (ODS) and exporting PROC TABULATE output to other applications.

FileMaker Pro 14: The Missing Manual

You don’t need a technical background to build powerful databases with FileMaker Pro 14. This crystal-clear, objective guide shows you how to create a database that lets you do almost anything with your data so you can quickly achieve your goals. Whether you’re creating catalogs, managing inventory and billing, or planning a wedding, you’ll learn how to customize your database to run on a PC, Mac, web browser, or iOS device. The important stuff you need to know:
- Dive into relational data. Solve problems quickly by connecting and combining data from different tables.
- Create professional documents. Publish reports, charts, invoices, catalogs, and other documents with ease.
- Access data anywhere. Use FileMaker Go on your iPad or iPhone—or share data on the Web.
- Harness processing power. Use new calculation and scripting tools to crunch numbers, search text, and automate tasks.
- Run your database on a secure server. Learn the high-level features of FileMaker Pro Advanced.
- Keep your data safe. Set privileges and allow data sharing with FileMaker’s streamlined security features.

Apache Oozie

Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases. Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.
- Install and configure an Oozie server, and get an overview of basic concepts
- Journey through the world of writing and configuring workflows
- Learn how the Oozie coordinator schedules and executes workflows based on triggers
- Understand how Oozie manages data dependencies
- Use Oozie bundles to package several coordinator apps into a data pipeline
- Learn about security features and shared library management
- Implement custom extensions and write your own EL functions and actions
- Debug workflows and manage Oozie’s operational details

Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python

Now a leader of Northwestern University's prestigious analytics program presents a fully integrated treatment of both the business and academic elements of marketing applications in predictive analytics. Writing for both managers and students, Thomas W. Miller explains essential concepts, principles, and theory in the context of real-world applications. Building on Miller's pioneering program, Marketing Data Science thoroughly addresses segmentation, target marketing, brand and product positioning, new product development, choice modeling, recommender systems, pricing research, retail site selection, demand estimation, sales forecasting, customer retention, and lifetime value analysis. Starting where Miller's widely praised Modeling Techniques in Predictive Analytics left off, it integrates crucial information and insights that were previously segregated in texts on web analytics, network science, information technology, and programming. Coverage includes:
- The role of analytics in delivering effective messages on the web
- Understanding the web by understanding its hidden structures
- Being recognized on the web – and watching your own competitors
- Visualizing networks and understanding communities within them
- Measuring sentiment and making recommendations
- Leveraging key data science methods: databases/data preparation, classical/Bayesian statistics, regression/classification, machine learning, and text analytics
Six complete case studies address exceptionally relevant issues such as separating legitimate email from spam, identifying legally relevant information for lawsuit discovery, and gleaning insights from anonymous web surfing data. The text's extensive set of web and network problems draws on rich public-domain data sources; many are accompanied by solutions in Python and/or R. Marketing Data Science will be an invaluable resource for all students, faculty, and professional marketers who want to use business analytics to improve marketing performance.

Implementation Best Practices for IBM DB2 BLU Acceleration with SAP BW on IBM Power Systems

BLU Acceleration is a new technology developed by IBM® and integrated directly into the IBM DB2® engine. It is a new storage engine, with a run time integrated directly into the core DB2 engine, that supports the storage and analysis of column-organized tables. BLU Acceleration processing runs in parallel with the regular, row-based table processing found in the DB2 engine. It is not a bolt-on technology, nor is it a separate analytic engine that sits outside of DB2. Much as IBM added XML data as a first-class object within the database, along with all the storage and processing enhancements that came with XML, IBM has now added column-organized tables directly into the storage and processing engine of DB2. This IBM Redbooks® publication shows examples on an IBM Power Systems™ entry server as a starter configuration for small organizations, and builds larger configurations on larger IBM Power Systems servers. It takes you through building a BLU Acceleration solution on IBM POWER® with an SAP landscape integrated into it, and implements SAP NetWeaver Business Warehouse systems as part of the scenario, using another DB2 feature, Near-Line Storage (NLS), together with IBM POWER virtualization features, to develop and document recommended configurations. This publication is targeted at technical professionals (DBAs, data architects, consultants, technical support staff, and IT specialists) responsible for delivering cost-effective data management solutions and the best system configuration for their clients' data analytics on Power Systems.

Statistical Learning with Sparsity

Discover New Methods for Dealing with High-Dimensional Data

A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of ℓ1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
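The coordinate descent algorithm for the lasso that the blurb mentions is short enough to sketch. This is a minimal Python illustration of the standard soft-thresholding update for the objective (1/2n)||y − Xb||² + λ||b||₁, on made-up data; it is not code from the book, whose treatment is far more complete.

```python
import numpy as np

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j's contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
    return beta
```

With a suitable λ, coefficients of irrelevant features are driven exactly to zero, which is the sparsity the book exploits for estimation and interpretation.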

Current State of Big Data Use in Retail Supply Chains

Innovation, consisting of invention, adoption, and deployment of new technology and associated process improvements, is a key source of competitive advantage. Big Data is an innovation that has been gaining prominence in retailing and other industries. In fact, managers working in retail supply chain member firms (that is, retailers, manufacturers, distributors, wholesalers, logistics providers, and other service providers) have increasingly been trying to understand what Big Data entails, what it may be used for, and how to make it an integral part of their businesses. This report covers Big Data use, with a focus on applications for retail supply chains. The authors’ findings suggest that Big Data use in retail supply chains is still generally elusive. Although most managers report initial, and in some cases significant, efforts to analyze large data sets for decision making, various challenges still largely confine these efforts to traditional, transactional data.

Meta-Analysis: A Structural Equation Modeling Approach

Presents a novel approach to conducting meta-analysis using structural equation modeling. Structural equation modeling (SEM) and meta-analysis are two powerful statistical methods in the educational, social, behavioral, and medical sciences. They are often treated as two unrelated topics in the literature. This book presents a unified framework for analyzing meta-analytic data within the SEM framework, and illustrates how to conduct meta-analysis using the metaSEM package in the R statistical environment. Meta-Analysis: A Structural Equation Modeling Approach begins by introducing the importance of SEM and meta-analysis in answering research questions. Key ideas in meta-analysis and SEM are briefly reviewed, and various meta-analytic models are then introduced and linked to the SEM framework. Fixed-, random-, and mixed-effects models in univariate and multivariate meta-analyses, three-level meta-analysis, and meta-analytic structural equation modeling are introduced. Advanced topics, such as using the restricted maximum likelihood (REML) estimation method and handling missing covariates, are also covered. Readers will learn a single framework to apply both meta-analysis and SEM. Examples in R and in Mplus are included. This book will be a valuable resource for statistical and academic researchers and graduate students carrying out meta-analyses, and will also be useful to researchers and statisticians using SEM in biostatistics. Basic knowledge of either SEM or meta-analysis will be helpful in understanding the materials in this book.
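At the core of the fixed-effects models the blurb lists is inverse-variance weighting: each study's effect size is weighted by the reciprocal of its sampling variance. A minimal Python illustration of that pooling step (the book itself works in R with the metaSEM package; the effect sizes below are made up):

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance-weighted fixed-effect pooled estimate and its
    standard error: w_i = 1/v_i, pooled = sum(w_i*y_i)/sum(w_i)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se
```

Studies with smaller variances (typically larger samples) pull the pooled estimate toward themselves, which is why the precise study dominates in the example below.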

Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators

Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators provides a uniquely broad compendium of the key mathematical concepts and results that are relevant for the theoretical development of functional data analysis (FDA). The self-contained treatment of selected topics of functional analysis and operator theory includes reproducing kernel Hilbert spaces, singular value decomposition of compact operators on Hilbert spaces, and perturbation theory for both self-adjoint and non-self-adjoint operators. The probabilistic foundation for FDA is described from the perspective of random elements in Hilbert spaces as well as from the viewpoint of continuous time stochastic processes. Nonparametric estimation approaches including kernel and regularized smoothing are also introduced. These tools are then used to investigate the properties of estimators for the mean element, covariance operators, principal components, regression function, and canonical correlations. A general treatment of canonical correlations in Hilbert spaces naturally leads to FDA formulations of factor analysis, regression, MANOVA, and discriminant analysis. This book will provide a valuable reference for statisticians and other researchers interested in developing or understanding the mathematical aspects of FDA. It is also suitable for a graduate level special topics course.

Elementary Statistics Using SAS

Bridging the gap between statistics texts and SAS documentation, Elementary Statistics Using SAS is written for those who want to perform analyses to solve problems. The first section of the book explains the basics of SAS data sets and shows how to use SAS for descriptive statistics and graphs. The second section discusses fundamental statistical concepts, including normality and hypothesis testing. The remaining sections of the book show analyses for comparing two groups, comparing multiple groups, fitting regression equations, and exploring contingency tables. For each analysis, author Sandra Schlotzhauer explains the assumptions, statistical approach, and SAS methods and syntax, and draws conclusions from the results. Statistical methods covered include two-sample t-tests, paired-difference t-tests, analysis of variance, multiple comparison techniques, regression, regression diagnostics, and chi-square tests. Elementary Statistics Using SAS is a thoroughly revised and updated edition of Ramon Littell and Sandra Schlotzhauer's SAS System for Elementary Statistical Analysis. This book is part of the SAS Press program.
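To make the two-sample comparison concrete: the two-sample t-test listed above has a widely used variant, Welch's test, which does not assume equal variances. The sketch below is in Python rather than SAS, purely for illustration; it computes the t statistic and the Welch-Satterthwaite degrees of freedom from scratch.

```python
import math
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Welch's two-sample t statistic and degrees of freedom
    (no equal-variance assumption)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances
    se2 = v1 / n1 + v2 / n2                        # squared standard error
    t = (mean(sample1) - mean(sample2)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

The t statistic is then compared against a t distribution with df degrees of freedom to obtain a p-value, which is the step SAS automates.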

Beyond Basic Statistics: Tips, Tricks, and Techniques Every Data Analyst Should Know

Features basic statistical concepts as a tool for thinking critically, wading through large quantities of information, and answering practical, everyday questions.

Written in an engaging and inviting manner, Beyond Basic Statistics: Tips, Tricks, and Techniques Every Data Analyst Should Know presents the more subjective side of statistics—the art of data analytics. Each chapter explores a different question using fun, common sense examples that illustrate the concepts, methods, and applications of statistical techniques. Without going into the specifics of theorems, propositions, or formulas, the book effectively demonstrates statistics as a useful problem-solving tool. In addition, the author demonstrates how statistics is a tool for thinking critically, wading through large volumes of information, and answering life's important questions. Beyond Basic Statistics: Tips, Tricks, and Techniques Every Data Analyst Should Know also features:
- Plentiful examples throughout, aimed at strengthening readers' understanding of the statistical concepts and methods
- A step-by-step approach to elementary statistical topics such as sampling, hypothesis tests, outlier detection, normality tests, robust statistics, and multiple regression
- A case study in each chapter that illustrates the use of the presented techniques
- Highlights of well-known shortcomings that can lead to false conclusions
- An introduction to advanced techniques such as validation and bootstrapping
Featuring examples that are engaging and non-application specific, the book appeals to a broad audience of students and professionals alike, specifically students of undergraduate statistics, managers, medical professionals, and anyone who has to make decisions based on raw data or compiled results.
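Bootstrapping, one of the advanced techniques the list above mentions, estimates the uncertainty of a statistic by resampling the data with replacement. A minimal percentile-bootstrap sketch in Python, with made-up data and a fixed seed for reproducibility (not an example from the book):

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a statistic:
    resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The appeal is exactly what the blurb promises: no theorems or formulas are needed, just repeated resampling and a sort.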

Google Analytics Integrations

Get a complete view of your customers and make your marketing analysis more meaningful. How well do you really know your customers? Find out with the help of expert author Daniel Waisberg and Google Analytics Integrations. This unique guide takes you well beyond the basics of using Google Analytics to track metrics, showing you how to transform this simple data collection tool into a powerful, central marketing analysis platform for your organization. You'll learn how Google AdWords, AdSense, CRMs, and other data sources can be used together to deliver actionable insights about your customers and their behavior.
- Explains proven techniques and best practices for collecting clean and accurate information from the start
- Shows you how to import your organization's marketing and customer data into Google Analytics
- Illustrates the importance of taking a holistic view of your customers and how this knowledge can transform your business
- Provides step-by-step guidance on using the latest analytical tools and services to gain a complete understanding of your customers, their needs, and what motivates them to take action
Google Analytics Integrations is your in-depth guide to improving your data integration, behavioral analysis, and ultimately, your bottom line.

Big Data

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

About the Technology
Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size, or speed. Fortunately, scale and simplicity are not mutually exclusive.

About the Book
Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.

What's Inside
- Introduction to big data systems
- Real-time processing of web-scale data
- Tools like Hadoop, Cassandra, and Storm
- Extensions to traditional database skills

About the Reader
This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

About the Authors
Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Quotes
"Transcends individual tools or platforms. Required reading for anyone working with big data systems." - Jonathan Esterhazy, Groupon
"A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide." - Mark Fisher, Pivotal
"Contains wisdom that can only be gathered after tackling many big data projects. A must-read." - Pere Ferrera Bertran, Datasalt
"The de facto guide to streamlining your data pipeline in batch and near-real time." - Alex Holmes, Author of "Hadoop in Practice"
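A defining move of the Lambda Architecture described above is answering queries by merging a precomputed batch view with a realtime view covering data that arrived after the last batch run. A toy Python sketch of that merge step, using hypothetical page-view counts (not code from the book, where the batch and realtime layers are built with Hadoop and Storm):

```python
def merge_views(batch_view, realtime_view):
    """Answer a query by combining precomputed batch counts with
    counts from data that arrived after the last batch run."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

The realtime view stays small because the batch layer periodically recomputes everything and absorbs it, which is where the architecture's simplicity comes from.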

DS8870 Data Migration Techniques

This IBM® Redbooks® publication describes data migrations between IBM DS8000® storage systems, where in most cases one or more older DS8000 models are being replaced by the newer DS8870 model. Most of the migration methods are based on the DS8000 Copy Services. The book includes considerations for solutions such as IBM Tivoli® Productivity Center for Replication and the IBM Geographically Dispersed Parallel Sysplex™ (GDPS®) used in IBM z/OS® environments. Both offerings are primarily designed to enable a disaster recovery using DS8000 Copy Services. In most data migration cases, Tivoli Productivity Center for Replication or GDPS will not directly provide functions for the data migration itself. However, this book explains how to bring the new migrated environment back into the control of GDPS or Tivoli Productivity Center for Replication. In addition to the Copy Services based migrations, the book also covers host-based mirroring techniques, using IBM Transparent Data Migration Facility (TDMF®) for z/OS and the z/OS Dataset Mobility Facility (zDMF).

PostgreSQL 9 Administration Cookbook - Second Edition

Master PostgreSQL 9.4 with this hands-on cookbook featuring over 150 practical and easy-to-follow recipes that will bring you up to speed with PostgreSQL's latest features. You'll learn how to create, manage, and optimize a PostgreSQL-based database, focusing on vital aspects like performance and reliability.

What this Book will help me do
- Efficiently configure PostgreSQL databases for optimal performance.
- Deploy robust backup and recovery strategies to ensure data reliability.
- Utilize PostgreSQL's replication features for improved high availability.
- Implement advanced queries and analyze large datasets effectively.
- Optimize database structure and functionality for application needs.

Author(s)
Simon Riggs, Gianni Ciolli, and their co-authors are seasoned database professionals with extensive experience in PostgreSQL administration and development. They have a complementary blend of skills, comprising practical system knowledge, teaching, and authoritative writing. Their hands-on experience translates seamlessly into accessible yet informative content.

Who is it for?
This book is ideal for database administrators and developers who are looking to enhance their skills with PostgreSQL, especially version 9. If you have some prior experience with relational databases and want practical guidance on optimizing, managing, and mastering PostgreSQL, this resource is tailored for you.

Hadoop Essentials

In Hadoop Essentials, you'll embark on an engaging journey to master the Hadoop ecosystem. This book covers fundamental to advanced topics, from HDFS and MapReduce to real-time analytics with Spark, empowering you to handle modern data challenges efficiently.

What this Book will help me do
- Understand the core components of Hadoop, including HDFS, YARN, and MapReduce, for foundational knowledge.
- Learn to optimize Big Data architectures and improve application performance.
- Utilize tools like Hive and Pig for efficient data querying and processing.
- Master data ingestion technologies like Sqoop and Flume for seamless data management.
- Achieve fluency in real-time data analytics using modern tools like Apache Spark and Apache Storm.

Author(s)
Achari is a seasoned expert in Big Data and distributed systems with in-depth knowledge of the Hadoop ecosystem. With years of experience in both development and teaching, they craft content that bridges practical know-how with theoretical insights in a highly accessible style.

Who is it for?
This book is perfect for system and application developers aiming to learn practical applications of Hadoop. It suits professionals seeking solutions to real-world Big Data challenges as well as those familiar with distributed systems basics and looking to deepen their expertise in advanced data analysis.

IBM z13 Configuration Setup

This IBM® Redbooks® publication helps you install, configure, and maintain the IBM z13™. The z13 offers new functions that require a comprehensive understanding of the available configuration options. This book presents configuration setup scenarios, and describes implementation examples in detail. This publication is intended for systems engineers, hardware planners, and anyone who needs to understand IBM z Systems™ configuration and implementation. Readers should be generally familiar with current IBM z Systems technology and terminology. For details about the functions of the z13, see IBM z13 Technical Introduction, SG24-8250 and IBM z13 Technical Guide, SG24-8251.