talk-data.com

Topic

data-science

2252

tagged

Activity Trend: 1 peak/qtr, 2020-Q1 to 2026-Q1

Activities

2252 activities · Newest first

Becoming a Data Head
book
by Jordan Goldmeier (Booz Allen Hamilton; The Perduco Group; EY; Excel TV; Wake Forest University; Anarchy Data), Alex J. Gutman

"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful." Thomas H. Davenport, Research Fellow, Author of Competing on Analytics, Big Data @ Work, and The AI Advantage. You've heard the hype around data. Now get the facts. In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it. You'll learn how to: think statistically and understand the role variation plays in your life and decision making; speak intelligently and ask the right questions about the statistics and results you encounter in the workplace; understand what's really going on with machine learning, text analytics, deep learning, and artificial intelligence; and avoid common pitfalls when working with and interpreting data. Becoming a Data Head is a complete guide for data science in the workplace, covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head: an active participant in data science, statistics, and machine learning. Whether you're a business professional, engineer, executive, or aspiring data scientist, this book is for you.

Business Forecasting

Discover the role of machine learning and artificial intelligence in business forecasting from some of the brightest minds in the field. In Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning, accomplished authors Michael Gilliland, Len Tashman, and Udo Sglavo deliver relevant and timely insights from some of the most important and influential authors in the field of forecasting. You'll learn about the role played by machine learning and AI in the forecasting process and discover brand-new research, case studies, and thoughtful discussions covering an array of practical topics. The book offers multiple perspectives on issues like monitoring forecast performance, forecasting process, communication and accountability for forecasts, and the use of big data in forecasting. You will find: discussions on deep learning in forecasting, including current trends and challenges; explorations of neural network-based forecasting strategies; a treatment of the future of artificial intelligence in business forecasting; and analyses of forecasting methods, including modeling, selection, and monitoring. In addition to the Foreword by renowned researchers Spyros Makridakis and Fotios Petropoulos, the book also includes 16 "opinion/editorial" Afterwords by a diverse range of top academics, consultants, vendors, and industry practitioners, each providing their own unique vision of the issues, current state, and future direction of business forecasting. Perfect for financial controllers, chief financial officers, business analysts, forecast analysts, and demand planners, Business Forecasting will also earn a place in the libraries of other executives and managers who seek a one-stop resource to help them critically assess and improve their own organization's forecasting efforts.

Exam Ref DA-100 Analyzing Data with Microsoft Power BI

Prepare for Microsoft Exam DA-100 and help demonstrate your real-world mastery of Power BI data analysis and visualization. Designed for experienced data analytics professionals ready to advance their status, Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified Associate level. Focus on the expertise measured by these objectives: prepare the data; model the data; visualize the data; analyze the data; deploy and maintain deliverables. This Microsoft Exam Ref: organizes its coverage by exam objectives; features strategic, what-if scenarios to challenge you; assumes you are an experienced business intelligence professional or data analyst, or have a similar role. About the Exam: Exam DA-100 focuses on skills and knowledge needed to acquire, profile, clean, transform, and load data; design and develop data models; create measures with DAX; optimize model performance; create reports and dashboards; enrich reports for usability; enhance reports to expose insights; perform advanced analysis; manage datasets; and create and manage workspaces. About Microsoft Certification: Passing this exam earns your Microsoft Certified: Data Analyst Associate certification, demonstrating your ability to help businesses maximize the value of data assets by using Microsoft Power BI. As subject matter experts, Data Analysts design and build scalable data models, clean and transform data, and enable advanced analytic capabilities that provide meaningful business value through easy-to-comprehend data visualizations. See full details at: microsoft.com/learn

Responsible Data Science

Explore the most serious and prevalent ethical issues in data science with this insightful new resource. The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of “black box” algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to: improve model transparency, even for black box models; diagnose bias and unfairness within models using multiple metrics; and audit projects to ensure fairness and minimize the possibility of unintended harm. Perfect for data science practitioners, Responsible Data Science will also earn a spot on the bookshelves of technically inclined managers, software developers, and statisticians.

A Gentle Introduction to Statistics Using SAS Studio in the Cloud

Point and click your way to performing statistics! Many people are intimidated by learning statistics, but A Gentle Introduction to Statistics Using SAS Studio in the Cloud is here to help. Whether you need to perform statistical analysis for a project or, perhaps, for a course in education, psychology, sociology, economics, or any other field that requires basic statistical skills, this book teaches the fundamentals of statistics, from designing your experiment through calculating logistic regressions. Serving as an introduction to many common statistical tests and principles, it explains concepts in an intuitive way with little math and very few formulas. The book is full of examples demonstrating the use of SAS Studio’s easy point-and-click interface accessed with SAS OnDemand for Academics, an online delivery platform for teaching and learning statistical analysis that provides free access to SAS software via the cloud. Topics included in this book are: how to access SAS OnDemand for Academics; descriptive statistics; one-sample tests; t tests (for independent or paired samples); one-way analysis of variance (ANOVA); N-way ANOVA; correlation analysis; simple and multiple linear regression; binary logistic regression; categorical data, including two-way tables and chi-square; and power and sample size calculations. Questions are provided to test your knowledge and practice your skills.

Statistical Learning for Big Dependent Data

Master advanced topics in the analysis of large, dynamically dependent datasets with this insightful resource. Statistical Learning for Big Dependent Data delivers a comprehensive presentation of the statistical and machine learning methods useful for analyzing and forecasting large and dynamically dependent data sets. The book presents automatic procedures for modelling and forecasting large sets of time series data. Beginning with some visualization tools, the book discusses procedures and methods for finding outliers, clusters, and other types of heterogeneity in big dependent data. It then introduces various dimension reduction methods, including regularization and factor models such as regularized Lasso in the presence of dynamical dependence and dynamic factor models. The book also covers other forecasting procedures, including index models, partial least squares, boosting, and now-casting. It further presents machine-learning methods, including neural networks, deep learning, classification and regression trees, and random forests. Finally, procedures for modelling and forecasting spatio-temporal dependent data are also presented. Throughout the book, the advantages and disadvantages of the methods discussed are given. The book uses real-world examples to demonstrate applications, including use of many R packages. Finally, an R package associated with the book is available to assist readers in reproducing the analyses of examples and to facilitate real applications.
Statistical Learning for Big Dependent Data includes a wide variety of topics for modeling and understanding big dependent data, like: new ways to plot large sets of time series; an automatic procedure to build univariate ARMA models for individual components of a large data set; powerful outlier detection procedures for large sets of related time series; new methods for finding the number of clusters of time series and discrimination methods, including support vector machines, for time series; broad coverage of dynamic factor models including new representations and estimation methods for generalized dynamic factor models; discussion on the usefulness of lasso with time series and an evaluation of several machine learning procedures for forecasting large sets of time series; forecasting large sets of time series with exogenous variables, including discussions of index models, partial least squares, and boosting; and an introduction of modern procedures for modeling and forecasting spatio-temporal data. Perfect for PhD students and researchers in business, economics, engineering, and science, Statistical Learning for Big Dependent Data also belongs on the bookshelves of practitioners in these fields who hope to improve their understanding of statistical and machine learning methods for analyzing and forecasting big dependent data.

Mastering Shiny

Master the Shiny web framework—and take your R skills to a whole new level. By letting you move beyond static reports, Shiny helps you create fully interactive web apps for data analyses. Users will be able to jump between datasets, explore different subsets or facets of the data, run models with parameter values of their choosing, customize visualizations, and much more. Hadley Wickham from RStudio shows data scientists, data analysts, statisticians, and scientific researchers with no knowledge of HTML, CSS, or JavaScript how to create rich web apps from R. This in-depth guide provides a learning path that you can follow with confidence, as you go from a Shiny beginner to an expert developer who can write large, complex apps that are maintainable and performant. Get started: Discover how the major pieces of a Shiny app fit together Put Shiny in action: Explore Shiny functionality with a focus on code samples, example apps, and useful techniques Master reactivity: Go deep into the theory and practice of reactive programming and examine reactive graph components Apply best practices: Examine useful techniques for making your Shiny apps work well in production

Hands-On Data Analysis with Pandas - Second Edition

'Hands-On Data Analysis with Pandas' guides you to gain expertise in the Python pandas library for data analysis and manipulation. With practical, real-world examples, you'll learn to analyze datasets, visualize data trends, and implement machine learning models for actionable insights. What this Book will help me do Understand and implement data analysis techniques with Python. Develop expertise in data manipulation using pandas and NumPy. Visualize data effectively with pandas visualization tools and seaborn. Apply machine learning techniques with Python libraries. Combine datasets and handle complex data workflows efficiently. Author(s) Stefanie Molin is a software engineer and data scientist with extensive experience in analytics and Python. She has worked with large data-driven systems and has a strong focus on teaching data analysis effectively. Stefanie's books are known for their practical, hands-on approach to solving real data problems. Who is it for? This book is perfect for aspiring data scientists, data analysts, and Python developers. Readers with beginner to intermediate skill levels in Python will find it accessible and informative. It is designed for those seeking to build practical data analysis skills. If you're looking to add data science and pandas to your toolkit, this book is ideal.

CRAN Recipes: DPLYR, Stringr, Lubridate, and RegEx in R

Want to use the power of R sooner rather than later? Don’t have time to plow through wordy texts and online manuals? Use this book for quick, simple code to get your projects up and running. It includes code and examples applicable to many disciplines. Written in everyday language with a minimum of complexity, each chapter provides the building blocks you need to fit R’s astounding capabilities to your analytics, reporting, and visualization needs. CRAN Recipes recognizes how needless jargon and complexity get in your way. Busy professionals need simple examples and intuitive descriptions; side trips and meandering philosophical discussions are left for other books. Here R scripts are condensed, to the extent possible, to copy-paste-run format. Chapters and examples are structured to purpose rather than particular functions (e.g., “dirty data cleanup” rather than the R package name “janitor”). Everyday language eliminates the need to know functions/packages in advance. What You Will Learn Carry out input/output; visualizations; data munging; manipulations at the group level; and quick data exploration Handle forecasting (multivariate, time series, logistic regression, Facebook’s Prophet, and others) Use text analytics; sampling; financial analysis; and advanced pattern matching (regex) Manipulate data using DPLYR: filter, sort, summarize, add new fields to datasets, and apply powerful IF functions Create combinations or subsets of files using joins Write efficient code using pipes to eliminate intermediate steps (MAGRITTR) Work with string/character manipulation of all types (STRINGR) Discover counts, patterns, and how to locate whole words Do wild-card matching, extraction, and invert-match Work with dates using LUBRIDATE Fix dirty data; attractive formatting; bad habits to avoid Who This Book Is For Programmers/data scientists with at least some prior exposure to R.

Bootstrapping

Bootstrapping is a conceptually simple statistical technique to increase the quality of estimates, conduct robustness checks and compute standard errors for virtually any statistic. This book provides an intelligible and compact introduction for students, scientists and practitioners. It not only gives a clear explanation of the underlying concepts but also demonstrates the application of bootstrapping using Python and Stata.
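As a rough illustration of the idea described above (not taken from the book), the following minimal Python sketch estimates the standard error of a statistic by resampling the data with replacement. The function name `bootstrap_se`, the sample values, and the choice of 2000 resamples are all hypothetical and chosen only for the example.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` via the nonparametric bootstrap.

    `bootstrap_se` is a hypothetical helper written for this illustration.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls give the same result
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return statistics.stdev(estimates)


# Made-up sample values, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]

se_mean = bootstrap_se(sample, statistics.mean)
se_median = bootstrap_se(sample, statistics.median)
print(f"bootstrap SE of the mean:   {se_mean:.3f}")
print(f"bootstrap SE of the median: {se_median:.3f}")
```

The same function works for the median, for which no simple closed-form standard error exists; this is the sense in which the technique applies to "virtually any statistic".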

Advancing into Analytics

Data analytics may seem daunting, but if you're an experienced Excel user, you have a unique head start. With this hands-on guide, intermediate Excel users will gain a solid understanding of analytics and the data stack. By the time you complete this book, you'll be able to conduct exploratory data analysis and hypothesis testing using a programming language. Exploring and testing relationships are core to analytics. By using the tools and frameworks in this book, you'll be well positioned to continue learning more advanced data analysis techniques. Author George Mount, founder and CEO of Stringfest Analytics, demonstrates key statistical concepts with spreadsheets, then pivots your existing knowledge about data manipulation into R and Python programming. This practical book guides you through: Foundations of analytics in Excel: Use Excel to test relationships between variables and build compelling demonstrations of important concepts in statistics and analytics From Excel to R: Cleanly transfer what you've learned about working with data from Excel to R From Excel to Python: Learn how to pivot your Excel data chops into Python and conduct a complete data analysis

Trino: The Definitive Guide

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Trino. Initially developed by Facebook, open source Trino is now used by Amazon, Google, LinkedIn, Lyft, Netflix, Pinterest, Salesforce, Shopify, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization. Get started: Explore Trino's use cases and learn about tools that will help you connect to Trino and query data Go deeper: Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more Put Trino in production: Secure Trino, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Trino

Hands-On Data Visualization

Tell your story and show it with data, using free and easy-to-learn tools on the web. This introductory book teaches you how to design interactive charts and customized maps for your website, beginning with simple drag-and-drop tools such as Google Sheets, Datawrapper, and Tableau Public. You'll also gradually learn how to edit open source code templates like Chart.js, Highcharts, and Leaflet on GitHub. Hands-On Data Visualization takes you step-by-step through tutorials, real-world examples, and online resources. This practical guide is ideal for students, nonprofit organizations, small business owners, local governments, journalists, academics, and anyone who wants to take data out of spreadsheets and turn it into lively interactive stories. No coding experience is required. Build interactive charts and maps and embed them in your website Understand the principles for designing effective charts and maps Learn key data visualization concepts to help you choose the right tools Convert and transform tabular and spatial data to tell your data story Edit and host Chart.js, Highcharts, and Leaflet map code templates on GitHub Learn how to detect bias in charts and maps produced by others

Data Science on AWS

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance. Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment Tie everything together into a repeatable machine learning operations pipeline Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more

Visualizing Data in R 4: Graphics Using the base, graphics, stats, and ggplot2 Packages

Master the syntax for working with R’s plotting functions in graphics and stats in this easy reference to formatting plots. The approach in Visualizing Data in R 4 toward the application of formatting in ggplot() will follow the structure of the formatting used by the plotting functions in graphics and stats. This book will take advantage of the new features added to R 4 where appropriate including a refreshed color palette for charts, Cairo graphics with more fonts/symbols, and improved performance from grid graphics including ggplot2 rendering speed. Visualizing Data in R 4 starts with an introduction and then is split into two parts and six appendices. Part I covers the function plot() and the ancillary functions you can use with plot(). You’ll also see the functions par() and layout(), providing for multiple plots on a page. Part II goes over the basics of using the functions qplot() and ggplot() in the package ggplot2. The default plots generated by the functions qplot() and ggplot() give more sophisticated-looking plots than the default plots done by plot() and are easier to use, but the function plot() is more flexible. Both plot() and ggplot() allow for many layers to a plot. The six appendices will cover plots for contingency tables, plots for continuous variables, plots for data with a limited number of values, functions that generate multiple plots, plots for time series analysis, and some miscellaneous plots. Some of the functions that will be in the appendices include functions that generate histograms, bar charts, pie charts, box plots, and heatmaps. What You Will Learn Use R to create informative graphics Master plot(), qplot(), and ggplot() Discover the canned graphics functions in stats and graphics Format plots generated by plot() and ggplot() Who This Book Is For Those in data science who use R. Some prior experience with R or data science is recommended.

Automated Unit Testing with ABAP: A Practical Approach

Write automated unit tests for the ABAP language. This book teaches programmers using simple examples and metaphors and explains the underlying concepts of writing effective automated unit tests. Many, if not most, ABAP programmers learned their programming and testing skills before the ABAP development environment provided an automated unit testing facility. Automated Unit Testing with ABAP: A Practical Approach offers hope and salvation to ABAP programmers who continue to toil with antiquated manual unit testing processes, taking them by the hand and lifting them out of that dungeon of despair with a modern and proven alternative. It begins by explaining how the xUnit family of automated testing frameworks provides a quick and effective means of ensuring high-quality software. It then focuses on the ABAP Unit Testing Facility, the xUnit framework applicable specifically to the ABAP language, showing how it can be used to bring ABAP applications under automated testing control, from old legacy applications to those newly written. Whereas xUnit testing has been widely accepted with developers writing in many other programming languages, it is an unfortunate fact in the ABAP community that many programmers still are unfamiliar with xUnit concepts and do not know how to begin implementing automated unit testing into their development process. This book demonstrates how to refactor programs so they become designed for testability, showing how to use process encapsulation and test isolation to facilitate automated testing, including a thorough explanation of test-driven development and the use of test doubles.
The book: Shows how to write automated unit tests for ABAP Instills ABAP programmers with the confidence to refactor poorly written code Explains how an automated testing harness facilitates rapid software development Teaches how to utilize test-driven development (TDD) with ABAP Offers advice and tips on the best ways to write automated unit tests What You Will Learn Become familiar with the xUnit approach to testing Know the ABAP statements that interfere with running automated unit tests and how to accommodate them Understand what it means to isolate code for testing and how this is achieved Gain the confidence to refactor poorly written code Make ABAP programs designed for testability Reap the benefits of spending less time manually unit testing ABAP programs Use test-driven development (TDD) with ABAP programming Use configurable test doubles in ABAP Who This Book Is For ABAP programmers who remain unfamiliar with the automated unit testing facility and those who already use it but want to improve their skill writing and using automated tests. The book addresses the reluctance and trepidation felt by procedural ABAP programmers who need to know some object-oriented concepts to use this facility, expands their horizons, and helps them step through the doorway leading to a different approach to program design.

Cleaning Data for Effective Data Science

Dive into the intricacies of data cleaning, a crucial aspect of any data science and machine learning pipeline, with 'Cleaning Data for Effective Data Science.' This comprehensive guide walks you through tools and methodologies like Python, R, and command-line utilities to prepare raw data for analysis. Learn practical strategies to manage, clean, and refine data encountered in the real world. What this Book will help me do Understand and utilize various data formats such as JSON, SQL, and PDF for data ingestion and processing. Master key tools like pandas, SciPy, and Tidyverse to manipulate and analyze datasets efficiently. Develop heuristics and methodologies for assessing data quality, detecting bias, and identifying irregularities. Apply advanced techniques like feature engineering and statistical adjustments to enhance data usability. Gain confidence in handling time series data by employing methods for de-trending and interpolating missing values. Author(s) David Mertz has years of experience as a Python programmer and data scientist. Known for his engaging and accessible teaching style, David has authored numerous technical articles and books. He emphasizes not only the technicalities of data science tools but also the critical thinking that approaches solutions creatively and effectively. Who is it for? 'Cleaning Data for Effective Data Science' is designed for data scientists, software developers, and educators dealing with data preparation. Whether you're an aspiring data enthusiast or an experienced professional looking to refine your skills, this book provides essential tools and frameworks. Prior programming knowledge, particularly in Python or R, coupled with an understanding of statistical fundamentals, will help you make the most of this resource.

IBM SPSS Essentials, 2nd Edition

Master the fundamentals of SPSS with this newly updated and instructive resource. The newly and thoroughly revised Second Edition of SPSS Essentials delivers a comprehensive guide for students in the social sciences who wish to learn how to use the Statistical Package for the Social Sciences (SPSS) for the effective collection, management, and analysis of data. The accomplished researchers and authors provide readers with the practical nuts and bolts of SPSS usage and data entry, with a particular emphasis on managing and manipulating data. The book offers an introduction to SPSS, how to navigate it, and a discussion of how to understand the data the reader is working with. It also covers inferential statistics, including topics like hypothesis testing, one-sample Z-testing, T-testing, ANOVAs, correlations, and regression. Five unique appendices round out the text, providing readers with discussions of dealing with real-world data, troubleshooting, advanced data manipulations, and new workbook activities. SPSS Essentials offers a wide variety of features, including: a revised chapter order, designed to match the pacing and content of typical undergraduate statistics classes; an explanation of when particular inferential statistics are appropriate for use, given the nature of the data being worked with; additional material on understanding your data sample, including discussions of SPSS output and how to find the most relevant information; and a companion website offering additional problem sets, complete with answers. Perfect for undergraduate students of the social sciences who are just getting started with SPSS, SPSS Essentials also belongs on the bookshelves of advanced placement high school students and practitioners in social science who want to brush up on the fundamentals of this powerful and flexible software package.