talk-data.com

Topic: data-science-tasks (794 tagged)

Activities

Showing filtered results

Filtering by: O'Reilly Data Science Books

Essential Statistics for Non-STEM Data Analysts

Essential Statistics for Non-STEM Data Analysts is your comprehensive guide to mastering the statistical concepts needed for data science. By working through real-world datasets and Python-based examples, you'll learn how to interpret data and build insightful analyses. This book demystifies statistics, making it accessible to anyone aiming to become proficient in data analysis.

What this Book will help me do
· Learn how to preprocess, clean, and prepare data for analysis using Python.
· Master the foundations of statistical methods such as hypothesis testing and probability theory.
· Develop skills to interpret and explain statistical results in the context of data science.
· Understand how statistical concepts apply to machine learning tasks like classification and regression.
· Build confidence in statistical principles to tackle interviews and enhance your career prospects.

Author(s)
Li is an experienced data scientist and educator with a strong focus on making abstract statistical concepts intuitive and applicable. With a background in designing data science curricula, Li has a passion for teaching statistics to individuals from diverse and often non-mathematical backgrounds. Through clear explanations and practical examples, Li aims to empower everyone to excel in data analysis and machine learning.

Who is it for?
This book caters specifically to data analysts, data science enthusiasts, and developers eager to enhance their statistical knowledge. It's crafted for readers transitioning into data science who may lack a strong mathematical or statistics background. If you have a basic grasp of Python programming and a keen interest in understanding how to work effectively with data, this book is a perfect fit. Beginners and students aiming to familiarize themselves with statistical foundations for data-oriented careers will greatly benefit from this resource.

Discrete Networked Dynamic Systems

Discrete Networked Dynamic Systems: Analysis and Performance provides a high-level treatment of a general class of linear discrete-time dynamic systems interconnected over an information network, exchanging relative state measurements or output measurements. It presents a systematic analysis of the material and gives a unified account of the mathematical development. The topics in this book are structured along four dimensions: Agent, Environment, Interaction, and Organization, while keeping global (system-centered) and local (agent-centered) viewpoints. The focus is on the wide-sense consensus problem in discrete networked dynamic systems. The authors rely heavily on algebraic graph theory and topology to derive their results.

Graphs play an important role in the analysis of interactions between multiagent/distributed systems. Graph-theoretic analysis provides insight into how topological interactions play a role in achieving coordination among agents. Numerous types of graphs exist in the literature, depending on the edge set of G. A simple graph has no self-loops or multiple edges. Complete graphs are simple graphs with an edge connecting every pair of vertices. The vertex set of a bipartite graph can be partitioned into two disjoint non-empty vertex sets such that every edge connects a vertex in one set to a vertex in the other; in a complete bipartite graph, every vertex in one set is joined to every vertex in the other. Random graphs have fixed vertex sets, but the edge set exhibits stochastic behavior modeled by probability functions. Many of the studies in coordination control are based on deterministic/fixed graphs, switching graphs, and random graphs.
· This book addresses advanced analytical tools for characterization control, estimation and design of networked dynamic systems over fixed, probabilistic and time-varying graphs
· Provides coherent results on adopting a set-theoretic framework for critically examining problems of the analysis, performance and design of discrete distributed systems over graphs
· Deals with both homogeneous and heterogeneous systems to guarantee the generality of design results
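
As a rough illustration of the consensus problem mentioned in the description, the sketch below runs a discrete-time averaging iteration x(k+1) = x(k) - eps * L x(k) on a complete graph with four agents, where L is the graph Laplacian. The update rule, the step size `eps`, and all function names are illustrative assumptions and are not taken from the book.

```python
def complete_graph_adjacency(n):
    """Adjacency matrix of a complete simple graph: every pair joined, no self-loops."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def laplacian(adj):
    """Graph Laplacian L = D - A, where D is the diagonal degree matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def consensus_step(x, lap, eps):
    """One update x(k+1) = x(k) - eps * L x(k)."""
    n = len(x)
    return [x[i] - eps * sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]


def run_consensus(x0, adj, eps, steps):
    """Iterate the consensus update a fixed number of times."""
    lap = laplacian(adj)
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, lap, eps)
    return x


# Hypothetical initial agent states; eps is assumed small enough for stability.
initial_states = [1.0, 3.0, 5.0, 7.0]
adjacency = complete_graph_adjacency(4)
final_states = run_consensus(initial_states, adjacency, eps=0.1, steps=50)
print(final_states)
```

On an undirected connected graph this iteration preserves the sum of the states, so with a sufficiently small step size the agents approach the average of their initial values.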

Stochastic Dynamics of Economic Cycles

This book includes discussions related to solutions of such tasks as: probabilistic description of the investment function; recovering the income function from GDP estimates; development of models for the economic cycles; selecting the time interval of pseudo-stationarity of cycles; estimating characteristics/parameters of cycle models; analysis of accuracy of model factors. All of the above constitute the general principles of a theory explaining the phenomenon of economic cycles and provide mathematical tools for their quantitative description. The introduced theory is applicable to macroeconomic analyses as well as econometric estimations of economic cycles.

Creating Good Data: A Guide to Dataset Structure and Data Representation

Create good data from the start, rather than fixing it after it is collected. By following the guidelines in this book, you will be able to conduct more effective analyses and produce timely presentations of research data.

Data analysts are often presented with datasets for exploration and study that are poorly designed, leading to difficulties in interpretation and to delays in producing meaningful results. Much data analytics training focuses on how to clean and transform datasets before serious analyses can even be started. Inappropriate or confusing representations, unit of measurement choices, coding errors, missing values, outliers, etc., can be avoided by using good dataset design and by understanding how data types determine the kinds of analyses which can be performed.

This book discusses the principles and best practices of dataset creation, and covers basic data types and their related appropriate statistics and visualizations. A key focus of the book is why certain data types are chosen for representing concepts and measurements, in contrast to the typical discussions of how to analyze a specific data type once it has been selected.

What You Will Learn
· Be aware of the principles of creating and collecting data
· Know the basic data types and representations
· Select data types, anticipating analysis goals
· Understand dataset structures and practices for analyzing and sharing
· Be guided by examples and use cases (good and bad)
· Use cleaning tools and methods to create good data

Who This Book Is For
Researchers who design studies and collect data and subsequently conduct and report the results of their analyses can use the best practices in this book to produce better descriptions and interpretations of their work. In addition, data analysts who explore and explain data of other researchers will be able to create better datasets.

Tableau Prep: Up & Running

For self-service data preparation, Tableau Prep is relatively easy to use, as long as you know how to clean and organize your datasets. Carl Allchin, from The Information Lab in London, gets you up to speed on Tableau Prep through a series of practical lessons that include methods for preparing, cleaning, automating, organizing, and outputting your datasets. Based on Allchin’s popular blog, Preppin’ Data, this practical guide takes you step-by-step through Tableau Prep’s fundamentals. Self-service data preparation reduces the time it takes to complete data projects and improves the quality of your analyses. Discover how Tableau Prep helps you access your data and turn it into valuable information.

· Know what to look for when you prepare data
· Learn which Tableau Prep functions to use when working with data fields
· Analyze the shape and profile of your dataset
· Output data for analysis and learn how Tableau Prep automates your workflow
· Learn how to clean your dataset using Tableau Prep functions
· Explore ways to use Tableau Prep techniques in real-world scenarios
· Make your data available to others by managing and documenting the output

Statistical Thinking, 3rd Edition

Apply statistics in business to achieve performance improvement

Statistical Thinking: Improving Business Performance, 3rd Edition helps managers understand the role of statistics in implementing business improvements. It guides professionals who are learning statistics in order to improve performance in business and industry. It also helps graduate and undergraduate students understand the strategic value of data and statistics in arriving at real business solutions. Instruction in the book is based on principles of effective learning, established by educational and behavioral research. The authors cover both practical examples and underlying theory, both the big picture and necessary details. Readers gain a conceptual understanding and the ability to perform actionable analyses. They are introduced to data skills to improve business processes, including collecting the appropriate data, identifying existing data limitations, and analyzing data graphically. The authors also provide an in-depth look at JMP software, including its purpose, capabilities, and techniques for use.

Updates to this edition include:
· A new chapter on data, assessing data pedigree (quality), and acquisition tools
· Discussion of the relationship between statistical thinking and data science
· Explanation of the proper role and interpretation of p-values (understanding of the dangers of “p-hacking”)
· Differentiation between practical and statistical significance
· Introduction of the emerging discipline of statistical engineering
· Explanation of the proper role of subject matter theory in order to identify causal relationships
· A holistic framework for variation that includes outliers, in addition to systematic and random variation
· Revised chapters based on significant teaching experience
· Content enhancements based on student input

This book helps readers understand the role of statistics in business before they embark on learning statistical techniques.

SPSS Statistics For Dummies, 4th Edition

The fun and friendly guide to mastering IBM’s Statistical Package for the Social Sciences

Written by an author team with a combined 55 years of experience using SPSS, this updated guide takes the guesswork out of the subject and helps you get the most out of using the leader in predictive analysis. Covering the latest release and updates to SPSS 27.0, and including more than 150 pages of basic statistical theory, it helps you understand the mechanics behind the calculations, perform predictive analysis, produce informative graphs, and more. You’ll even dabble in programming as you expand SPSS functionality to suit your specific needs.

· Master the fundamental mechanics of SPSS
· Learn how to get data into and out of the program
· Graph and analyze your data more accurately and efficiently
· Program SPSS with Command Syntax

Get ready to start handling data like a pro, with step-by-step instruction and expert advice!

Learning Tableau 2020 - Fourth Edition

"Learning Tableau 2020" is a comprehensive resource designed to strengthen your understanding of Tableau. It takes you from mastering the fundamentals to achieving proficiency in advanced visualization and data handling techniques. Through this book, you will gain the ability to create impactful data visualizations and interactive dashboards, effectively leveraging the capabilities of Tableau 2020.

What this Book will help me do
· Effectively utilize Tableau 2020 features to develop data visualizations and dashboards.
· Apply advanced Tableau techniques, such as LOD and table calculations, to solve complex data analysis problems.
· Clean and structure data using Tableau Prep, enhancing data quality and reliability.
· Incorporate mapping and geospatial visualization for geographic data insights.
· Master storytelling with data by constructing engaging and interactive dashboards.

Author(s)
Joshua N. Milligan, the author of "Learning Tableau 2020," is an experienced Tableau training consultant and professional. With extensive years in the data visualization and analytics field, Joshua brings a practical perspective to the book. He excels at breaking down complex topics into accessible learning paths, making advanced Tableau concepts approachable for learners of all levels.

Who is it for?
This book is perfect for aspiring data analysts, IT professionals, and data enthusiasts who aim to understand and create compelling business intelligence reports. Beginners in Tableau will find the learning process straightforward due to its structured and incremental lessons. Advanced users can refine their skills with the wide range of complex examples covered. A basic familiarity with working with data is beneficial, though not required.

Hands-on Time Series Analysis with Python: From Basics to Bleeding Edge Techniques

Learn the concepts of time series from traditional to bleeding-edge techniques. This book uses comprehensive examples to clearly illustrate statistical approaches and methods of analyzing time series data and its utilization in the real world. All the code is available in Jupyter notebooks.

You'll begin by reviewing time series fundamentals, the structure of time series data, pre-processing, and how to craft the features through data wrangling. Next, you'll look at traditional time series techniques like ARMA, SARIMAX, VAR, and VARMA using popular frameworks such as StatsModels and pmdarima. The book also explains building classification models using sktime, and covers advanced deep learning-based techniques like ANN, CNN, RNN, LSTM, GRU and Autoencoder to solve time series problems using TensorFlow. It concludes by explaining the popular framework fbprophet for modeling time series.

After reading Hands-On Time Series Analysis with Python, you'll be able to apply these new techniques in industries such as oil and gas, robotics, manufacturing, government, banking, retail, healthcare, and more.

What You'll Learn:
· Basic to advanced concepts of time series
· How to design, develop, train, and validate time-series methodologies
· What smoothing, ARMA, ARIMA, SARIMA, SARIMAX, VAR, and VARMA techniques in time series are, and how to optimally tune parameters to yield the best results
· How to leverage bleeding-edge techniques such as ANN, CNN, RNN, LSTM, GRU, and Autoencoder to solve both univariate and multivariate problems by using two types of data preparation methods for time series
· Univariate and multivariate problem solving using fbprophet

Who This Book Is For
Data scientists, data analysts, financial analysts, and stock market researchers

Intelligent Data Analysis

This book focuses on methods and tools for intelligent data analysis, aimed at narrowing the increasing gap between data gathering and data comprehension. Emphasis is also given to solving problems that result from automated data collection, such as analysis of computer-based patient records, data warehousing tools, intelligent alarming, and effective and efficient monitoring. The book aims to describe the different approaches of Intelligent Data Analysis from a practical point of view: solving common life problems with data analysis tools.

Learn Grafana 7.0

"Learn Grafana 7.0" is the ultimate beginner's guide to leveraging Grafana's capabilities for analytics and interactive dashboards. You'll master real-time data monitoring and visualization, and learn how to query and explore metrics with a hands-on approach to Grafana 7.0's new features.

What this Book will help me do
· Learn to install and configure Grafana from scratch, preparing you for real-world data analysis tasks.
· Navigate and utilize the Graph panel in Grafana effectively, ensuring clear and actionable visual insights.
· Incorporate advanced dashboard features such as annotations, templates, and links to enhance data monitoring.
· Integrate Grafana with major cloud providers like AWS and Azure for robust monitoring solutions.
· Implement secure user authentication and fine-tuned permissions for managing teams and sharing insights safely.

Author(s)
Salituro, the author of "Learn Grafana 7.0," is a data visualization expert with years of experience in software development and analytics. Salituro focuses on creating understandable and accessible resources for developers and analysts of all skill levels, bringing a hands-on practical approach to technical learning.

Who is it for?
This book is perfect for data analysts, business intelligence developers, and administrators looking to build skills in data visualization and monitoring with Grafana 7.0. If you're eager to create interactive dashboards and learn practical applications of Grafana's features, this book is for you. Beginners to Grafana are fully accommodated, though familiarity with data visualization principles is beneficial. For those seeking to monitor cloud services like AWS with Grafana, this book is indispensable.

Data Analysis and Applications 3, 3rd Edition

Data analysis as an area of importance has grown exponentially, especially during the past couple of decades. This can be attributed to a rapidly growing computer industry and the wide applicability of computational techniques, in conjunction with new advances of analytic tools. This being the case, the need for literature that addresses this is self-evident. New publications are appearing, covering the need for information from all fields of science and engineering, thanks to the universal relevance of data analysis and statistics packages. This book is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians who have been working at the forefront of data analysis. The chapters included in this volume represent a cross-section of current concerns and research interests in these scientific areas. The material is divided into two parts: Computational Data Analysis, and Classification Data Analysis, with methods for both - providing the reader with both theoretical and applied information on data analysis methods, models and techniques and appropriate applications.

Modern Data Mining Algorithms in C++ and CUDA C: Recent Developments in Feature Extraction and Selection Algorithms for Data Science

Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or extracting useful features from measured variables.

As a serious data miner you will often be faced with thousands of candidate features for your prediction or classification application, with most of the features being of little or no value. You’ll know that many of these features may be useful only in combination with certain other features while being practically worthless alone or in combination with most others. Some features may have enormous predictive power, but only within a small, specialized area of the feature space. The problems that plague modern data miners are endless. This book helps you solve this problem by presenting modern feature selection techniques and the code to implement them. Some of these techniques are:
· Forward selection component analysis
· Local feature selection
· Linking features and a target with a hidden Markov model
· Improvements on traditional stepwise selection
· Nominal-to-ordinal conversion

All algorithms are intuitively justified and supported by the relevant equations and explanatory material. The author also presents and explains complete, highly commented source code. The example code is in C++ and CUDA C but Python or other code can be substituted; the algorithm is important, not the code that's used to write it.

What You Will Learn
· Combine principal component analysis with forward and backward stepwise selection to identify a compact subset of a large collection of variables that captures the maximum possible variation within the entire set.
· Identify features that may have predictive power over only a small subset of the feature domain. Such features can be profitably used by modern predictive models but may be missed by other feature selection methods.
· Find an underlying hidden Markov model that controls the distributions of feature variables and the target simultaneously. The memory inherent in this method is especially valuable in high-noise applications such as prediction of financial markets.
· Improve traditional stepwise selection in three ways: examine a collection of 'best-so-far' feature sets; test candidate features for inclusion with cross validation to automatically and effectively limit model complexity; and at each step estimate the probability that our results so far could be just the product of random good luck. We also estimate the probability that the improvement obtained by adding a new variable could have been just good luck.
· Take a potentially valuable nominal variable (a category or class membership) that is unsuitable for input to a prediction model, and assign to each category a sensible numeric value that can be used as a model input.

Who This Book Is For
Intermediate to advanced data science programmers and analysts.
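
To make the nominal-to-ordinal idea concrete, here is a minimal Python sketch that replaces each category with the mean of a numeric target observed for that category. This is one common encoding choice and is only assumed here for illustration; it is not necessarily the method the book uses, and the data and the function name `mean_target_encoding` are hypothetical.

```python
from collections import defaultdict


def mean_target_encoding(categories, targets):
    """Map each category to the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Hypothetical nominal variable (region) and numeric target (sales).
regions = ["north", "south", "north", "east", "south", "north"]
sales = [10.0, 4.0, 14.0, 7.0, 6.0, 12.0]

encoding = mean_target_encoding(regions, sales)
encoded_regions = [encoding[r] for r in regions]
print(encoding)
```

In practice the mapping would usually be estimated on training data only, since computing it on the rows being predicted leaks the target into the feature.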

Practical Synthetic Data Generation

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:
· Steps for generating synthetic data using multivariate normal distributions
· Methods for distribution fitting covering different goodness-of-fit metrics
· How to replicate the simple structure of original data
· An approach for modeling data structure to consider complex relationships
· Multiple approaches and metrics you can use to assess data utility
· How analysis performed on real data can be replicated with synthetic data
· Privacy implications of synthetic data and methods to assess identity disclosure
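
As a hedged sketch of the first item (generating synthetic data from a multivariate normal distribution), the Python below estimates the mean vector and covariance matrix of a small numeric dataset and draws new rows via a Cholesky factor. The data, function names, and the choice of a Cholesky-based sampler are assumptions made for illustration, not the book's procedure.

```python
import math
import random


def column_means(rows):
    """Mean of each column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = column_means(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower-triangular factor low with low * low^T = a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_synthetic(rows, n_samples, seed=0):
    """Draw rows from a normal distribution fitted to the real rows."""
    rng = random.Random(seed)
    mu = column_means(rows)
    low = cholesky(covariance_matrix(rows))
    d = len(mu)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                        for i in range(d)])
    return samples


# Hypothetical "real" data: height (cm) and weight (kg) for five people.
real_rows = [[170.0, 65.0], [180.0, 80.0], [160.0, 55.0], [175.0, 72.0], [165.0, 60.0]]
synthetic_rows = sample_synthetic(real_rows, n_samples=2000)
print(column_means(real_rows), column_means(synthetic_rows))
```

The synthetic rows reproduce only the means and pairwise covariances of the original columns; anything beyond that simple structure would need a richer model.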

Forensic Analytics, 2nd Edition

Become the forensic analytics expert in your organization using effective and efficient data analysis tests to find anomalies, biases, and potential fraud: the updated new edition

Forensic Analytics reviews the methods and techniques that forensic accountants can use to detect intentional and unintentional errors, fraud, and biases. This updated second edition shows accountants and auditors how analyzing their corporate or public sector data can highlight transactions, balances, or subsets of transactions or balances in need of attention. These tests are made up of a set of initial high-level overview tests followed by a series of more focused tests. These focused tests use a variety of quantitative methods including Benford’s Law, outlier detection, the detection of duplicates, a comparison to benchmarks, time-series methods, risk-scoring, and sometimes simply statistical logic. The tests in the new edition include the newly developed vector variation score that quantifies the change in an array of data from one period to the next. The goals of the tests are to produce either a small sample of suspicious transactions, a small set of transaction groups, or a risk score related to individual transactions or a group of items.

The new edition includes over two hundred figures. Each chapter, where applicable, includes one or more cases showing how the tests under discussion could have detected the fraud or anomalies. The new edition also includes two chapters each describing multi-million-dollar fraud schemes and the insights that can be learned from those examples. These interesting real-world examples help to make the text accessible and understandable for accounting professionals and accounting students without rigorous backgrounds in mathematics and statistics.

Emphasizing practical applications, the new edition shows how to use either Excel or Access to run these analytics tests. The book also has some coverage on using Minitab, IDEA, R, and Tableau to run forensic-focused tests. The use of SAS and Power BI rounds out the software coverage. The software screenshots use the latest versions of the software available at the time of writing.

This authoritative book:
· Describes the use of statistically-based techniques including Benford’s Law, descriptive statistics, and the vector variation score to detect errors and anomalies
· Shows how to run most of the tests in Access and Excel, and other data analysis software packages for a small sample of the tests
· Applies the tests under review in each chapter to the same purchasing card data from a government entity
· Includes interesting case studies throughout that are linked to the tests being reviewed
· Includes two comprehensive case studies where data analytics could have detected the frauds before they reached multi-million-dollar levels
· Includes a continually-updated companion website with the data sets used in the chapters, the queries used in the chapters, extra coverage of some topics or cases, end of chapter questions, and end of chapter cases

Written by a prominent educator and researcher in forensic accounting and auditing, the new edition of Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations is an essential resource for forensic accountants, auditors, comptrollers, fraud investigators, and graduate students.
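
Benford’s Law, named above, says that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d). The Python sketch below compares observed first-digit proportions with that expectation. The mean-absolute-deviation summary, the function names, and the sample data are assumptions chosen for illustration and are not taken from the book.

```python
import math
from collections import Counter


def benford_expected(d):
    """Expected proportion of leading digit d (1-9) under Benford's Law."""
    return math.log10(1 + 1 / d)


def first_digit(x):
    """Leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '4.2e-03'.
    return int(f"{abs(x):.15e}"[0])


def first_digit_proportions(amounts):
    """Observed proportion of each leading digit, ignoring zeros."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(amounts):
    """Average absolute gap between observed and Benford proportions."""
    observed = first_digit_proportions(amounts)
    return sum(abs(observed[d] - benford_expected(d)) for d in range(1, 10)) / 9


# Hypothetical inputs: powers of two track Benford closely, while the
# integers 100..999 have uniformly distributed leading digits.
benford_like = [2 ** k for k in range(1, 201)]
uniform_like = list(range(100, 1000))

print(mean_absolute_deviation(benford_like))
print(mean_absolute_deviation(uniform_like))
```

A larger deviation flags a dataset for closer review; on its own it does not show that anything is wrong.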

Innovative Tableau

Level up with Tableau to build eye-catching, easy-to-interpret data visualizations. In this follow-up guide to Practical Tableau, author Ryan Sleeper takes you through a collection of unique tips and tutorials for using this popular software. Beginning to advanced Tableau users will learn how to go beyond Show Me to make better charts and learn dozens of tricks to improve both the author and user experience. Featuring many approaches he developed himself, Ryan shows you how to create charts that empower Tableau users to explore, understand, and derive value from their data. He also shares many of his favorite tricks that enabled him to become a Tableau Zen Master, Tableau Public Visualization of the Year author, and Tableau Global Iron Viz Champion.

· Learn what’s new in Tableau since Practical Tableau was released
· Examine unique new charts (timelines, custom gauges, and leapfrog charts) plus innovations to traditional charts such as highlight tables, scatter plots, and maps
· Get tips that can help make a Tableau developer’s life easier
· Understand what developers can do to make users’ lives easier

Practical Statistics for Data Scientists, 2nd Edition

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:
· Why exploratory data analysis is a key preliminary step in data science
· How random sampling can reduce bias and yield a higher-quality dataset, even with big data
· How the principles of experimental design yield definitive answers to questions
· How to use regression to estimate outcomes and detect anomalies
· Key classification techniques for predicting which categories a record belongs to
· Statistical machine learning methods that "learn" from data
· Unsupervised learning methods for extracting meaning from unlabeled data