O'Reilly Data Science Books

Modern Data Mining Algorithms in C++ and CUDA C: Recent Developments in Feature Extraction and Selection Algorithms for Data Science

2020-06-05 O'Reilly Amazon

book

Timothy Masters

data data-science data-science-tasks web-scraping Data Science Python

Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or extracting useful features from measured variables. As a serious data miner you will often be faced with thousands of candidate features for your prediction or classification application, with most of the features being of little or no value. You’ll know that many of these features may be useful only in combination with certain other features while being practically worthless alone or in combination with most others. Some features may have enormous predictive power, but only within a small, specialized area of the feature space. The problems that plague modern data miners are endless. This book helps you solve these problems by presenting modern feature selection techniques and the code to implement them. Some of these techniques are:
• Forward selection component analysis
• Local feature selection
• Linking features and a target with a hidden Markov model
• Improvements on traditional stepwise selection
• Nominal-to-ordinal conversion
All algorithms are intuitively justified and supported by the relevant equations and explanatory material. The author also presents and explains complete, highly commented source code. The example code is in C++ and CUDA C but Python or other code can be substituted; the algorithm is important, not the code that's used to write it.
What You Will Learn
• Combine principal component analysis with forward and backward stepwise selection to identify a compact subset of a large collection of variables that captures the maximum possible variation within the entire set.
• Identify features that may have predictive power over only a small subset of the feature domain. Such features can be profitably used by modern predictive models but may be missed by other feature selection methods.
• Find an underlying hidden Markov model that controls the distributions of feature variables and the target simultaneously. The memory inherent in this method is especially valuable in high-noise applications such as prediction of financial markets.
• Improve traditional stepwise selection in three ways: examine a collection of 'best-so-far' feature sets; test candidate features for inclusion with cross validation to automatically and effectively limit model complexity; and at each step estimate the probability that our results so far could be just the product of random good luck. We also estimate the probability that the improvement obtained by adding a new variable could have been just good luck.
• Take a potentially valuable nominal variable (a category or class membership) that is unsuitable for input to a prediction model, and assign to each category a sensible numeric value that can be used as a model input.
Who This Book Is For
Intermediate to advanced data science programmers and analysts.

Practical Synthetic Data Generation

2020-05-19 O'Reilly Amazon

book

Khaled El Emam , Lucy Mosquera , Richard Hoptroff

data data-science data-science-tasks data-wrangling-preparation-cleaning data wrangling, preparation, cleaning AI/ML

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue. Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution. This book describes:
• Steps for generating synthetic data using multivariate normal distributions
• Methods for distribution fitting covering different goodness-of-fit metrics
• How to replicate the simple structure of original data
• An approach for modeling data structure to consider complex relationships
• Multiple approaches and metrics you can use to assess data utility
• How analysis performed on real data can be replicated with synthetic data
• Privacy implications of synthetic data and methods to assess identity disclosure

Forensic Analytics, 2nd Edition

2020-05-12 O'Reilly Amazon

book

Mark J. Nigrini

data data-science data-science-tasks statistics Analytics BI

Become the forensic analytics expert in your organization using effective and efficient data analysis tests to find anomalies, biases, and potential fraud, in the updated new edition.
Forensic Analytics reviews the methods and techniques that forensic accountants can use to detect intentional and unintentional errors, fraud, and biases. This updated second edition shows accountants and auditors how analyzing their corporate or public sector data can highlight transactions, balances, or subsets of transactions or balances in need of attention. These tests are made up of a set of initial high-level overview tests followed by a series of more focused tests. These focused tests use a variety of quantitative methods including Benford’s Law, outlier detection, the detection of duplicates, a comparison to benchmarks, time-series methods, risk-scoring, and sometimes simply statistical logic. The tests in the new edition include the newly developed vector variation score that quantifies the change in an array of data from one period to the next. The goals of the tests are to produce either a small sample of suspicious transactions, a small set of transaction groups, or a risk score related to individual transactions or a group of items. The new edition includes over two hundred figures. Each chapter, where applicable, includes one or more cases showing how the tests under discussion could have detected the fraud or anomalies. The new edition also includes two chapters each describing multi-million-dollar fraud schemes and the insights that can be learned from those examples. These interesting real-world examples help to make the text accessible and understandable for accounting professionals and accounting students without rigorous backgrounds in mathematics and statistics. Emphasizing practical applications, the new edition shows how to use either Excel or Access to run these analytics tests. The book also has some coverage on using Minitab, IDEA, R, and Tableau to run forensic-focused tests. The use of SAS and Power BI rounds out the software coverage. The software screenshots use the latest versions of the software available at the time of writing. This authoritative book:
• Describes the use of statistically-based techniques including Benford’s Law, descriptive statistics, and the vector variation score to detect errors and anomalies
• Shows how to run most of the tests in Access and Excel, and a small sample of the tests in other data analysis software packages
• Applies the tests under review in each chapter to the same purchasing card data from a government entity
• Includes interesting case studies throughout that are linked to the tests being reviewed
• Includes two comprehensive case studies where data analytics could have detected the frauds before they reached multi-million-dollar levels
• Includes a continually-updated companion website with the data sets used in the chapters, the queries used in the chapters, extra coverage of some topics or cases, end of chapter questions, and end of chapter cases
Written by a prominent educator and researcher in forensic accounting and auditing, the new edition of Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations is an essential resource for forensic accountants, auditors, comptrollers, fraud investigators, and graduate students.

Innovative Tableau

2020-05-12 O'Reilly Amazon

book

Ryan Sleeper

data data-science data-science-tasks data-visualization Tableau

Level up with Tableau to build eye-catching, easy-to-interpret data visualizations. In this follow-up guide to Practical Tableau, author Ryan Sleeper takes you through a collection of unique tips and tutorials for using this popular software. Beginning to advanced Tableau users will learn how to go beyond Show Me to make better charts and learn dozens of tricks to improve both the author and user experience. Featuring many approaches he developed himself, Ryan shows you how to create charts that empower Tableau users to explore, understand, and derive value from their data. He also shares many of his favorite tricks that enabled him to become a Tableau Zen Master, Tableau Public Visualization of the Year author, and Tableau Global Iron Viz Champion.
• Learn what’s new in Tableau since Practical Tableau was released
• Examine unique new charts (timelines, custom gauges, and leapfrog charts) plus innovations to traditional charts such as highlight tables, scatter plots, and maps
• Get tips that can help make a Tableau developer’s life easier
• Understand what developers can do to make users’ lives easier

Practical Statistics for Data Scientists, 2nd Edition

2020-05-11 O'Reilly Amazon

book

Andrew Bruce , Peter Bruce , Peter Gedeck

data data-science data-science-tasks statistics AI/ML Big Data

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you’ll learn:
• Why exploratory data analysis is a key preliminary step in data science
• How random sampling can reduce bias and yield a higher-quality dataset, even with big data
• How the principles of experimental design yield definitive answers to questions
• How to use regression to estimate outcomes and detect anomalies
• Key classification techniques for predicting which categories a record belongs to
• Statistical machine learning methods that "learn" from data
• Unsupervised learning methods for extracting meaning from unlabeled data

End-to-end Data Analytics for Product Development

2020-04-06 O'Reilly Amazon

book

Mattia De Dominicis , Chris Jones , Luigi Salmaso , Rosa Arboretti Giancristofaro

data data-science data-science-tasks exploratory-data-analysis Analytics Data Analytics

An interactive guide to the statistical tools used to solve problems during product and process innovation
End to End Data Analytics for Product Development is an accessible guide designed for practitioners in the industrial field. It offers an introduction to data analytics and the design of experiments (DoE) whilst covering the basic statistical concepts useful to an understanding of DoE. The text supports product innovation and development across a range of consumer goods and pharmaceutical organizations in order to improve the quality and speed of implementation through data analytics, statistical design and data prediction. The book reviews information on feasibility screening, formulation and packaging development, sensory tests, and more. The authors, noted experts in the field, explore relevant techniques for data analytics and present the guidelines for data interpretation. In addition, the book contains information on process development and product validation that can be optimized through data understanding, analysis and validation. The authors present an accessible, hands-on approach that uses MINITAB and JMP software. The book:
• Presents a guide to innovation feasibility and formulation and process development
• Contains the statistical tools used to solve challenges faced during product innovation and feasibility
• Offers information on stability studies which are common especially in chemical or pharmaceutical fields
• Includes a companion website which contains videos summarizing main concepts
Written for undergraduate students and practitioners in industry, End to End Data Analytics for Product Development offers resources for the planning, conducting, analyzing and interpreting of controlled tests in order to develop effective products and processes.

The Practitioner's Guide to Graph Data

2020-03-26 O'Reilly Amazon

book

Denise Gosnell , Matthias Broecheler

data data-science data-science-tasks graph-analytics

Graph data closes the gap between the way humans and computers view the world. While computers rely on static rows and columns of data, people navigate and reason about life through relationships. This practical guide demonstrates how graph data brings these two approaches together. By working with concepts from graph theory, database schema, distributed systems, and data analysis, you’ll arrive at a unique intersection known as graph thinking. Authors Denise Koessler Gosnell and Matthias Broecheler show data engineers, data scientists, and data analysts how to solve complex problems with graph databases. You’ll explore templates for building with graph technology, along with examples that demonstrate how teams think about graph data within an application.
• Build an example application architecture with relational and graph technologies
• Use graph technology to build a Customer 360 application, the most popular graph data pattern today
• Dive into hierarchical data and troubleshoot a new paradigm that comes from working with graph data
• Find paths in graph data and learn why your trust in different paths motivates and informs your preferences
• Use collaborative filtering to design a Netflix-inspired recommendation system

Statistical Rethinking, 2nd Edition

2020-03-13 O'Reilly Amazon

book

Richard McElreath

data data-science data-science-tasks statistics

Statistical Rethinking: A Bayesian Course with Examples in R and Stan, Second Edition builds your knowledge of and confidence in statistical modeling. Its unique, computational approach pushes readers to perform step-by-step the calculations that are usually automated.

Practical Highcharts with Angular: Your Essential Guide to Creating Real-time Dashboards

2020-02-28 O'Reilly Amazon

book

Sourabh Mishra

data data-science data-science-tasks data-visualization dashboards Dashboard

Learn to create stunning animated and interactive charts using Highcharts and Angular. Use and build on your existing knowledge of HTML, CSS, and JavaScript to develop impressive dashboards that will work in all modern browsers. You will learn how to use Highcharts, call backend services for data, and easily construct real-time data dashboards. You'll also learn how you can combine your code with jQuery and Angular. This book provides the best solutions for real-time challenges and covers a wide range of charts including line, area, map, plot, different types of pie chart, gauge, heat map, histogram, stacked bar, scatter plot, and 3D charts. After reading this book, you'll be able to export your charts in different formats for project-based learning. Highcharts is one of the most useful products worldwide for developing charts on the web, and Angular is well known for speed. Using Highcharts with Angular, developers can build fast, interactive dashboards. Get up to speed using this book today.
What You’ll Learn
• How to develop interactive, animated dashboards
• How you can implement Highcharts using Angular
• How to develop a real-time application with the use of WebAPI, Angular, and Highcharts
• How to create interactive styling themes and colors for a dashboard
Who This Book Is For
This book is aimed at developers, dev leads, software architects, students or enthusiasts who are already familiar with HTML, CSS, and JavaScript.

Hands On With Google Data Studio

2020-02-05 O'Reilly Amazon

book

Lee Hurst

data data-science data-science-tasks data-visualization API DataViz

Learn how to easily transform your data into engaging, interactive visual reports! Data is no longer the sole domain of tech professionals and scientists. Whether in our personal, business, or community lives, data is rapidly increasing in both importance and sheer volume. The ability to visualize all kinds of data is now within reach for anyone with a computer and an internet connection. Google Data Studio, quickly becoming the most popular free tool in data visualization, offers users a flexible, powerful way to transform private and public data into interactive knowledge that can be easily shared and understood. Hands On With Google Data Studio teaches you how to visualize your data today and produce professional quality results quickly and easily. No previous experience is required to get started right away—all you need is this guide, a Gmail account, and a little curiosity to access and visualize data just like large businesses and organizations. Clear, step-by-step instructions help you identify business trends, turn budget data into a report, assess how your websites or business listings are performing, analyze public data, and much more. Practical examples and expert tips are found throughout the text to help you fully understand and apply your new knowledge to a wide array of real-world scenarios. 
This engaging, reader-friendly guide will enable you to:
• Use Google Data Studio to access various types of data, from your own personal data to public sources
• Build your first data set, navigate the Data Studio interface, customize reports, and share your work
• Learn the fundamentals of data visualization, personal data accessibility, and open data APIs
• Harness the power of publicly accessible data services including Google’s recently released Data Set Search
• Add banners, logos, custom graphics, and color palettes
Hands On With Google Data Studio: A Data Citizen's Survival Guide is a must-have resource for anyone starting their data visualization journey, from individuals, consultants, and small business owners to large business and organization managers and leaders.

Principles of Managerial Statistics and Data Science

2020-02-05 O'Reilly Amazon

book

Roberto Rivera

data data-science data-science-tasks statistics Analytics Big Data

Introduces readers to the principles of managerial statistics and data science, with an emphasis on statistical literacy of business students
Through a statistical perspective, this book introduces readers to the topic of data science, including Big Data, data analytics, and data wrangling. Chapters include multiple examples showing the application of the theoretical aspects presented. It features practice problems designed to ensure that readers understand the concepts and can apply them using real data. Over 100 open data sets used for examples and problems come from regions throughout the world, allowing the instructor to adapt the application to local data with which students can identify. Applications with these data sets include:
• Assessing if searches during a police stop in San Diego are dependent on driver’s race
• Visualizing the association between fat percentage and moisture percentage in Canadian cheese
• Modeling taxi fares in Chicago using data from millions of rides
• Analyzing mean sales per unit of legal marijuana products in Washington state
Topics covered in Principles of Managerial Statistics and Data Science include: data visualization; descriptive measures; probability; probability distributions; mathematical expectation; confidence intervals; and hypothesis testing. Analysis of variance; simple linear regression; and multiple linear regression are also included. In addition, the book offers contingency tables, Chi-square tests, non-parametric methods, and time series methods. The textbook:
• Includes academic material usually covered in introductory Statistics courses, but with a data science twist, and less emphasis on theory
• Relies on Minitab to present how to perform tasks with a computer
• Presents and motivates use of data that comes from open portals
• Focuses on developing an intuition on how the procedures work
• Exposes readers to the potential in Big Data and current failures of its use
• Supplementary material includes: a companion website that houses PowerPoint slides; an Instructor's Manual with tips, a syllabus model, and project ideas; R code to reproduce examples and case studies; and information about the open portal data
• Features an appendix with solutions to some practice problems
Principles of Managerial Statistics and Data Science is a textbook for undergraduate and graduate students taking managerial Statistics courses, and a reference book for working business professionals.

Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP, 2nd Edition

2020-02-05 O'Reilly Amazon

book

Bhisham C. Gupta , Kalanka P. Jayalath , Irwin Guttman

data data-science data-science-tasks statistics AI/ML Big Data

Introduces basic concepts in probability and statistics to data science students, as well as engineers and scientists Aimed at undergraduate/graduate-level engineering and natural science students, this timely, fully updated edition of a popular book on statistics and probability shows how real-world problems can be solved using statistical concepts. It removes Excel exhibits and replaces them with R software throughout, and updates both MINITAB and JMP software instructions and content. A new chapter discussing data mining—including big data, classification, machine learning, and visualization—is featured. Another new chapter covers cluster analysis methodologies in hierarchical, nonhierarchical, and model based clustering. The book also offers a chapter on Response Surfaces that previously appeared on the book’s companion website. Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP, Second Edition is broken into two parts. Part I covers topics such as: describing data graphically and numerically, elements of probability, discrete and continuous random variables and their probability distributions, distribution functions of random variables, sampling distributions, estimation of population parameters and hypothesis testing. Part II covers: elements of reliability theory, data mining, cluster analysis, analysis of categorical data, nonparametric tests, simple and multiple linear regression analysis, analysis of variance, factorial designs, response surfaces, and statistical quality control (SQC) including phase I and phase II control charts. The appendices contain statistical tables and charts and answers to selected problems. 
• Features two new chapters, one on Data Mining and another on Cluster Analysis
• Now contains R exhibits including code, graphical display, and some results
• MINITAB and JMP have been updated to their latest versions
• Emphasizes the p-value approach and includes related practical interpretations
• Offers a more applied statistical focus, and features modified examples to better exhibit statistical concepts
• Supplemented with an instructor-only solutions manual on the book’s companion website
Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP is an excellent text for graduate level data science students, and engineers and scientists. It is also an ideal introduction to applied statistics and probability for undergraduate students in engineering and the natural sciences.

Neural Networks Modeling and Control

2020-01-15 O'Reilly Amazon

book

Carlos Lopez-Franco , Alma Y Alanis , Edgar N. Sanchez , Jorge D. Rios , Nancy Arana-Daniel

data data-science data-science-tasks statistics time-series

Neural Networks Modeling and Control: Applications for Unknown Nonlinear Delayed Systems in Discrete Time focuses on modeling and control of discrete-time unknown nonlinear delayed systems under uncertainties based on Artificial Neural Networks. First, a Recurrent High Order Neural Network (RHONN) is used to identify discrete-time unknown nonlinear delayed systems under uncertainties, then a RHONN is used to design neural observers for the same class of systems. Both neural models are then used to synthesize controllers for trajectory tracking based on two methodologies: sliding mode control and Inverse Optimal Neural Control. As well as considering the different neural control models and complications that are associated with them, this book also analyzes potential applications, prototypes and future trends.
• Provides in-depth analysis of neural control models and methodologies
• Presents a comprehensive review of common problems in real-life neural network systems
• Includes an analysis of potential applications, prototypes and future trends

Tableau Desktop Certified Associate: Exam Guide

2019-12-24 O'Reilly Amazon

book

Fabian Peri , Dmitry Anoshin , JC Gillet , Radhika Biyani , Gleb Makarenko

data data-science data-science-tasks data-visualization Tableau Analytics

Tableau Desktop Certified Associate: Exam Guide is your companion for mastering Tableau and preparing for the certification exam with confidence. Through this book, you will gain a comprehensive understanding of Tableau Desktop's features and learn to implement them in practical scenarios to solve analytics challenges.
What this book will help me do
• Understand and apply Tableau best practices for analyzing and visualizing data effectively.
• Visualize geographic data using vector maps and gain insights into spatial distributions.
• Leverage advanced analytics techniques such as forecasting to predict key metrics.
• Build effective dashboards that convey information clearly and efficiently.
• Gain confidence in tackling Tableau Desktop Certified Associate exam questions with expert tips and mock exams.
Author(s)
The authors, Fabian Peri, Dmitry Anoshin, JC Gillet, Radhika Biyani, and Gleb Makarenko, are experienced professionals in data analytics and business intelligence. With significant expertise in teaching and applying Tableau, they bring a wealth of knowledge to this guide, offering clear instructions and practical insights. Their dedication to empowering learners fosters a supportive and assured journey through this book.
Who is it for?
This book is ideal for business analysts, BI professionals, and data analysts aiming to become certified Tableau Desktop Associates. If you have a foundational understanding of Tableau Desktop and are looking to deepen your expertise while preparing for certification, this book is tailored to help you achieve that goal.

Effective Data Storytelling

2019-12-17 O'Reilly Amazon

book

Brent Dykes

data data-science data-science-tasks data-visualization Tableau DataViz

Master the art and science of data storytelling, with frameworks and techniques to help you craft compelling stories with data. The ability to effectively communicate with data is no longer a luxury in today’s economy; it is a necessity. Transforming data into visual communication is only one part of the picture. It is equally important to engage your audience with a narrative, to tell a story with the numbers. Effective Data Storytelling will teach you the essential skills necessary to communicate your insights through persuasive and memorable data stories. Narratives are more powerful than raw statistics, more enduring than pretty charts. When done correctly, data stories can influence decisions and drive change. Most other books focus only on data visualization while neglecting the powerful narrative and psychological aspects of telling stories with data. Author Brent Dykes shows you how to take the three central elements of data storytelling (data, narrative, and visuals) and combine them for maximum effectiveness. Taking a comprehensive look at all the elements of data storytelling, this unique book will enable you to:
• Transform your insights and data visualizations into appealing, impactful data stories
• Learn the fundamental elements of a data story and key audience drivers
• Understand the differences between how the brain processes facts and narrative
• Structure your findings as a data narrative, using a four-step storyboarding process
• Incorporate the seven essential principles of better visual storytelling into your work
• Avoid common data storytelling mistakes by learning from historical and modern examples
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals is a must-have resource for anyone who communicates regularly with data, including business professionals, analysts, marketers, salespeople, financial managers, and educators.

Prepare Your Data for Tableau: A Practical Guide to the Tableau Data Prep Tool

2019-12-16 O'Reilly Amazon

book

Lori Blackshear , Tim Costello

data data-science data-science-tasks data-visualization Tableau Analytics

Focus on the most important and most often overlooked factor in a successful Tableau project: data. Without a reliable data source, you will not achieve the results you hope for in Tableau. This book does more than teach the mechanics of data preparation. It teaches you how to look at data in a new way, how to recognize the most common issues that hinder analytics, and how to mitigate those factors one by one. Tableau can change the course of business, but the old adage of "garbage in, garbage out" is the hard truth that hides behind every Tableau sales pitch. That amazing sales demo does not work as well with bad data. The unfortunate reality is that almost all data starts out in a less-than-perfect state. Data prep is hard. Traditionally, we were forced into the world of the database where complex ETL (Extract, Transform, Load) operations created by the data team did all the heavy lifting for us. Fortunately, we have moved past those days. With the introduction of the Tableau Data Prep tool you can now handle most of the common Data Prep and cleanup tasks on your own, at your desk, and without the help of the data team. This essential book will guide you through:
• The layout and important parts of the Tableau Data Prep tool
• Connecting to data
• Data quality and consistency
• The shape of the data. Is the data oriented in columns or rows? How to decide? Why does it matter?
• What is the level of detail in the source data? Why is that important?
• Combining source data to bring in more fields and rows
• Saving the data flow and the results of our data prep work
• Common cleanup and setup tasks in Tableau Desktop
What You Will Learn
• Recognize data sources that are good candidates for analytics in Tableau
• Connect to local, server, and cloud-based data sources
• Profile data to better understand its content and structure
• Rename fields, adjust data types, group data points, and aggregate numeric data
• Pivot data
• Join data from local, server, and cloud-based sources for unified analytics
• Review the steps and results of each phase of the Data Prep process
• Output new data sources that can be reviewed in Tableau or any other analytics tool
Who This Book Is For
Tableau Desktop users who want to: connect to data, profile the data to identify common issues, clean up those issues, join to additional data sources, and save the newly cleaned, joined data so that it can be used more effectively in Tableau

Introduction to Stochastic Processes and Simulation

2019-12-12 O'Reilly Amazon

book

Gerard-Michel Cochard

data data-science data-science-tasks statistics

Mastering chance has, for a long time, been a preoccupation of mathematical research. Today, we possess a predictive approach to the evolution of systems based on the theory of probabilities. Even so, uncovering this subject is sometimes complex, because it necessitates a good knowledge of the underlying mathematics. This book offers an introduction to the processes linked to the fluctuations in chance and the use of numerical methods to approach solutions that are difficult to obtain through an analytical approach. It takes classic examples of inventory and queueing management, and addresses more diverse subjects such as equipment reliability, genetics, population dynamics, physics and even market finance. It is addressed to those at Master's level, at university, engineering school or management school, but also to an audience of those in continuing education, in order that they may discover the vast field of decision support.

Mining Social Media

2019-12-10 O'Reilly Amazon

book

Lam Thuy Vo

data data-science data-science-tasks web-scraping API Google Sheets

Did fake Twitter accounts help sway a presidential election? What can Facebook and Reddit archives tell us about human behavior? In Mining Social Media, senior BuzzFeed reporter Lam Thuy Vo shows you how to use Python and key data analysis tools to find the stories buried in social media. Whether you’re a professional journalist, an academic researcher, or a citizen investigator, you’ll learn how to use technical tools to collect and analyze data from social media sources to build compelling, data-driven stories.
Learn how to:
• Write Python scripts and use APIs to gather data from the social web
• Download data archives and dig through them for insights
• Inspect HTML downloaded from websites for useful content
• Format, aggregate, sort, and filter your collected data using Google Sheets
• Create data visualizations to illustrate your discoveries
• Perform advanced data analysis using Python, Jupyter Notebooks, and the pandas library
• Apply what you’ve learned to research topics on your own
Social media is filled with thousands of hidden stories just waiting to be told. Learn to use the data-sleuthing tools that professionals use to write your own data-driven stories.

Avoiding Data Pitfalls

2019-11-19 O'Reilly Amazon

book

Ben Jones

data data-science data-science-tasks data-visualization DataViz

Avoid data blunders and create truly useful visualizations. Avoiding Data Pitfalls is a reputation-saving handbook for those who work with data, designed to help you avoid the all-too-common blunders that occur in data analysis, visualization, and presentation. Plenty of data tools exist, along with plenty of books that tell you how to use them, but unless you truly understand how to work with data, each of these tools can ultimately mislead and cause costly mistakes. This book walks you step by step through the full data visualization process, from calculation and analysis through accurate, useful presentation. Common blunders are explored in depth to show you how they arise, how they have become so common, and how you can avoid them from the outset. Then and only then can you take advantage of the wealth of tools that are out there: in the hands of someone who knows what they're doing, the right tools can cut down on the time, labor, and myriad decisions that go into each and every data presentation. Workers in almost every industry are now commonly expected to effectively analyze and present data, even with little or no formal training. There are many pitfalls (some might say chasms) in the process, and no one wants to be the source of a data error that costs money or even lives. This book provides a full walk-through of the process to help you ensure a truly useful result.
• Delve into the "data-reality gap" that grows with our dependence on data
• Learn how the right tools can streamline the visualization process
• Avoid common mistakes in data analysis, visualization, and presentation
• Create and present clear, accurate, effective data visualizations
To err is human, but in today's data-driven world, the stakes can be high and the mistakes costly. Don't rely on "catching" mistakes; avoid them from the outset with the expert instruction in Avoiding Data Pitfalls.

Advanced Statistics with Applications in R

2019-11-12 O'Reilly Amazon

book

Eugene Demidenko

data data-science data-science-tasks statistics Data Science

Advanced Statistics with Applications in R fills the gap between several excellent theoretical statistics textbooks and many applied statistics books where teaching reduces to using existing packages. This book looks at what is under the hood. Many statistics issues, including the recent crisis with the p-value, are caused by misunderstanding of statistical concepts due to the poor theoretical background of practitioners and applied statisticians. This book is the product of forty years of experience in teaching probability and statistics and their applications for solving real-life problems. There are more than 442 examples in the book: basically every probability or statistics concept is illustrated with an example accompanied by R code. Many examples, such as Who said π? What team is better? The fall of the Roman empire, James Bond chase problem, Black Friday shopping, Free fall equation: Aristotle or Galilei, and many others are intriguing. These examples cover biostatistics, finance, physics and engineering, text and image analysis, epidemiology, spatial statistics, sociology, etc. Advanced Statistics with Applications in R teaches students to use theory for solving real-life problems through computations: there are about 500 R codes and 100 datasets. These data can be freely downloaded from the author's website dartmouth.edu/~eugened. This book is suitable as a text for senior undergraduate students majoring in statistics or data science, or for graduate students. Many researchers who apply statistics on a regular basis will find explanations of many fundamental concepts from the theoretical perspective, illustrated by concrete real-world applications.

Data Mining for Business Analytics

2019-11-05 O'Reilly Amazon

book

Nitin R. Patel , Galit Shmueli , Peter C. Bruce , Peter Gedeck

data data-science data-science-tasks exploratory-data-analysis AI/ML Analytics

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python presents an applied approach to data mining concepts and methods, using Python software for illustration. Readers will learn how to implement a variety of popular data mining algorithms in Python (free and open-source software) to tackle business problems and opportunities. This is the sixth version of this successful text, and the first using Python. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining and network analysis.
It also includes:
• A new co-author, Peter Gedeck, who brings both experience teaching business analytics courses using Python, and expertise in the application of machine learning methods to the drug-discovery process
• A new section on ethical issues in data mining
• Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma and executive courses, and from their students
• More than a dozen case studies demonstrating applications for the data mining techniques described
• End-of-chapter exercises that help readers gauge and expand their comprehension and competency of the material presented
• A companion website with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions
Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics. This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.
“This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject.” —Gareth M. James, University of Southern California and co-author (with Witten, Hastie and Tibshirani) of the best-selling book An Introduction to Statistical Learning, with Applications in R

Clustering Methodology for Symbolic Data

2019-11-04 O'Reilly Amazon

book

Edwin Diday , Lynne Billard

data data-science data-science-tasks exploratory-data-analysis Data Management Data Science

Covers everything readers need to know about clustering methodology for symbolic data, including new methods and headings, while providing a focus on multi-valued list data, interval data and histogram data. This book presents all of the latest developments in the field of clustering methodology for symbolic data, paying special attention to the classification methodology for multi-valued list, interval-valued and histogram-valued data, along with numerous worked examples. The book also offers an expansive discussion of data management techniques, showing how to break a large, complex dataset into more manageable datasets ready for analysis. Filled with examples, tables, figures, and case studies, Clustering Methodology for Symbolic Data begins by offering chapters on data management, distance measures, general clustering techniques, partitioning, divisive clustering, and agglomerative and pyramid clustering.
• Provides new classification methodologies for histogram-valued data reaching across many fields in data science
• Demonstrates how to manage a large complex dataset into manageable datasets ready for analysis
• Features very large contemporary datasets such as multi-valued list data, interval-valued data, and histogram-valued data
• Considers classification models by dynamical clustering
• Features a supporting website hosting relevant data sets
Clustering Methodology for Symbolic Data will appeal to practitioners of symbolic data analysis, such as statisticians and economists within the public sector. It will also be of interest to postgraduate students of, and researchers within, web mining, text mining and bioengineering.

Spatial Analysis Using Big Data

2019-11-03 O'Reilly Amazon

book

Hajime Seya , Yoshiki Yamagata

data data-science data-science-tasks exploratory-data-analysis Big Data MATLAB

Spatial Analysis Using Big Data: Methods and Urban Applications helps readers understand the most powerful, state-of-the-art spatial econometric methods, focusing particularly on urban research problems. The methods represent a cluster of potentially transformational socio-economic modeling tools that allow researchers to capture real-time and high-resolution information to potentially reveal new socioeconomic dynamics within urban populations. Each method, written by leading exponents of the discipline, uses real-time urban big data to solve research problems in spatial science. Urban applications of these methods are provided in unsurpassed depth, with chapters on surface temperature mapping, view value analysis, community clustering and spatial-social networks, among many others.
• Reviews some of the most powerful and challenging modern methods to study big data problems in spatial science
• Provides computer codes written in R, MATLAB and Python to help implement methods
• Applies these methods to common problems observed in urban and regional economics

Pro D3.js: Use D3.js to Create Maintainable, Modular, and Testable Charts

2019-10-31 O'Reilly Amazon

book

Marcos Iglesias

data data-science data-science-tasks data-visualization d3 API

Go beyond the basics of D3.js to create maintainable, modular, and testable charts and to package them into a library that can be distributed as open source software or kept for private use. This book will show you how to transform regular D3.js chart code into reusable and extendable modules. You know the basics of working with D3.js, but it's time to become a professional D3.js practitioner. This book is your launching pad to refactoring code, composing complex visualizations from small components, working as a team with other developers, and integrating charts with a Continuous Integration system. You'll begin by creating a production-ready chart using D3.js v5, ES2015, and a test-driven approach and then move on to using and extending Britecharts, the reusable charting library based on Reusable API patterns. Finally, you'll see how to use D3.js along with React to document and build your charts to compose a charting library you can release into the NPM repository. With Pro D3.js, you'll become an accomplished D3.js developer in no time.
What You Will Learn
• Create v5 D3.js charts with ES2016 and unit tests
• Develop modular, testable and extensible code with the Reusable API pattern
• Work with and extend Britecharts, a reusable charting library created at Eventbrite
• Use Webpack and npm to create and publish a charting library from your own chart collections
• Write reference documentation and build a documentation homepage for your library
Who This Book Is For
Data scientists, data visualization engineers, and frontend developers with a fundamental knowledge of D3.js and some experience with JavaScript, as well as data journalists and consultants.

Business Statistics with Solutions in R

2019-10-21 O'Reilly Amazon

book

Mustapha Abiodun Akinkunmi

data data-science data-science-tasks statistics

Business Statistics with Solutions in R covers a wide range of applications of statistics in solving business-related problems. It will introduce readers to quantitative tools that are necessary for daily business needs and help them to make evidence-based decisions. The book provides insight into how to summarize data, analyze it, and draw meaningful inferences that can be used to improve decisions. It will enable readers to develop computational skills and problem-solving competence using the open source language, R. Mustapha Abiodun Akinkunmi uses real-life business data for illustrative examples while discussing the basic statistical measures, probability, regression analysis, significance testing, correlation, the Poisson distribution, process control for manufacturing, time series analysis, forecasting techniques, exponential smoothing, univariate and multivariate analysis including ANOVA and MANOVA, and more. It is a valuable reference for policy makers, professionals, academics and individuals interested in the areas of business statistics, applied statistics, statistical computing, finance, management and econometrics.
