data-science-tasks

Mathematical Statistics and Stochastic Processes

2012-05-14 · O'Reilly Data Science Books O'Reilly Amazon

book

by Denis Bosq

data data-science statistics

Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book, however, both this case and the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners. Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and robustness, second-order processes in discrete and continuous time and diffusion processes, statistics for discrete and continuous time processes, statistical prediction, and complements in probability. This book is aimed at students taking courses on probability with an emphasis on measure theory and at all practitioners who apply and use statistics and probability on a daily basis.

Textual Information Access: Statistical Models

2012-05-14 · O'Reilly Data Science Books O'Reilly Amazon

book

by Francois Yvon , Eric Gaussier

data data-science statistics

This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications concerned, by highlighting the relationship between models and applications and by illustrating the behavior of each model on real collections. Textual Information Access is organized around four themes: information retrieval and ranking models, classification and clustering (logistic regression, kernel methods, Markov fields, etc.), multilingualism and machine translation, and emerging applications such as information exploration.

Contents

Part 1: Information Retrieval. 1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier. 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.

Part 2: Classification and Clustering. 3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin. 4. Kernel Methods for Textual Information Access, Jean-Michel Renders. 5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier. 6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.

Part 3: Multilingualism. 7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.

Part 4: Emerging Applications. 8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh. 9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Fréderic Béchet.

A Quantitative Approach to Commercial Damages: Applying Statistics to the Measurement of Lost Profits, + Website

2012-05-08 · O'Reilly Data Science Books O'Reilly Amazon

book

by James A. DiGabriele , Mark G. Filler

data data-science statistics

How-to guidance for measuring lost profits due to business interruption damages. A Quantitative Approach to Commercial Damages explains the complicated process of measuring business interruption damages, whether the losses are from natural or man-made disasters, or whether the performance of one company adversely affects the performance of another. Using a methodology built around case studies integrated with solution tools, this book is presented step by step from the damages analysis perspective to aid in preparing a damage claim. Over 250 screen shots are included, along with key cell formulas that show how to construct a formula and lay it out on the spreadsheet. The book includes Excel spreadsheet applications and key cell formulas for those who wish to construct their own spreadsheets, and it offers a step-by-step approach to computing damages using case studies. Often in the course of business, a firm will be damaged by the actions of another individual or company, such as a fire that shuts down a restaurant for two months. Often, this results in the filing of a business interruption claim. Discover how to measure business losses with the proven guidance found in A Quantitative Approach to Commercial Damages.

Cash Flow Analysis and Forecasting: The Definitive Guide to Understanding and Using Published Cash Flow Data

2012-05-08 · O'Reilly Data Science Books O'Reilly Amazon

book

by Timothy Jury

data data-science forecasting statistics time-series

This book is the definitive guide to cash flow statement analysis and forecasting. It takes the reader from an introduction about how cash flows move within a business, through to a detailed review of the contents of a cash flow statement. This is followed by detailed guidance on how to restate cash flows into a template format. The book shows how to use the template to analyse the data from start up, growth, mature and declining companies, and those using US GAAP and IAS reporting. The book includes real world examples from such companies as Black and Decker (US), Fiat (Italy) and Tesco (UK). A section on cash flow forecasting includes full coverage of spreadsheet risk and good practice. Complete with chapters of particular interest to those involved in credit markets as lenders or counter-parties, those running businesses and those in equity investing, this book is the definitive guide to understanding and interpreting cash flow data.

Bayesian Analysis of Stochastic Process Models

2012-05-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mike Wiper , Fabrizio Ruggeri , David Insua

Computer Science bayesian-statistics data data-science statistics

Bayesian analysis of complex models based on stochastic processes has in recent years become a growing area. This book provides a unified treatment of Bayesian analysis of models based on stochastic processes, covering the main classes of stochastic processes, including modeling, computation, inference, forecasting, decision making and important applied models. Key features: Explores Bayesian analysis of models based on stochastic processes, providing a unified treatment. Provides a thorough introduction for research students. Illustrates computational tools for dealing with complex problems, along with real-life case studies. Looks at inference, prediction and decision making. Researchers, graduate and advanced undergraduate students interested in stochastic processes in fields such as statistics, operations research (OR), engineering, finance, economics, computer science and Bayesian analysis will benefit from reading this book. With numerous applications included, practitioners of OR, stochastic modelling and applied statistics will also find this book useful.

Modelling Under Risk and Uncertainty: An Introduction to Statistical, Phenomenological and Computational Methods

2012-04-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Etienne de Rocquigny

data data-science statistics

Modelling has permeated virtually all areas of industrial, environmental, economic, bio-medical or civil engineering, yet the use of models for decision-making raises a number of issues to which this book is dedicated: How uncertain is my model? Is it truly valuable to support decision-making? What kind of decision can be truly supported and how can I handle residual uncertainty? How refined should the mathematical description be, given the true data limitations? Could the uncertainty be reduced through more data, increased modeling investment or computational budget? Should it be reduced now or later? How robust is the analysis or the computational methods involved? Should or could those methods be more robust? Does it make sense to handle uncertainty, risk, lack of knowledge, variability or errors altogether? How reasonable is the choice of probabilistic modeling for rare events? How rare are the events to be considered? How far does it make sense to handle extreme events and elaborate confidence figures? Can I take advantage of expert or phenomenological knowledge to tighten the probabilistic figures? Are there related domains that could provide models or inspiration for my problem? Written by a leader at the crossroads of industry, academia and engineering, and based on decades of multi-disciplinary field experience, Modelling Under Risk and Uncertainty gives a self-consistent introduction to the methods involved by any type of modeling development acknowledging the inevitable uncertainty and associated risks.
It goes beyond the "black-box" view that some analysts, modelers, risk experts or statisticians take of the underlying phenomenology of environmental or industrial processes, without sufficiently valuing their physical properties and inner modelling potential or challenging the practical plausibility of mathematical hypotheses. Conversely, it also aims to help environmental or engineering modellers better handle model confidence issues through finer statistical and risk analysis material that takes advantage of advanced scientific computing, whether to face new regulations departing from deterministic design or to support robust decision-making. Modelling Under Risk and Uncertainty: Addresses a concern of growing interest for large industries, environmentalists or analysts: robust modeling for decision-making in complex systems. Gives new insights into the peculiar mathematical and computational challenges generated by recent industrial safety or environmental control analysis for rare events. Implements decision theory choices differentiating or aggregating the dimensions of risk/aleatory and epistemic uncertainty through a consistent multi-disciplinary set of statistical estimation, physical modelling, robust computation and risk analysis. Provides an original review of the advanced inverse probabilistic approaches for model identification, calibration or data assimilation, key to digest fast-growing multi-physical data acquisition. Illustrated with one favourite pedagogical example crossing natural risk, engineering and economics, developed throughout the book to facilitate the reading and understanding. Supports Master/PhD-level courses as well as advanced tutorials for professional training. Analysts and researchers in numerical modeling, applied statistics, scientific computing, reliability, advanced engineering, natural risk or environmental science will benefit from this book.

Statistical Thinking: Improving Business Performance, Second Edition

2012-04-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ron Snee , Roger Hoerl

data data-science statistics

How statistical thinking and methodology can help you make crucial business decisions. Straightforward and insightful, Statistical Thinking: Improving Business Performance, Second Edition, prepares you for business leadership by developing your capacity to apply statistical thinking to improve business processes. Unique and compelling, this book shows you how to derive actionable conclusions from data analysis, solve real problems, and improve real processes. Here, you'll discover how to implement statistical thinking and methodology in your work to improve business performance. Explores why statistical thinking is necessary and helpful. Provides case studies that illustrate how to integrate several statistical tools into the decision-making process. Facilitates and encourages an experiential learning environment to enable you to apply material to actual problems. With an in-depth discussion of JMP® software, the new edition of this important book focuses on skills to improve business processes, including collecting data appropriate for a specified purpose, recognizing limitations in existing data, and understanding the limitations of statistical analyses.

Introduction to Linear Regression Analysis, 5th Edition

2012-04-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Elizabeth A. Peck , G. Geoffrey Vining , Douglas C. Montgomery

SAS data data-science regression-analysis statistics

Praise for the Fourth Edition: "As with previous editions, the authors have produced a leading textbook on regression." (Journal of the American Statistical Association) A comprehensive and up-to-date introduction to the fundamentals of regression analysis. Introduction to Linear Regression Analysis, Fifth Edition continues to present both the conventional and less common uses of linear regression in today's cutting-edge scientific research. The authors blend both theory and application to equip readers with an understanding of the basic principles needed to apply regression model-building techniques in various fields of study, including engineering, management, and the health sciences. Following a general introduction to regression modeling, including typical applications, a host of technical tools is outlined, such as basic inference procedures, introductory aspects of model adequacy checking, and polynomial regression models and their variations. The book then discusses how transformations and weighted least squares can be used to resolve problems of model inadequacy and also how to deal with influential observations. The Fifth Edition features numerous newly added topics, including: a chapter on regression analysis of time series data that presents the Durbin-Watson test and other techniques for detecting autocorrelation as well as parameter estimation in time series regression models; regression models with random effects in addition to a discussion on subsampling and the importance of the mixed model; tests on individual regression coefficients and subsets of coefficients; and examples of current uses of simple linear regression models and the use of multiple regression models for understanding patient satisfaction data. In addition to Minitab, SAS, and S-PLUS, the authors have incorporated JMP and the freely available R software to illustrate the discussed techniques and procedures in this new edition.
Numerous exercises have been added throughout, allowing readers to test their understanding of the material, and a related FTP site features the presented data sets, extensive problem solutions, software hints, and PowerPoint slides to facilitate instructional use of the book. Introduction to Linear Regression Analysis, Fifth Edition is an excellent book for statistics and engineering courses on regression at the upper-undergraduate and graduate levels. The book also serves as a valuable, robust resource for professionals in the fields of engineering, life and biological sciences, and the social sciences.

Stochastic Modeling and Analysis of Telecoms Networks

2012-04-09 · O'Reilly Data Science Books O'Reilly Amazon

book

by Pascal Moyal , Laurent Decreusefond

data data-science statistics time-series

This book addresses the stochastic modeling of telecommunication networks, introducing the main mathematical tools for that purpose, such as Markov processes, real and spatial point processes and stochastic recursions, and presenting a wide range of results on the stability, performance and comparison of systems. The authors propose a comprehensive mathematical construction of the foundations of stochastic network theory: Markov chains and continuous-time Markov chains are extensively studied using an original martingale-based approach. A complete presentation of stochastic recursions from an ergodic theoretical perspective is also provided, as well as spatial point processes. Using these basic tools, stability criteria, performance measures and comparison principles are obtained for a wide class of models, from the canonical M/M/1 and G/G/1 queues to more sophisticated systems, including the current "hot topics" of spatial radio networking, OFDMA and real-time networks. Contents: 1. Introduction. Part 1: Discrete-time Modeling. 2. Stochastic Recursive Sequences. 3. Markov Chains. 4. Stationary Queues. 5. The M/GI/1 Queue. Part 2: Continuous-time Modeling. 6. Poisson Process. 7. Markov Process. 8. Systems with Delay. 9. Loss Systems. Part 3: Spatial Modeling. 10. Spatial Point Processes.

Logistic Regression Using SAS, 2nd Edition

2012-03-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Paul D. Allison

SAS data data-science regression-analysis statistics

If you are a researcher or student with experience in multiple linear regression and want to learn about logistic regression, Paul Allison's Logistic Regression Using SAS: Theory and Application, Second Edition, is for you! Informal and nontechnical, this book both explains the theory behind logistic regression, and looks at all the practical details involved in its implementation using SAS. Several real-world examples are included in full detail. This book also explains the differences and similarities among the many generalizations of the logistic regression model. The following topics are covered: binary logistic regression, logit analysis of contingency tables, multinomial logit analysis, ordered logit analysis, discrete-choice analysis, and Poisson regression. Other highlights include discussions on how to use the GENMOD procedure to do loglinear analysis and GEE estimation for longitudinal binary data. Only basic knowledge of the SAS DATA step is assumed. The second edition describes many new features of PROC LOGISTIC, including conditional logistic regression, exact logistic regression, generalized logit models, ROC curves, the ODDSRATIO statement (for analyzing interactions), and the EFFECTPLOT statement (for graphing nonlinear effects). Also new is coverage of PROC SURVEYLOGISTIC (for complex samples), PROC GLIMMIX (for generalized linear mixed models), PROC QLIM (for selection models and heterogeneous logit models), and PROC MDC (for advanced discrete choice models).

This book is part of the SAS Press program.

Designing Great Data Products

2012-03-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by Margit Zwemer , Mike Loukides , Jeremy Howard

data data-science forecasting statistics time-series

In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines like Amazon's. Prediction technology can be interesting and mathematically elegant, but we need to take the next step: going from recommendations to products that can produce optimal strategies for meeting concrete business objectives. We already know how to build these products: they've been in use for the past decade or so, but they're not as common as they should be. This report shows how to take the next step: to go from simple predictions and recommendations to a new generation of data products with the potential to revolutionize entire industries.

Quantifying the User Experience

2012-03-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Jeff Sauro , James R Lewis

data data-science statistics

Quantifying the User Experience: Practical Statistics for User Research offers a practical guide for using statistics to solve quantitative problems in user research. Many designers and researchers view usability and design as qualitative activities, which do not require attention to formulas and numbers. However, usability practitioners and user researchers are increasingly expected to quantify the benefits of their efforts. The impact of good and bad designs can be quantified in terms of conversions, completion rates, completion times, perceived satisfaction, recommendations, and sales. The book discusses ways to quantify user research; summarize data and compute margins of error; determine appropriate sample sizes; standardize usability questionnaires; and settle controversies in measurement and statistics. Each chapter concludes with a list of key points and references. Most chapters also include a set of problems and answers that enable readers to test their understanding of the material. This book is a valuable resource for those engaged in measuring the behavior and attitudes of people during their interaction with interfaces. Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices. Shows practitioners which tests to use, why they work, and best practices in application, along with easy-to-use Excel formulas and web calculators for analyzing data. Recommends ways for practitioners to communicate results to stakeholders in plain English. Resources and tools are available at the authors’ site: http://www.measuringu.com/

Mathematics and Statistics for Financial Risk Management

2012-03-06 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael B. Miller

Monte Carlo data data-science statistics

Mathematics and Statistics for Financial Risk Management is a practical guide to modern financial risk management for both practitioners and academics. The recent financial crisis and its impact on the broader economy underscore the importance of financial risk management in today's world. At the same time, financial products and investment strategies are becoming increasingly complex. Today, it is more important than ever that risk managers possess a sound understanding of mathematics and statistics. In a concise and easy-to-read style, each chapter of this book introduces a different topic in mathematics or statistics. As different techniques are introduced, sample problems and application sections demonstrate how these techniques can be applied to actual risk management problems. Exercises at the end of each chapter and the accompanying solutions at the end of the book allow readers to practice the techniques they are learning and monitor their progress. A companion website includes interactive Excel spreadsheet examples and templates. This comprehensive resource covers basic statistical concepts from volatility and Bayes' Law to regression analysis and hypothesis testing. Widely used risk models, including Value-at-Risk, factor analysis, Monte Carlo simulations, and stress testing are also explored. A chapter on time series analysis introduces interest rate modeling, GARCH, and jump-diffusion models. Bond pricing, portfolio credit risk, optimal hedging, and many other financial risk topics are covered as well. If you're looking for a book that will help you understand the mathematics and statistics of financial risk management, look no further.

Webbots, Spiders, and Screen Scrapers, 2nd Edition

2012-03-05 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Schrenk

data data-science web-scraping

There's a wealth of data online, but sorting and gathering it by hand can be tedious and time consuming. Rather than click through page after endless page, why not let bots do the work for you? Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions.

gnuplot Cookbook

2012-02-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Lee Phillips

data data-science data-visualization gnuplot

Master the art of technical plotting with 'gnuplot Cookbook'. This book serves as an indispensable guide to utilizing gnuplot's full range of capabilities for creating stunning 2D and 3D plots, interactive graphs, and seamless visual integration into programming projects.

What this Book will help me do

Gain precise control over the aesthetics and presentation of your graphs. Understand how to create complex graphical illustrations from multiple data sources. Learn to integrate gnuplot effectively into your programming workflows and systems. Discover how to produce professional-grade technical documents with high-quality charts and illustrations. Master interactive graph creation for engaging web content.

Author(s)

Lee Phillips, a seasoned expert in scientific and technical visualization, has leveraged years of practical experience to provide this comprehensive guide to gnuplot. With a sharp focus on clarity and functionality, Lee brings a hands-on approach to teaching through meticulously crafted examples and detailed explanations.

Who is it for?

This book is ideal for scientists, engineers, and data analysts who are either just starting or looking to deepen their expertise with gnuplot. It's perfect for those with a foundational understanding of graph plotting, aspiring to produce high-quality visualizations and integrate them effectively into diverse projects.

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

2012-01-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Robert Nisbet , John Elder , Andrew Fast , Thomas Hill , Gary D. Miner , Dursun Delen

BI data data-science statistics

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis. Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real-world example tutorials in such varied fields as corporate finance, business intelligence, genomics research, and counterterrorism activities. The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically. Extensive case studies, most in a tutorial format, allow the reader to 'click through' the example using a software program, thus learning to conduct text mining analyses as rapidly as possible. Numerous examples, tutorials, PowerPoint slides and datasets are available via the companion website on Elsevierdirect.com. A glossary of text mining terms is provided in the appendix.

Practical Data Mining

2011-12-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by F. Hancock, Jr.

data data-science exploratory-data-analysis

Intended for those who need a practical guide to proven and up-to-date data mining techniques and processes, this book covers specific problem genres. With chapters that focus on application specifics, it allows readers to go to material relevant to their problem domain. Each section starts with a chapter-length roadmap for the given problem domain. This includes a checklist/decision-tree, which allows the reader to customize a data mining solution for their problem space. The roadmap discusses the technical components of solutions.

Statistical Learning and Data Science

2011-12-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by Myriam Touati , Leon Bottou , Fionn Murtagh , Bernard Goldfarb , Catherine Pardoux , Mireille Gettler Summa

AI/ML Data Science data data-science statistics

Driven by a vast range of applications, data analysis and learning from data are vibrant areas of research. Various methodologies, including unsupervised data analysis, supervised machine learning, and semi-supervised techniques, have continued to develop to cope with the increasing amount of data collected through modern technology. With a focus on applications, this volume presents contributions from some of the leading researchers in the different fields of data analysis. Synthesizing the methodologies into a coherent framework, the book covers a range of topics, from large-scale machine learning to synthesis objects analysis.

Statistics of Medical Imaging

2011-12-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tianhu Lei

data data-science statistics

Statistical investigation into technology not only provides a better understanding of the intrinsic features of the technology (analysis), but also leads to an improved design of the technology (synthesis). Physical principles and mathematical procedures of medical imaging technologies have been extensively studied during past decades. However, less work has been done on their statistical aspect. Filling this gap, this book provides a theoretical framework for statistical investigation into medical technologies. Rather than offer detailed descriptions of statistics of basic imaging protocols of X-ray CT and MRI, the book presents a method to conduct similar statistical investigations into more complicated imaging protocols.

Spectral Feature Selection for Data Mining

2011-12-14 · O'Reilly Data Science Books O'Reilly Amazon

book

by Huan Liu , Zheng Alan Zhao

data data-science exploratory-data-analysis

Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervise
