Offering a pathway to vibrant organizations, this book integrates systems thinking, critical thinking, and design thinking, and provides the tools needed to proactively apply them in the social systems where we live and work.
Collection of O'Reilly books on Data Science.
Sessions & talks
Business Statistics, 3rd Edition
Business Statistics: An Applied Orientation provides readers with a conceptual framework of business, develops skills in applying concepts to decision situations, and helps them understand the nitty-gritty of business statistics. This book will also be useful to professionals who would like to acquire basic knowledge of business statistics to help them analyze and interpret data.
Business Statistics offers readers a foundation in core statistical concepts using a perfect blend of theory and practical application. This book presents business statistics as value added tools in the process of converting data into useful information.
A valuable resource for researchers and postgraduate students in statistics, biostatistics, and other fields. It could be used as a textbook for a course on model-based clustering methods, and as a supplementary text for courses on data mining, semiparametric modeling, and high-dimensional data analysis.
This fully updated second edition is an essential introduction to inferential statistics. It is the first introductory statistics text to use an estimation approach with meta-analysis from the start and also to explain the new and exciting Open Science practices, which encourage replication and enhance the trustworthiness of research.
Microsoft 365 Excel introduces enhanced features that transform how business dashboards are built and maintained. This book guides you through creating dynamic, interactive dashboards that leverage these modern capabilities. From understanding the essential principles of effective dashboard design to mastering the latest tools like Power Query and dynamic array functions, you'll make the most of Excel's full potential.
What this Book will help me do
- Understand the purpose and advantages of effective dashboards in business analytics.
- Use advanced Excel functions and tools such as Power Query and dynamic arrays to handle complex data workflows.
- Design visually engaging dashboards using charts and data visualizations that communicate key insights.
- Optimize dashboards for automation and real-time data updates, saving time and effort.
- Apply best practices and techniques for creating professional-grade Excel dashboards.
Author(s)
Michael Olafusi is a skilled data analyst and expert in Microsoft Excel, with years of experience leveraging Excel for business intelligence and analytics solutions. He enjoys teaching Excel users how to elevate their skills to create functional and visually impactful tools. Michael's approach combines clarity and practical advice, helping readers build proficiency and confidence.
Who is it for?
This book is perfect for Excel users who want to create professional dashboards for business decision support. It's especially useful for data analysts, financial analysts, business analysts, and those in similar roles. It requires a basic familiarity with Excel's interface and is ideal for those seeking to enhance their data presentation skills and automate repetitive reporting tasks.
Kibana 8.x - A Quick Start Guide to Data Analysis is an essential resource for anyone wanting to harness the robust capabilities of Kibana to analyze, visualize, and make sense of their data. Through clear explanations and practical exercises, this guide breaks down topics like creating dashboards, exploring datasets, and configuring Kibana's powerful features.
What this Book will help me do
- Understand Kibana's interface and functionalities to manage Elasticsearch data.
- Learn how to create intuitive visualizations and customize dashboards.
- Explore features such as data discovery and real-time updates for analytics.
- Optimize and query datasets using ES|QL and detailed analytics techniques.
- Master the process of embedding dashboards and exporting insights.
Author(s)
Shah is an experienced data analytics professional with a deep understanding of the Elastic Stack, including Kibana and Elasticsearch. Having spent years working on big data projects, Shah is dedicated to helping technologists turn data into actionable insights. Her writing aims to simplify complex concepts into achievable learning milestones.
Who is it for?
This book is ideal for data analysts, data engineers, and anyone working extensively with Elasticsearch datasets. If you aim to gain hands-on experience with building interactive dashboards and visualizing data trends, this book is tailored for you. A foundational understanding of Elasticsearch would be beneficial but is not strictly required. It is well suited to advancing decision-making with data insights.
If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as a comprehensive guide to scraping almost every type of data from the modern web. Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you're likely to encounter.
- Parse complicated HTML pages
- Develop crawlers with the Scrapy framework
- Learn methods to store the data you scrape
- Read and extract data from documents
- Clean and normalize badly formatted data
- Read and write natural languages
- Crawl through forms and logins
- Scrape JavaScript and crawl through APIs
- Use and write image-to-text software
- Avoid scraping traps and bot blockers
- Use scrapers to test your website
Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs. Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value. With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI.
This book covers:
- Challenges in deduplicating and joining datasets
- Extracting, cleansing, and preparing datasets for matching
- Text matching algorithms to identify equivalent entities
- Techniques for deduplicating and joining datasets at scale
- Matching datasets containing persons and organizations
- Evaluating data matches
- Optimizing and tuning data matching algorithms
- Entity resolution using cloud APIs
- Matching using privacy-enhancing technologies
Learn statistics by analyzing professional basketball data! In this action-packed book, you’ll build your skills in exploratory data analysis by digging into the fascinating world of NBA games and player stats using the R language. Statistics Slam Dunk is an engaging how-to guide for statistical analysis with R. Each chapter contains an end-to-end data science or statistics project delving into NBA data and revealing real-world sporting insights. Written by a former basketball player turned business intelligence and analytics leader, the book gives you practical experience tidying, wrangling, exploring, testing, modeling, and otherwise analyzing data with the best and latest R packages and functions.
In Statistics Slam Dunk you’ll develop a toolbox of R programming skills including:
- Reading and writing data
- Installing and loading packages
- Transforming, tidying, and wrangling data
- Applying best-in-class exploratory data analysis techniques
- Creating compelling visualizations
- Developing supervised and unsupervised machine learning algorithms
- Executing hypothesis tests, including t-tests and chi-square tests for independence
- Computing expected values, Gini coefficients, z-scores, and other measures
If you’re looking to switch to R from another language, or trade base R for tidyverse functions, this book is the perfect training coach. Much more than a beginner’s guide, it teaches statistics and data science methods that have tons of use cases. And just like in the real world, you’ll get no clean pre-packaged data sets in Statistics Slam Dunk. You’ll take on the challenge of wrangling messy data to drill on the skills that will make you the star player on any data team.
About the Technology
Statistics Slam Dunk is a data science manual with a difference. Each chapter is a complete, self-contained statistics or data science project for you to work through, from importing data, to wrangling it, testing it, visualizing it, and modeling it. Throughout the book, you’ll work exclusively with NBA data sets and the R language, applying best-in-class statistics techniques to reveal fun and fascinating truths about the NBA.
About the Book
Is losing basketball games on purpose a rational strategy? Which hustle statistics have an impact on wins and losses? Does spending more on player salaries translate into a winning record? You’ll answer all these questions and more. Plus, R’s visualization capabilities shine through in the book’s 300 plots and charts, including Pareto charts, Sankey diagrams, Cleveland dot plots, and dendrograms.
What's Inside
- Transforming, tidying, and wrangling data
- Applying best-in-class exploratory data analysis techniques
- Developing supervised and unsupervised machine learning algorithms
- Executing hypothesis tests and effect size tests
About the Reader
For readers who know basic statistics. No advanced knowledge of R (or basketball) required.
About the Author
Gary Sutton is a former basketball player who has built and led high-performing business intelligence and analytics organizations across multiple verticals.
Quotes
In this journey of exploration, every computer scientist will find a valuable ally in understanding the language of data. - Kim Lokøy, areo
Transcends other R titles by revealing the hidden narratives that lie within the numbers. - Christian Sutton, Shell International Exploration and Production
Seamlessly blending theory and practical insights, this book serves as an indispensable guide for those venturing into the field of data analytics. - Juan Delgado, Sodexo BRS
How to Use SPSS® is designed with the novice computer user in mind and for people who have no previous experience using SPSS. Each chapter is divided into short sections that describe the statistic being used, important underlying assumptions, and how to interpret the results and express them in a research report.
Make some headway in the notoriously tough subject of business statistics
Business Statistics For Dummies helps you understand the core concepts and principles of business statistics, and how they relate to the business world. This book tracks to a typical introductory course offered at the undergraduate level, so you know you’ll find all the content you need to pass your class and get your degree. You’ll get an introduction to statistical problems and processes common to the world of global business and economics. Written in clear and simple language, Business Statistics For Dummies gives you an introduction to probability, sampling techniques and distributions, and drawing conclusions from data. You’ll also discover how to use charts and graphs to visualize the most important properties of a data set.
- Grasp the core concepts, principles, and methods of business statistics
- Learn tricky concepts with simplified explanations and illustrative graphs
- See how statistics applies in the real world, thanks to concrete examples
- Read charts and graphs for a better understanding of how businesses operate
Business Statistics For Dummies is a lifesaver for students studying business at the college level. This guide is also useful for business professionals looking for a desk reference on this complicated topic.
Many books have been written about software testing, but most of them discuss the general framework of testing from a traditional perspective. Unfortunately, traditional test design techniques are often ineffective and unreliable for revealing the various kinds of faults that may occur. This book introduces three new software testing techniques: Two-Phase Model-Based Testing, Action-State Testing, and General Predicate Testing, all of which work best when applied with efficient fault-revealing capabilities.
You’ll start with a short recap of software testing, focusing on why risk analysis is obligatory, how to classify bugs practically, and how fault-based testing can be used for improving test design. You’ll then see how action-state testing merges the benefits of state transition testing and use case testing into a unified approach. Moving on, you’ll look at general predicate testing and how it serves as an extension of boundary value analysis, encompassing more complex predicates. Two-phase model-based testing represents an advanced approach where the model does not necessarily need to be machine-readable; human readability suffices. The first phase involves a high-level model from which abstract tests are generated. Upon manual execution of these tests, the test code is generated. Rather than calculating output values, they are merely checked for conformity. The last part of this book contains a chapter on how developers and testers can help each other and work as a collaborative team.
What You'll Learn
- Apply efficient test design techniques for detecting domain faults
- Work with modeling techniques that combine all the advantages of state transition testing and use case testing
- Grasp the two-phase model-based testing technique
- Use test design efficiently to find almost all the bugs in an application
Who This Book Is For
Software developers, QA engineers, and business analysts
Learn Grafana 10.x is your essential guide to mastering the art of data visualization and monitoring through interactive dashboards. Whether you're starting from scratch or updating your knowledge to Grafana 10.x, this book walks you through installation, implementation, data transformation, and effective visualization techniques.
What this Book will help me do
- Install and configure Grafana 10.x for real-time data visualization and analytics.
- Create and manage insightful dashboards with Grafana's enhanced features.
- Integrate Grafana with diverse data sources such as Prometheus, InfluxDB, and Elasticsearch.
- Set up dynamic templated dashboards and alerting systems for proactive monitoring.
- Implement Grafana's user authentication mechanisms for enhanced security.
Author(s)
Salituro is a seasoned expert in data analytics and observability platforms with extensive experience working with time-series data using Grafana. Their practical teaching approach and passion for sharing insights make this book an invaluable resource for both newcomers and experienced users.
Who is it for?
This book is perfect for business analysts, data visualization enthusiasts, and developers interested in analyzing and monitoring time-series data. Whether you're a newcomer or have some background knowledge, this book offers accessible guidance and advanced tips suitable for all levels. If you're aiming to efficiently build and utilize Grafana dashboards, this is the book for you.
Bayesian optimization helps pinpoint the best configuration for your machine learning models with speed and accuracy. Put its advanced techniques into practice with this hands-on guide.
In Bayesian Optimization in Action you will learn how to:
- Train Gaussian processes on both sparse and large data sets
- Combine Gaussian processes with deep neural networks to make them flexible and expressive
- Find the most successful strategies for hyperparameter tuning
- Navigate a search space and identify high-performing regions
- Apply Bayesian optimization to cost-constrained, multi-objective, and preference optimization
- Implement Bayesian optimization with PyTorch, GPyTorch, and BoTorch
Bayesian Optimization in Action shows you how to optimize hyperparameter tuning, A/B testing, and other aspects of the machine learning process by applying cutting-edge Bayesian techniques. Using clear language, illustrations, and concrete examples, this book proves that Bayesian optimization doesn’t have to be difficult! You’ll get in-depth insights into how Bayesian optimization works and learn how to implement it with cutting-edge Python libraries. The book’s easy-to-reuse code samples let you hit the ground running by plugging them straight into your own projects.
About the Technology
In machine learning, optimization is about achieving the best predictions (shortest delivery routes, perfect price points, most accurate recommendations) in the fewest number of steps. Bayesian optimization uses the mathematics of probability to fine-tune ML functions, algorithms, and hyperparameters efficiently when traditional methods are too slow or expensive.
About the Book
Bayesian Optimization in Action teaches you how to create efficient machine learning processes using a Bayesian approach. In it, you’ll explore practical techniques for training large datasets, hyperparameter tuning, and navigating complex search spaces. This interesting book includes engaging illustrations and fun examples like perfecting coffee sweetness, predicting weather, and even debunking psychic claims. You’ll learn how to navigate multi-objective scenarios, account for decision costs, and tackle pairwise comparisons.
What's Inside
- Gaussian processes for sparse and large datasets
- Strategies for hyperparameter tuning
- Identify high-performing regions
- Examples in PyTorch, GPyTorch, and BoTorch
About the Reader
For machine learning practitioners who are confident in math and statistics.
About the Author
Quan Nguyen is a research assistant at Washington University in St. Louis. He writes for the Python Software Foundation and has authored several books on Python programming.
Quotes
Using a hands-on approach, clear diagrams, and real-world examples, Quan lifts the veil off the complexities of Bayesian optimization. - From the Foreword by Luis Serrano, Author of Grokking Machine Learning
This book teaches Bayesian optimization, starting from its most basic components. You’ll find enough depth to make you comfortable with the tools and methods and enough code to do real work very quickly. - From the Foreword by David Sweet, Author of Experimentation for Engineers
Combines modern computational frameworks with visualizations and infographics you won’t find anywhere else. It gives readers the confidence to apply Bayesian optimization to real world problems! - Ravin Kumar, Google
In "Hands-On Web Scraping with Python," you'll learn how to harness the power of Python libraries to extract, process, and analyze data from the web. This book provides a practical, step-by-step guide for beginners and data enthusiasts alike.
What this Book will help me do
- Master the use of Python libraries like requests, lxml, Scrapy, and Beautiful Soup for web scraping.
- Develop advanced techniques for secure browsing and data extraction using APIs and Selenium.
- Understand the principles behind regex and PDF data parsing for comprehensive scraping.
- Analyze and visualize data using data science tools such as Pandas and Plotly.
- Build a portfolio of real-world scraping projects to demonstrate your capabilities.
Author(s)
Anish Chapagain, the author of "Hands-On Web Scraping with Python," is an experienced programmer and instructor who specializes in Python and data-related technologies. With his vast experience in teaching individuals from diverse backgrounds, Anish approaches complex concepts with clarity and a hands-on methodology.
Who is it for?
This book is perfect for aspiring data scientists, Python beginners, and anyone who wants to delve into web scraping. Readers should have a basic understanding of how websites work but no prior coding experience is required. If you aim to develop scraping skills and understand data analysis, this book is the ideal starting point.
Building Statistical Models in Python is your go-to guide for mastering statistical modeling techniques using Python. By reading this book, you will explore how to use Python libraries like statsmodels and others to tackle tasks such as regression, classification, and time series analysis.
What this Book will help me do
- Develop a deep practical knowledge of statistical concepts and their implementation in Python.
- Create regression and classification models to solve real-world problems.
- Gain expertise analyzing time series data and generating valuable forecasts.
- Learn to perform hypothesis verification to interpret data correctly.
- Understand survival analysis and apply it in various industry scenarios.
Author(s)
Huy Hoang Nguyen, Paul N Adams, and Stuart J Miller bring their extensive expertise in data science and Python programming to the table. With years of professional experience in both industry and academia, they aim to make statistical modeling approachable and applicable. Combining technical depth with hands-on coding, their goal is to ensure readers not only understand the theory but also gain confidence in its application.
Who is it for?
This book is tailored for beginners and intermediate programmers seeking to learn statistical modeling without a prerequisite in mathematics. It's ideal for data analysts, data scientists, and Python enthusiasts who want to leverage statistical models to gain insights from data. With this book, you will journey from the basics to advanced applications, making it perfect for those who aim to master statistical analysis.
The ultimate guide to data visualization and information design for business.
Making good charts is a must-have skill for managers today. The vast amount of data that drives business isn't useful if you can't communicate the valuable ideas contained in that data: the threats, the opportunities, the hidden trends, the future possibilities. But many think that data visualization is too difficult, a specialist skill that's either the province of data scientists and complex software packages or the domain of professional designers and their visual creativity. Not so. Anyone can learn to produce quality "dataviz" and, more broadly, clear and effective information design. Good Charts will show you how to do it. In this updated and expanded edition, dataviz expert Scott Berinato provides all you need for turning those ordinary charts kicked out of a spreadsheet program into extraordinary visuals that captivate and persuade your audience and for transforming presentations that seem like a mishmash of charts and bullet points into clear, effective, persuasive storytelling experiences. Good Charts shows how anyone who invests a little time getting better at visual communication can create an outsized impact, both in their career and in their organization.
You will learn:
- A framework for getting to better charts in just a few minutes
- Design techniques that immediately make your visuals clearer and more persuasive
- The building blocks of storytelling with your data
- How to build teams to bring visual communication skills into your organization and culture
This new edition of Good Charts not only provides new visuals and updated concepts but adds an entirely new chapter on building teams around the visualization part of a data science operation and creating workflows to integrate visualization into everything you do. Graphics that merely present information won't cut it anymore. Make Good Charts your go-to resource for turning plain, uninspiring charts and presentations into smart, effective visualizations and stories that powerfully convey ideas.
This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau.
What this Book will help me do
- Master the latest Tableau 2023 features and use cases to tackle analytics challenges.
- Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation.
- Integrate Tableau with programming languages such as Python and R to enhance analytics.
- Create engaging, visually impactful dashboards for effective data storytelling.
- Understand and apply data governance to ensure data quality and compliance.
Author(s)
Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing.
Who is it for?
This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.
M-STATISTICS
A comprehensive resource providing new statistical methodologies and demonstrating how new approaches work for applications
M-statistics introduces a new approach to statistical inference, redesigning the fundamentals of statistics and improving on the classical methods we already use. This book targets exact optimal statistical inference for a small sample under one methodological umbrella. Two competing approaches are offered, maximum concentration (MC) and mode (MO) statistics, hence the symbolic equation M = MC + MO. M-statistics defines an estimator as the limit point of the MC or MO exact optimal confidence interval when the confidence level approaches zero, giving the MC and MO estimator, respectively. Neither mean nor variance plays a role in M-statistics theory.
Novel statistical methodologies in the form of double-sided unbiased and short confidence intervals and tests apply to major statistical parameters:
- Exact statistical inference for small sample sizes is illustrated with effect size and coefficient of variation, the rate parameter of the Pareto distribution, two-sample statistical inference for normal variance, and the rate of exponential distributions.
- M-statistics is illustrated with discrete, binomial, and Poisson distributions. Novel estimators eliminate paradoxes with the classic unbiased estimators when the outcome is zero.
- Exact optimal statistical inference applies to correlation analysis including Pearson correlation, squared correlation coefficient, and coefficient of determination. New MC and MO estimators along with optimal statistical tests, accompanied by respective power functions, are developed.
- M-statistics is extended to the multidimensional parameter and illustrated with the simultaneous statistical inference for the mean and standard deviation, shape parameters of the beta distribution, the two-sample binomial distribution, and finally, nonlinear regression.
Our new developments are accompanied by respective algorithms and R code, available on GitHub and therefore ready to use in applications. M-statistics is suitable for professionals and students alike. It is highly useful for theoretical statisticians and teachers, researchers, and data science analysts as an alternative to classical and approximate statistical inference.
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work. Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.
- Learn the core principles and benefits of data observability
- Use data observability to detect, troubleshoot, and prevent data issues
- Follow the book's recipes to implement observability in your data projects
- Use data observability to create a trustworthy communication framework with data consumers
- Learn how to educate your peers about the benefits of data observability
DATA WRANGLING
Written and edited by some of the world’s top experts in the field, this exciting new volume provides state-of-the-art research and latest technological breakthroughs in data wrangling, its theoretical concepts, practical applications, and tools for solving everyday problems.
Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. This process typically includes manually converting and mapping data from one raw form into another format to allow for more convenient consumption and organization of the data. Data wrangling is increasingly ubiquitous at today’s top firms. Data cleaning focuses on removing inaccurate data from your data set whereas data wrangling focuses on transforming the data’s format, typically by converting “raw” data into another format more suitable for use. Data wrangling is a necessary component of any business. Data wrangling solutions are specifically designed and architected to handle diverse, complex data at any scale, including many applications, such as Datameer, Infogix, Paxata, Talend, Tamr, TMMData, and Trifacta.
This book synthesizes the processes of data wrangling into a comprehensive overview, with a strong focus on recent and rapidly evolving agile analytic processes in data-driven enterprises, for businesses and other enterprises to use to find solutions for their everyday problems and practical applications. Whether for the veteran engineer, scientist, or other industry professional, this book is a must have for any library.
The concept of deep machine learning is easier to understand by paying attention to cyclic stochastic time series, in particular a time series whose content is non-stationary not only within the cycles but also across the cycles, in the form of cycle-to-cycle variations.
Next-Generation Sequencing Data Analysis walks readers through NGS data analysis step-by-step for a wide range of NGS applications.