talk-data.com

Event

O'Reilly Data Science Books

2013-08-09 – 2026-02-25 O'Reilly

Activities tracked

324

Collection of O'Reilly books on Data Science.

Filtering by: Data Science

Sessions & talks

Showing 76–100 of 324 · Newest first

Data Mining and Predictive Analytics for Business Decisions

With many recent advances in data science, we have many more tools and techniques available for data analysts to extract information from data sets. This book will assist data analysts to move up from simple tools such as Excel for descriptive analytics to answer more sophisticated questions using machine learning. Most of the exercises use R and Python, but rather than focus on coding algorithms, the book employs interactive interfaces to these tools to perform the analysis. Using the CRISP-DM data mining standard, the early chapters cover conducting the preparatory steps in data mining: translating business information needs into framed analytical questions and data preparation. The Jamovi and the JASP interfaces are used with R and the Orange3 data mining interface with Python. Where appropriate, Voyant and other open-source programs are used for text analytics. The techniques covered in this book range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics. Includes companion files with case study files, solution spreadsheets, data sets and charts, etc. from the book. Features: Covers basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics Uses R, Python, Jamovi and JASP interfaces, and the Orange3 data mining interface Includes companion files with the case study files from the book, solution spreadsheets, data sets, etc.

R All-in-One For Dummies

A deep dive into the programming language of choice for statistics and data With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You'll find coverage of statistical analysis, machine learning, and data management with R. Grasp the basics of the R programming language and write your first lines of code Understand how R programmers use code to analyze data and perform statistical analysis Use R to create data visualizations and machine learning programs Work through sample projects to hone your R coding skill This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.

Pandas for Everyone: Python Data Analysis, 2nd Edition

Manage and Automate Data Analysis with Pandas in Python Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple data sets. Pandas for Everyone, 2nd Edition, brings together practical knowledge and insight for solving real problems with Pandas, even if you're new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world data science problems such as using regularization to prevent data overfitting, or when to use unsupervised machine learning methods to find the underlying structure in a data set. New features to the second edition include: Extended coverage of plotting and the seaborn data visualization library Expanded examples and resources Updated Python 3.9 code and packages coverage, including statsmodels and scikit-learn libraries Online bonus material on geopandas, Dask, and creating interactive graphics with Altair Chen gives you a jumpstart on using Pandas with a realistic data set and covers combining data sets, handling missing data, and structuring data sets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes. Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability and introduces you to the wider Python data analysis ecosystem.
Work with DataFrames and Series, and import or export data Create plots with matplotlib, seaborn, and pandas Combine data sets and handle missing data Reshape, tidy, and clean data sets so they're easier to work with Convert data types and manipulate text strings Apply functions to scale data manipulations Aggregate, transform, and filter large data sets with groupby Leverage Pandas' advanced date and time capabilities Fit linear models using statsmodels and scikit-learn libraries Use generalized linear modeling to fit models with different response variables Compare multiple models to select the best one Regularize to overcome overfitting and improve performance Use clustering in unsupervised machine learning ...

Python Data Science Handbook, 2nd Edition

Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all—IPython, NumPy, pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you'll learn how: IPython and Jupyter provide computational environments for scientists using Python NumPy includes the ndarray for efficient storage and manipulation of dense data arrays Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data Matplotlib includes capabilities for a flexible range of data visualizations Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms

The Art of Data-Driven Business

Learn how to integrate data-driven methodologies and machine learning into your business decision-making processes with 'The Art of Data-Driven Business.' This comprehensive guide shows you how to apply Python-based machine learning techniques to real-world challenges, transforming your organization into an innovative and well-informed enterprise. What this Book will help me do Create professional-quality data visualizations using Python's seaborn library to derive business insights. Analyze customer behavior, including predicting churn, with machine learning techniques. Apply clustering algorithms to segment customers for targeted marketing campaigns. Utilize pandas effectively for pricing and sales analytics to optimize your pricing strategies. Forecast outcomes of promotional strategies to determine costs and benefits and maximize performance. Author(s) Palacio is an experienced data scientist and educator who specializes in the application of machine learning to solve business problems. With extensive real-world industry experience, Palacio brings practical insights and methodologies to learners. Their teaching connects technical knowledge to actionable business strategies. Who is it for? This book is ideal for business professionals aiming to incorporate data science into their strategies and technical experts seeking to leverage machine learning for business scenarios. Beginners to Python can find foundational help, while data scientists will appreciate the focused practical applications. It's perfect for individuals seeking a strong data-driven perspective in marketing, sales, and customer management.

Fuzzy Computing in Data Science

FUZZY COMPUTING IN DATA SCIENCE This book comprehensively explains how to use various fuzzy-based models to solve real-time industrial challenges. The book provides information about fundamental aspects of the field and explores the myriad applications of fuzzy logic techniques and methods. It presents basic conceptual considerations and case studies of applications of fuzzy computation. It covers the fundamental concepts and techniques for system modeling, information processing, intelligent system design, decision analysis, statistical analysis, pattern recognition, automated learning, system control, and identification. The book also discusses the combination of fuzzy computation techniques with other computational intelligence approaches such as neural and evolutionary computation. Audience Researchers and students in computer science, artificial intelligence, machine learning, big data analytics, and information and communication technology.

Beginning MATLAB and Simulink: From Beginner to Pro

Employ essential tools and functions of the MATLAB and Simulink packages, which are explained and demonstrated via interactive examples and case studies. This revised edition covers features from the latest MATLAB 2022b release, as well as other features that have been released since the first edition was published. This book contains dozens of simulation models and solved problems via m-files/scripts and Simulink models which will help you to learn programming and modelling essentials. You’ll become efficient with many of the built-in tools and functions of MATLAB/Simulink while solving engineering and scientific computing problems. Beginning MATLAB and Simulink, Second Edition explains various practical issues of programming and modelling in parallel by comparing MATLAB and Simulink. After studying and using this book, you'll be proficient at using MATLAB and Simulink and applying the source code and models from the book's examples as templates for your own projects in data science or engineering. What You Will Learn Master the programming and modelling essentials of MATLAB and Simulink Carry out data visualization with MATLAB Build a GUI and develop apps with MATLAB Work with integration and numerical root finding methods Apply MATLAB to differential equations-based models and simulations Use MATLAB and Simulink for data science projects Who This Book Is For Engineers, programmers, data scientists, and students majoring in engineering and scientific computing who are new to MATLAB and Simulink.

R 4 Data Science Quick Reference: A Pocket Guide to APIs, Libraries, and Packages

In this handy, quick reference book you'll be introduced to several R data science packages, with examples of how to use each of them. All concepts will be covered concisely, with many illustrative examples using the following APIs: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. With R 4 Data Science Quick Reference, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis. All source code used in the book is freely available on GitHub. What You'll Learn Implement applicable R 4 programming language specification features Import data with readr Work with categories using forcats, time and dates with lubridate, and strings with stringr Format data using tidyr and then transform that data using magrittr and dplyr Write functions with R for data science, data mining, and analytics-based applications Visualize data with ggplot2 and fit data to models using modelr Who This Book Is For Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.

Mathematical Foundations of Data Science Using R, 2nd Edition

The aim of the book is to help students become data scientists. Since this requires a series of courses over a considerable period of time, the book intends to accompany students from the beginning to an advanced understanding of the knowledge and skills that define a modern data scientist. The book presents a comprehensive overview of the mathematical foundations of the programming language R and of its applications to data science.

Data Science and Analytics for SMEs: Consulting, Tools, Practical Use Cases

Master the tricks and techniques of business analytics consulting, specifically applicable to small-to-medium businesses (SMEs). Written to help you hone your business analytics skills, this book applies data science techniques to help solve problems and improve upon many aspects of a business' operations. SMEs are looking for ways to use data science and analytics, and this need is becoming increasingly pressing with the ongoing digital revolution. The topics covered in the book will help provide the knowledge needed for implementing data science in small business. Small businesses' demand for data analytics coincides with the growing number of freelance data science consulting opportunities; hence this book will provide insight on how to navigate this new terrain. This book uses a do-it-yourself approach to analytics and introduces tools that are easily available online and are non-programming based. Data science will allow SMEs to understand their customer loyalty, market segmentation, sales and revenue increase, etc. more clearly. Data Science and Analytics for SMEs is particularly focused on small businesses and explores the analytics and data that can help them succeed further in their business. What You'll Learn Create and measure the success of your analytics project Start your business analytics consulting career Use solutions taught in the book in practical use cases and problems Who This Book Is For Business analytics enthusiasts who are not particularly programming inclined, small business owners and data science consultants, data science and business students, and SME (small-to-medium enterprise) analysts

Practical Linear Algebra for Data Science

If you want to work in any computational or technical field, you need to understand linear algebra. As the study of matrices and operations acting upon them, linear algebra is the mathematical basis of nearly all algorithms and analyses implemented in computers. But the way it's presented in decades-old textbooks is much different from how professionals use linear algebra today to solve real-world modern applications. This practical guide from Mike X Cohen teaches the core concepts of linear algebra as implemented in Python, including how they're used in data science, machine learning, deep learning, computational simulations, and biomedical data processing applications. Armed with knowledge from this book, you'll be able to understand, implement, and adapt myriad modern analysis methods and algorithms. Ideal for practitioners and students using computer technology and algorithms, this book introduces you to: The interpretations and applications of vectors and matrices Matrix arithmetic (various multiplications and transformations) Independence, rank, and inverses Important decompositions used in applied linear algebra (including LU and QR) Eigendecomposition and singular value decomposition Applications including least-squares model fitting and principal components analysis

Comet for Data Science

Discover how to manage and optimize the life cycle of your data science projects with Comet! By the end of this book, you will master preparing, analyzing, building, and deploying models, as well as integrating Comet into your workflow. What this Book will help me do Master managing data science workflows with Comet. Confidently prepare and analyze your data for effective modeling. Deploy and monitor machine learning models using Comet tools. Integrate Comet with DevOps and GitLab workflows for production readiness. Apply Comet to advanced topics like NLP, deep learning, and time series analysis. Author(s) Angelica Lo Duca is an experienced author and data scientist with years of expertise in data science workflows and tools. She brings practical insights into integrating platforms like Comet into modern data science tasks. Who is it for? If you are a data science practitioner or programmer looking to understand and implement efficient project lifecycles using Comet, this book is tailored for you. A basic background in data science and programming is highly recommended, but prior expertise in Comet is unnecessary.

Hands-On Healthcare Data

Healthcare is the next frontier for data science. Using the latest in machine learning, deep learning, and natural language processing, you'll be able to solve healthcare's most pressing problems: reducing cost of care, ensuring patients get the best treatment, and increasing accessibility for the underserved. But first, you have to learn how to access and make sense of all that data. This book provides pragmatic and hands-on solutions for working with healthcare data, from data extraction to cleaning and harmonization to feature engineering. Author Andrew Nguyen covers specific ML and deep learning examples with a focus on producing high-quality data. You'll discover how graph technologies help you connect disparate data sources so you can solve healthcare's most challenging problems using advanced analytics. You'll learn: Different types of healthcare data: electronic health records, clinical registries and trials, digital health tools, and claims data The challenges of working with healthcare data, especially when trying to aggregate data from multiple sources Current options for extracting structured data from clinical text How to make trade-offs when using tools and frameworks for normalizing structured healthcare data How to harmonize healthcare data using terminologies, ontologies, and mappings and crosswalks

Effective Data Science Infrastructure

Simplify data science infrastructure to give data scientists an efficient path from prototype to production. In Effective Data Science Infrastructure you will learn how to: Design data science infrastructure that boosts productivity Handle compute and orchestration in the cloud Deploy machine learning to production Monitor and manage performance and results Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, Conda, and Docker Architect complex applications for multiple teams and large datasets Customize and grow data science infrastructure Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python. The author is donating proceeds from this book to charities that support women and underrepresented groups in data science. About the Technology Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises. About the Book Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. 
Based on state-of-the-art tools and concepts that power data operations of Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems. What's Inside Handle compute and orchestration in the cloud Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem Architect complex applications that require large datasets and models, and a team of data scientists About the Reader For infrastructure engineers and engineering-minded data scientists who are familiar with Python. About the Author At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure. Quotes By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive. - From the Foreword by Travis Oliphant, Author of NumPy, Founder of Anaconda, PyData, and NumFOCUS Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team. - Ninoslav Cerkez, Logit More data science. Less headaches. - Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico Indispensable. A copy should be on every data engineer’s bookshelf. - Matthew Copple, Grand River Analytics

Python for Data Science

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You'll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support. You will discover Python's rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.

The Pandas Workshop

The Pandas Workshop offers a detailed journey into the world of data analysis using Python and the pandas library. Throughout the book, you'll build skills in accessing, transforming, visualizing, and modeling data, all while focusing on real-world data science challenges. You will gain the knowledge and confidence needed to dissect and derive insights from complex datasets. What this Book will help me do Understand how to access and load data from various formats including databases and web-based sources. Manipulate and transform data for analysis using efficient pandas techniques. Create insightful visualizations using Matplotlib integrated with pandas for clearer data presentation. Build predictive and descriptive data models and glean data-driven insights. Handle and analyze time-series data to uncover trends and seasonal effects in data patterns. Author(s) Blaine Bateman, Saikat Basak, Thomas Joseph, and William So collectively bring diverse expertise in data analysis, programming, and teaching. Their goal is to make cutting-edge data science techniques accessible through clear explanations and practical exercises, helping learners from varied backgrounds master the pandas library. Who is it for? This book is best suited for novice to intermediate programmers and data enthusiasts who are already familiar with Python but are new to the pandas library. Ideal readers are those interested in honing their skills in data analysis and visualization, as well as leveraging data for informed decision-making. Whether you're an analyst, aspiring data scientist, or business professional seeking to strengthen your analytical toolkit, this book provides beneficial insights and techniques.

Building Data Science Solutions with Anaconda

Explore the comprehensive world of data science with "Building Data Science Solutions with Anaconda." This book covers essential topics like managing environments with Anaconda, detecting and overcoming bias, and ensuring model interpretability. Delve into practical tools and solutions, all explained in an approachable way to help you become proficient in data science workflows. What this Book will help me do Master environment management for data science projects using Anaconda and conda. Detect and mitigate dataset biases to ensure fair and ethical machine learning models. Learn advanced data science techniques with tools like NumPy, pandas, and Jupyter Notebooks. Understand and explain your machine learning models using LIME and SHAP. Grow your expertise in selecting and fine-tuning AI/ML algorithms for diverse applications. Author(s) Meador combines extensive expertise in data science with a thorough understanding of Anaconda tools and open source software. With a background in engineering and AI model management, Meador provides an insightful perspective on the field. Their practical and analogy-driven approach makes technical concepts accessible to learners of any level. Who is it for? This book is ideal for data analysts, aspiring machine learning engineers, and data science professionals who wish to deepen their knowledge and make the most of Anaconda's capabilities. A prior understanding of Python and basic data science principles is assumed. If you're looking to optimize your data science workflows and gain hands-on practice, this book is for you.

The Kaggle Book

The Kaggle Book is an essential guide for anyone aiming to excel in data science through Kaggle competitions. With expert advice from Kaggle Grandmasters, you'll learn practical techniques for handling data, creating robust models, and improving your ranking in competitions. This book is packed with insights on advanced topics like ensembling, validation, and evaluation metrics. What this Book will help me do Master the Kaggle platform, including its Notebooks, Datasets, and Discussion capabilities. Enhance model performance using techniques like feature engineering, AutoML, and ensembling strategies. Apply advanced validation schemes to improve the reliability of your predictions. Tackle diverse competition types, including NLP, computer vision, and optimization challenges. Build a professional portfolio to showcase your data science expertise and attract career opportunities. Author(s) Konrad Banachewicz and Luca Massaron, authoritative Kaggle Grandmasters, bring their wealth of experience in competitive data science to this book. They have collectively competed in numerous Kaggle challenges and possess deep insights into what differentiates successful Kagglers. Their guidance combines practicality with expertise, making this book a must-have for aspiring data scientists looking to make an impact. Who is it for? This book is tailored for data analysts and scientists interested in enhancing their Kaggle performance, as well as those new to Kaggle who wish to explore competitive data science. It suits individuals with basic knowledge of machine learning, aiming to develop and demonstrate their skills further. The content is valuable for practitioners aiming to build a professional profile or secure roles in the tech industry.

Data Science on the Google Cloud Platform, 2nd Edition

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP. Throughout this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by building a data pipeline in your own project on GCP, and discover how to solve data science problems in a transformative and more collaborative way. You'll learn how to: Employ best practices in building highly scalable data and ML pipelines on Google Cloud Automate and schedule data ingest using Cloud Run Create and populate a dashboard in Data Studio Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery Conduct interactive data exploration with BigQuery Create a Bayesian model with Spark on Cloud Dataproc Forecast time series and do anomaly detection with BigQuery ML Aggregate within time windows with Dataflow Train explainable machine learning models with Vertex AI Operationalize ML with Vertex AI Pipelines

Leading Data Science Teams

Compared to other functions of an organization, data science is highly speculative. Data science teams are often tasked with last-minute must-have deliverables that are well beyond their ability to produce. Data might be missing or have no signal, or the data models themselves might be impractical. This hands-on reference guides team leaders through the types of challenges you might face and the tools you need to work through them. Author Jacqueline Nolis, head of data science at Saturn Cloud, helps team leaders think through the various issues you'll encounter when running a data science team. You'll learn ways to set up your team, manage data scientists to promote their success, and collaborate with external stakeholders. Once you finish this report, you'll be ready to work through the challenges your current team faces or start a new data science team in an organization that needs one. Determine the scope of work before choosing your team of data scientists and support positions Successfully manage your relationship with stakeholders by providing your team with clear, achievable goals Create an environment to help data scientists and other team members succeed Choose a technical infrastructure for your team, including programming languages, databases, and deployment models

Reproducible Data Science with Pachyderm

Dive into the world of reproducible data science with Pachyderm, a specialized platform designed for version-controlled data pipelines. By following this book, 'Reproducible Data Science with Pachyderm,' you'll gain the skills to implement robust, scalable machine learning workflows with Pachyderm 2.0, covering setup, integration, and advanced use cases. What this Book will help me do Build scalable, version-controlled data pipelines with Pachyderm's unique features. Understand the principles behind reproducible data science and implement them effectively. Deploy Pachyderm on AWS, Google Cloud, and Azure while integrating with popular tools. Create and manage end-to-end machine learning workflows, including hyperparameter tuning. Leverage advanced integrations, such as Pachyderm Notebooks and language clients like Python and Go. Author(s) Svetlana Karslioglu is a seasoned data scientist with extensive experience in constructing scalable machine learning and data processing systems. With years in both practical implementation and educational endeavors, she has a talent for breaking down complex concepts into accessible learning paths. Her approach is hands-on and results-oriented, aimed at empowering professionals to excel in the field of data science. Who is it for? This book is intended for data scientists, machine learning engineers, and data engineers who are keen to ensure reproducibility in their workflows. Ideal readers may have familiarity with data science basics and some exposure to Kubernetes and programming languages like Python. By studying the book, learners will establish confidence in implementing Pachyderm for scalable and reliable data pipelines.

What Is Causal Inference?

Causal inference lies at the heart of our ability to understand why things happen by helping us predict the results of our actions. This process is vital for businesses that aspire to turn data and information into valuable knowledge. With this report, data scientists and analysts will learn a principled way of thinking about causality, using a suite of causal inference techniques now available. Authors Hugo Bowne-Anderson, a data science consultant, and Mike Loukides, vice president of content strategy at O'Reilly Media, introduce causality and discuss randomized control trials (RCTs), key aspects of causal graph theory, and much-needed techniques from econometrics.

You'll explore:

- Techniques from econometrics, including randomized control trials, the causality gold standard used in A/B-testing
- The constant-effects model for dealing with all things not being equal across the groups you're comparing
- Regression for dealing with confounding variables and selection bias
- Instrumental variables to estimate causal relationships in situations where regression won't work
- Techniques from causal graph theory including forks and colliders, the graphical tools for representing common causal patterns
- Backdoor and front-door adjustments for making causal inferences in the presence of confounders
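
As a rough illustration of the backdoor adjustment named in the description above, the following minimal sketch simulates a single binary confounder and compares a naive treated-versus-control difference with a stratified estimate. It is not taken from the report: the data are synthetic, and the function names (`simulate`, `naive_difference`, `backdoor_adjusted`), the effect sizes, and the treatment probabilities are all assumptions chosen for the example.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (z, t, y) rows where z confounds both treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                   # binary confounder
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0   # z makes treatment more likely
        y = true_effect * t + 3.0 * z + rng.gauss(0.0, 1.0)  # z also raises the outcome
        rows.append((z, t, y))
    return rows


def naive_difference(rows):
    """Mean outcome of treated minus control, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted(rows):
    """Average the within-stratum differences, weighted by P(z)."""
    n = len(rows)
    effect = 0.0
    for z_val in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_val]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        effect += (len(stratum) / n) * (mean(treated) - mean(control))
    return effect


rows = simulate()
naive = naive_difference(rows)
adjusted = backdoor_adjusted(rows)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this setup the naive difference overstates the effect because treated units are more likely to have the confounder, while the stratified estimate should land near the simulated effect of 2.0. Real problems rarely have a single, fully observed binary confounder, so this is only a toy case.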

Building an Effective Data Science Practice: A Framework to Bootstrap and Manage a Successful Data Science Practice

Gain a deep understanding of data science and the thought process needed to solve problems in that field using the required techniques, technologies and skills that go into forming an interdisciplinary team. This book will enable you to set up an effective team of engineers, data scientists, analysts, and other stakeholders that can collaborate effectively on crucial aspects such as problem formulation, execution of experiments, and model performance evaluation. You’ll start by delving into the fundamentals of data science – classes of data science problems, data science techniques and their applications – and gradually work up to building a professional reference operating model for a data science function in an organization. This operating model covers the roles and skills required in a team, the techniques and technologies they use, and the best practices typically followed in executing data science projects. Building an Effective Data Science Practice provides a common base of reference knowledge and solutions, and addresses the kinds of challenges that arise to ensure your data science team is both productive and aligned with the business goals from the very start. Reinforced with real examples, this book allows you to confidently determine the strategic answers to effectively align your business goals with the operations of the data science practice.

What You’ll Learn

- Transform business objectives into concrete problems that can be solved using data science
- Evaluate how problems and the specifics of a business drive the techniques and model evaluation guidelines used in a project
- Build and operate an effective interdisciplinary data science team within an organization
- Evaluate the progress of the team towards the business ROI
- Understand the important regulatory aspects that are applicable to a data science practice

Who This Book Is For

Technology leaders, data scientists, and project managers

How to Lead in Data Science

A field guide for the unique challenges of data science leadership, filled with transformative insights, personal experiences, and industry examples.

In How to Lead in Data Science you will learn:

- Best practices for leading projects while balancing complex trade-offs
- Specifying, prioritizing, and planning projects from vague requirements
- Navigating structural challenges in your organization
- Working through project failures with positivity and tenacity
- Growing your team with coaching, mentoring, and advising
- Crafting technology roadmaps and championing successful projects
- Driving diversity, inclusion, and belonging within teams
- Architecting a long-term business strategy and data roadmap as an executive
- Delivering a data-driven culture and structuring productive data science organizations

How to Lead in Data Science is full of techniques for leading data science at every seniority level, from heading up a single project to overseeing a whole company's data strategy. Authors Jike Chong and Yue Cathy Chang share hard-won advice that they've developed building data teams for LinkedIn, Acorns, Yiren Digital, large asset-management firms, Fortune 50 companies, and more. You'll find advice on plotting your long-term career advancement, as well as quick wins you can put into practice right away. Carefully crafted assessments and interview scenarios encourage introspection, reveal personal blind spots, and highlight development areas.

About the Technology

Lead your data science teams and projects to success! To make a consistent, meaningful impact as a data science leader, you must articulate technology roadmaps, plan effective project strategies, support diversity, and create a positive environment for professional growth. This book delivers the wisdom and practical skills you need to thrive as a data science leader at all levels, from team member to the C-suite.

About the Book

How to Lead in Data Science shares unique leadership techniques from high-performance data teams. It’s filled with best practices for balancing project trade-offs and producing exceptional results, even when beginning with vague requirements or unclear expectations. You’ll find a clearly presented modern leadership framework based on current case studies, with insights reaching all the way to Aristotle and Confucius. As you read, you’ll build practical skills to grow and improve your team, your company’s data culture, and yourself.

What's Inside

- How to coach and mentor team members
- Navigate an organization’s structural challenges
- Secure commitments from other teams and partners
- Stay current with the technology landscape
- Advance your career

About the Reader

For data science practitioners at all levels.

About the Authors

Dr. Jike Chong and Yue Cathy Chang build, lead, and grow high-performing data teams across industries in public and private companies, such as Acorns, LinkedIn, large asset-management firms, and Fortune 50 companies.

Quotes

Spot-on as a career resource! Captures what’s important to be successful as a data scientist.
- Eric Colson, Former Data Executive at Stitch Fix, Netflix

The first-of-its-kind book to discuss data science career development in a systematic way! Highly valuable and timely in a world that generates more and more data!
- Michael Li, VP of Data at Coinbase

A valuable reference filled with new and useful coaching and techniques. A must-have.
- Jesse Bridgewater, VP Data Science at Brightline, formerly Livongo, Twitter, eBay

A great book providing frameworks and tools that help contemplate and address key problems faced by data science leaders.
- Ron Kohavi, Best-selling Author, Former Executive at Airbnb, Microsoft, Amazon