talk-data.com

Topic

Scikit-learn

machine_learning data_science data_analysis

Activities

Filtering by: Guillaume Lemaitre
Skrub: machine learning for dataframes

Skrub is an open-source package that simplifies machine learning with dataframes by providing tools to explore, prepare, and feature-engineer dataframes so they can be integrated into scikit-learn pipelines. Skrub DataOps let users build extensive multi-table wrangling plans, explore hyperparameter spaces, and export the resulting objects for deployment. Through code examples and demonstrations, the talk showcases use cases where skrub simplifies a data scientist's work, from data preparation to deployment.

In this talk, we provide an update on the latest scikit-learn features implemented in versions 1.4 and 1.5. In particular, we will discuss:

  • the metadata routing API, which allows metadata to be passed between estimators;
  • the TunedThresholdClassifierCV, which tunes a classifier's decision threshold to optimize a custom metric;
  • better support for categorical features and missing values;
  • interoperability between arrays and dataframes.