talk-data.com

Event

PyData Paris 2025

2025-09-01 – 2025-10-02 PyData

Activities tracked: 4

Filtering by: PyTorch

Sessions & talks

Showing 1–4 of 4 · Newest first

torchFastText: Modernizing Text Classification at Insee with PyTorch-based models

2025-10-01
talk

Discover how Insee transitioned from fastText to a PyTorch-based model for text classification by developing and open-sourcing the torchFastText package. This presentation will cover the creation, deployment, and practical applications of torchFastText in modernizing automatic coding systems, benefiting Insee and other European National Statistical Institutes (NSIs).
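As background, the fastText architecture that such models build on averages the embeddings of word and character n-grams and feeds the result to a linear classifier. Below is a minimal, generic sketch of that architecture in PyTorch; it is not the torchFastText package's actual API, and all sizes are made-up placeholders.

    # Generic fastText-style classifier in PyTorch: averaged n-gram embeddings
    # followed by a linear layer. Not the torchFastText API; sizes are placeholders.
    import torch
    from torch import nn

    class FastTextStyleClassifier(nn.Module):
        def __init__(self, n_buckets=1_000_000, embed_dim=100, n_classes=50):
            super().__init__()
            # EmbeddingBag averages the embeddings of all token/n-gram ids per document.
            self.embeddings = nn.EmbeddingBag(n_buckets, embed_dim, mode="mean")
            self.classifier = nn.Linear(embed_dim, n_classes)

        def forward(self, token_ids, offsets):
            # token_ids: hashed word/character n-gram ids of a whole batch, concatenated.
            # offsets: start position of each document within token_ids.
            return self.classifier(self.embeddings(token_ids, offsets))

    model = FastTextStyleClassifier()
    token_ids = torch.tensor([3, 15, 27, 8, 3])  # two short documents, concatenated
    offsets = torch.tensor([0, 3])               # document boundaries
    logits = model(token_ids, offsets)           # shape: (2, n_classes)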

Probabilistic regression models: let's compare different modeling strategies and discuss how to evaluate them

2025-10-01
talk

Most common machine learning models (linear, tree-based, or neural network-based) optimize the least-squares loss when trained for regression tasks. As a result, they output a point estimate of the conditional expected value of the target: E[y|X].

In this presentation, we will explore several ways to train and evaluate probabilistic regression models as a richer alternative to point estimates. Such models describe the full distribution of y|X and allow us to quantify the predictive uncertainty of individual predictions.
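For reference, the contrast between the two targets can be stated compactly (standard definitions, not spelled out in the abstract): the least-squares risk is minimized by the conditional mean, whereas the pinball loss at level τ, used later for evaluation, is minimized by the corresponding conditional quantile.

    \[
      \arg\min_{f}\; \mathbb{E}\big[(y - f(X))^2\big] = \mathbb{E}[y \mid X],
      \qquad
      \arg\min_{q}\; \mathbb{E}\big[\rho_\tau\big(y - q(X)\big)\big] = F_{y \mid X}^{-1}(\tau),
    \]
    where \(\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})\) is the pinball (quantile) loss.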

On the model training side, we will introduce the following options:

  • ensembles of quantile regressors for a grid of quantile levels (using linear models or gradient-boosted trees in scikit-learn, XGBoost, and PyTorch), as sketched after this list;
  • how to reduce probabilistic regression to multi-class classification, using a cumulative sum of the predict_proba output to recover a continuous conditional CDF;
  • how to implement this approach as a generic scikit-learn meta-estimator;
  • how this approach is used to pretrain foundational tabular models (e.g. TabPFNv2);
  • simple Bayesian models (e.g. Bayesian Ridge and Gaussian Processes);
  • more specialized approaches as implemented in XGBoostLSS.
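As a minimal sketch of the first option above (with assumed synthetic data and an arbitrary quantile grid), one can fit one gradient-boosted quantile regressor per level with scikit-learn:

    # Ensemble of gradient-boosted quantile regressors, one per quantile level.
    # Synthetic data and the quantile grid are illustrative choices only.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=5, noise=20.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    quantile_levels = [0.05, 0.25, 0.5, 0.75, 0.95]
    models = {
        q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X_train, y_train)
        for q in quantile_levels
    }
    # Stacking the per-quantile predictions gives a coarse description of the
    # conditional distribution of y given X for each test point.
    quantile_preds = np.column_stack([models[q].predict(X_test) for q in quantile_levels])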

We will also discuss how to evaluate probabilistic predictions via:

  • the pinball loss of quantile regressors (see the sketch after this list),
  • other strictly proper scoring rules such as Continuous Ranked Probability Score (CRPS),
  • coverage measures and width of prediction intervals,
  • reliability diagrams for different quantile levels.
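Continuing the sketch above (same fitted models and test split), two of these evaluation tools look roughly like this:

    # Pinball loss per quantile level, plus coverage and width of the central
    # 5%-95% prediction interval; continues from the previous snippet.
    import numpy as np
    from sklearn.metrics import mean_pinball_loss

    for q in quantile_levels:
        loss = mean_pinball_loss(y_test, models[q].predict(X_test), alpha=q)
        print(f"pinball loss at quantile {q:.2f}: {loss:.2f}")

    lower, upper = models[0.05].predict(X_test), models[0.95].predict(X_test)
    coverage = np.mean((y_test >= lower) & (y_test <= upper))  # ideally close to 0.90
    width = np.mean(upper - lower)
    print(f"coverage: {coverage:.2f}, mean interval width: {width:.1f}")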

We will illustrate those concepts with concrete examples and running code.

Finally, we will illustrate why some applications need such calibrated probabilistic predictions:

  • estimating uncertainty in trip times depending on traffic conditions, to help a human decision maker choose among various travel plan options,
  • modeling value at risk for investment decisions,
  • assessing the impact of missing variables for an ML model trained to work in degraded mode,
  • Bayesian optimization of operational parameters of industrial machines from few, costly observations.

If time allows, we will also discuss the usage and limitations of Conformal Quantile Regressors as implemented in MAPIE, and contrast the aleatoric vs. epistemic uncertainty captured by those models.

Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

2025-09-30
talk

Domain adaptation addresses the challenge of applying ML models to data that differs from the training distribution, a common issue in real-world applications. SKADA is a new Python library that brings domain adaptation tools to the scikit-learn and PyTorch ecosystem. This talk covers SKADA's design, its integration with standard ML workflows, and how it helps practitioners build models that generalize better across domains.
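For intuition about the kind of problem SKADA automates, here is a minimal, library-agnostic sketch of the classic covariate-shift reweighting idea using only scikit-learn; it does not use SKADA's API, and the data is synthetic.

    # Covariate-shift reweighting: estimate importance weights with a domain
    # classifier, then fit the task model on reweighted source data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Source and target inputs drawn from shifted distributions (covariate shift).
    X_source = rng.normal(loc=0.0, size=(1000, 2))
    y_source = (X_source[:, 0] + X_source[:, 1] > 0).astype(int)
    X_target = rng.normal(loc=1.0, size=(1000, 2))

    # Train a domain classifier to separate source (0) from target (1) samples...
    domain_clf = LogisticRegression().fit(
        np.vstack([X_source, X_target]),
        np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))]),
    )
    # ...and turn its probabilities into importance weights p(target|x) / p(source|x).
    proba = domain_clf.predict_proba(X_source)
    weights = proba[:, 1] / proba[:, 0]

    # Fit the task model on labeled source data, reweighted toward the target domain.
    task_clf = LogisticRegression().fit(X_source, y_source, sample_weight=weights)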

A Hitchhiker's Guide to the Array API Standard Ecosystem

2025-09-30
talk

The array API standard is unifying the ecosystem of Python array computing, facilitating greater interoperability between code written for different array libraries, including NumPy, CuPy, PyTorch, JAX, and Dask.

But what are all of these "array-api-" libraries for? How can you use these libraries to 'future-proof' your libraries, and provide support for GPU and distributed arrays to your users? Find out in this talk, where I'll guide you through every corner of the array API standard ecosystem, explaining how SciPy and scikit-learn are using all of these tools to adopt the standard. I'll also be sharing progress updates from the past year, to give you a clear picture of where we are now, and what the future holds.
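As a minimal illustration of what standard-agnostic code looks like (assuming the array-api-compat helper package is installed), the same function can run on NumPy arrays, PyTorch tensors, and other compliant arrays:

    # A library-agnostic softmax written against the array API standard.
    import numpy as np
    from array_api_compat import array_namespace

    def softmax(x):
        # Look up the array API namespace matching the input's own library.
        xp = array_namespace(x)
        shifted = x - xp.max(x, axis=-1, keepdims=True)
        e = xp.exp(shifted)
        return e / xp.sum(e, axis=-1, keepdims=True)

    print(softmax(np.asarray([1.0, 2.0, 3.0])))
    # The same function runs unchanged on other compliant arrays, e.g.:
    #   import torch; softmax(torch.tensor([1.0, 2.0, 3.0]))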