talk-data.com

Event

PyData Paris 2024

2024-09-25 – 2024-09-27 PyData

Activities tracked

8

Filtering by: API

Sessions & talks

Showing 1–8 of 8 · Newest first

An update on the latest scikit-learn features

2024-09-26
talk

In this talk, we provide an update on the latest scikit-learn features that have been implemented in versions 1.4 and 1.5. We will particularly discuss the following features:

  • the metadata routing API, which allows metadata to be passed between estimators;
  • TunedThresholdClassifierCV, which tunes the operational decision threshold against a custom metric;
  • better support for categorical features and missing values;
  • interoperability of arrays and dataframes.

MLOps at Renault Group: A Generic Pipeline for Scalable Deployment

2024-09-26
talk

Scaling machine learning at large organizations like Renault Group presents unique challenges in terms of scale, legal requirements, and diversity of use cases. Data scientists require streamlined workflows and automated processes to efficiently deploy models into production. We present an MLOps pipeline based on Python Kubeflow and the GCP Vertex AI API designed specifically for this purpose. It enables data scientists to focus on code development for pre-processing, training, evaluation, and prediction. This MLOps pipeline is a cornerstone of the AI@Scale program, which aims to roll out AI across the Group.

We choose a Python-first approach, allowing data scientists to focus purely on writing preprocessing or ML-oriented Python code, while also allowing data retrieval through SQL queries. The pipeline addresses key questions such as prediction type (batch or API), model versioning, resource allocation, drift monitoring, and alert generation. It favors faster time to market with automated deployment and infrastructure management. Although we encountered pitfalls and design difficulties, which we will discuss during the presentation, this pipeline integrates with a CI/CD process, ensuring efficient and automated model deployment and serving.

Finally, this MLOps solution empowers Renault data scientists to seamlessly translate innovative models into production, and smooths the development of scalable, impactful AI-driven solutions.

sktime - python toolbox for time series: next-generation AI – deep learning and foundation models

2024-09-26
talk

sktime is a widely used scikit-learn compatible library for learning with time series. sktime is easily extensible by anyone, and interoperable with the pydata/numfocus stack.

This talk presents progress, challenges, and newest features off the press, in extending the sktime framework to deep learning and foundation models.

Recent progress in generative AI and deep learning is leading to an ever-exploding number of popular “next generation AI” models for time series tasks like forecasting, classification, and segmentation.

Particular challenges of the new AI ecosystem are inconsistent formal interfaces, different deep learning backends, vendor specific APIs and architectures which do not match sklearn-like patterns well – every practitioner who has tried to use at least two such models at the same time (outside sktime) will have their individual painful memories.

We show how sktime brings its unified interface architecture for time series modelling to the brave new AI frontier, using novel design patterns building on ideas from Hugging Face and scikit-learn, to provide modular, extensible building blocks with a simple specification language.

Visualization of the sky in Notebooks: the ipyaladin widget extension

2024-09-26
talk

Aladin lets users visualize images of the sky or planetary surfaces, much like an astronomical "OpenStreetMap" app. The view can be panned and explored interactively. In the ipyaladin widget, which brings Aladin into the Jupyter Notebook environment, these abilities are extended with a Python API. Users can send astronomical data in standard formats back and forth between the viewer and their Python code. Such data can be images of the sky in different wavelengths, but also tabular data, complex shapes that characterize telescope observation regions, or even special sky features (such as the probability region for the provenance of a gravitational event).

With these existing features and our ongoing work with the new development framework anywidget, ipyaladin is close to version 1.0.0. It is already used in its beta version in different experimental science platforms, for example in the ESCAPE European Science Cluster of Astronomy & Particle Physics project and in the experimental SKA (Square Kilometre Array, a telescope for radio astronomy) analysis platform.

In this presentation, we will share our feedback on developing a widget with anywidget compared to the bare ipywidget framework, and we will demonstrate the widget's functionality through scientific use cases.

Polars Plugins: how you (yes, you!) can extend Polars

2024-09-25
talk

Polars is a dataframe library taking the world by storm. It is fast and memory-efficient, and it comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practically any problem.

No prior Rust experience is required, but intermediate Python and programming experience is. By the end of the talk, you will know how to write your own Polars Plugin! This talk is aimed at data practitioners.

Geoscience at Massive Scale

2024-09-25
talk

When scaling geoscience workloads to large datasets, many scientists and developers reach for Dask, a library for distributed computing that plugs seamlessly into Xarray and offers an Array API that wraps NumPy. Featuring a distributed environment capable of running your workload on large clusters, Dask promises to make it easy to scale from prototyping on your laptop to analyzing petabyte-scale datasets.

Dask has been the de-facto standard for scaling geoscience, but it hasn’t entirely lived up to its promise of operating effortlessly at massive scale. This comes up in a few ways:

  • Correctly chunking your dataset has a significant impact on Dask’s ability to scale
  • Workers accidentally run out of memory due to:
      • Data being loaded too eagerly
      • Rechunking
      • Unmanaged memory
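
To make the chunking point concrete, here is a small back-of-the-envelope sketch in plain Python (it does not call Dask). The helper `chunk_stats`, the array shape, and the candidate chunkings are hypothetical and chosen only for illustration. It shows how the same array can end up as thousands of small chunks or a few hundred large ones: many tiny chunks add scheduling overhead, while very large chunks put pressure on worker memory.

```python
import math


def chunk_stats(shape, chunks, itemsize=8):
    """Return (number of chunks, bytes per full chunk) for a regular chunking.

    shape and chunks are tuples of equal length; itemsize is the size of one
    element in bytes (8 for float64).
    """
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical dataset: one year of hourly values on a 721 x 1440 grid.
shape = (8760, 721, 1440)

for chunks in [(1, 721, 1440), (24, 721, 1440), (8760, 10, 10)]:
    n_chunks, chunk_bytes = chunk_stats(shape, chunks)
    print(f"{chunks}: {n_chunks} chunks of {chunk_bytes / 2**20:.1f} MiB")
```

The last chunking keeps the full time axis in every chunk, which suits per-pixel time series analysis. The first two suit per-timestep maps. Switching between these layouts is the rechunking step mentioned above.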

Over the last few months, Dask has addressed many of those pains and continues to do so through:

  • Improvements to its scheduling algorithms
  • A faster and more memory-stable method for rechunking
  • First-of-its-kind logical optimization layer for a distributed array framework (ongoing)

Join us as we dive into real-world geoscience workloads, exploring how Dask empowers scientists and developers to run their analyses at massive scale. Discover the impact of improvements made to Dask, ongoing challenges, and future plans for making it truly effortless to scale from your laptop to the cloud.

Solara: Pure Python web apps beyond prototypes and dashboards

2024-09-25
talk

Many Python frameworks are suitable for creating basic dashboards or prototypes but struggle with more complex ones. Taking lessons from the JavaScript community, the experts on building UIs, we created a new framework called Solara. Solara scales to much more complex apps and compute-intensive dashboards. Built on the Jupyter stack, Solara apps and their reusable components run both in the Jupyter notebook and on Solara's own production-quality server based on Starlette/FastAPI.

Solara has a declarative API that is designed for dynamic and complex UIs yet is easy to write. Reactive variables power our state management, which automatically triggers rerenders. Our component-centric architecture stimulates code reusability, and hot reloading promotes efficient workflows. With our rich set of UI and data-focused components, Solara spans the entire spectrum from rapid prototyping to robust, complex dashboards.

Python 3.12's new monitoring and debugging API

2024-09-25
talk

Python 3.12 introduced a new low-impact monitoring API with PEP 669, which can be used to implement far faster debuggers than ever before. This talk covers the main advantages of this API and how you can use it to develop small tools.
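
As a hedged sketch of what such a small tool might look like, the example below uses the `sys.monitoring` module added by PEP 669 to count how often each Python function is entered while a callable runs. The helper `count_calls` and the sample function `fib` are hypothetical names invented for this illustration, and the code requires Python 3.12 or later.

```python
import sys


def count_calls(func, *args, **kwargs):
    """Run func and count Python function entries via sys.monitoring (3.12+)."""
    mon = sys.monitoring
    tool_id = mon.PROFILER_ID  # one of the predefined tool ids
    counts = {}

    def on_py_start(code, instruction_offset):
        # Called each time a Python function starts executing.
        counts[code.co_name] = counts.get(code.co_name, 0) + 1

    mon.use_tool_id(tool_id, "call-counter")  # raises ValueError if id is taken
    mon.register_callback(tool_id, mon.events.PY_START, on_py_start)
    mon.set_events(tool_id, mon.events.PY_START)
    try:
        result = func(*args, **kwargs)
    finally:
        # Switch the events off and release the tool id again.
        mon.set_events(tool_id, mon.events.NO_EVENTS)
        mon.register_callback(tool_id, mon.events.PY_START, None)
        mon.free_tool_id(tool_id)
    return result, counts


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if sys.version_info >= (3, 12):
    result, counts = count_calls(fib, 5)
    print(result, counts)
```

Unlike `sys.settrace`, a tool built this way subscribes only to the events it needs, so code paths that produce no subscribed events run without tracing overhead.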