talk-data.com

Event

PyData Paris 2025

2025-09-01 – 2025-10-02 PyData

Activities tracked

97

Filtering by: AI/ML

Sessions & talks

Showing 76–97 of 97 · Newest first

Reinventing cost allocation: when data quality meets generative AI in Alteryx

2025-10-01
Face To Face

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

2025-10-01 Watch
talk

Every dataset has a story — and when it comes to geospatial data, it’s a story deeply rooted in space and scale. But working with geospatial information is often a hidden challenge: massive file sizes, strange formats, projections, and pipelines that don't scale easily.

In this talk, we'll follow the life of a real-world geospatial dataset, from its raw collection in the field to its transformation into meaningful insights. Along the way, we’ll uncover the key steps of building a robust, scalable open-source geospatial pipeline.

Drawing on years of experience at Camptocamp, we’ll explore:

  • How raw spatial data is ingested and cleaned
  • How vector and raster data are efficiently stored and indexed (PostGIS, Cloud Optimized GeoTIFFs, Zarr)
  • How modern tools like Dask, GeoServer, and STAC (SpatioTemporal Asset Catalogs) help process and serve geospatial data
  • How to design pipelines that handle both "small data" (local shapefiles) and "big data" (terabytes of satellite imagery)
  • Common pitfalls and how to avoid them when moving from prototypes to production

This journey will show how the open-source ecosystem has matured to make geospatial big data accessible — and how spatial thinking can enrich almost any data project, whether you are building dashboards, doing analytics, or setting the stage for machine learning later on.

How is generative AI reshaping economic and legislative debates in Europe?

2025-10-01
Face To Face

Risk Hunter demo

2025-10-01
Face To Face

Come and discover how our modular, AI-driven platform, available as SaaS and on-premises, lets you manage your GRC and cybersecurity.

AI@Horse Technologies: how did we double our quality-analysis capacity with Altair RapidMiner?

2025-10-01
Face To Face

Thanks to the low-code/no-code Altair RapidMiner solution, quality experts can adapt the algorithm without programming.

Generative AI in the enterprise: impact, risks, and levers for action

2025-10-01
Face To Face

In this talk, we present generative AI and its impact on professions and organizations. We will address the opportunities

Probabilistic regression models: let's compare different modeling strategies and discuss how to evaluate them

2025-10-01
talk

Most common machine learning models (linear, tree-based or neural network-based) optimize the least squares loss when trained for regression tasks. As a result, they output a point estimate of the conditional expected value of the target: E[y|X].

In this presentation, we will explore several ways to train and evaluate probabilistic regression models as an alternative to point estimates. Those models predict a richer description of the full distribution of y|X and allow us to quantify the predictive uncertainty for individual predictions.

On the model training part, we will introduce the following options:

  • ensembles of quantile regressors for a grid of quantile levels (using linear models or gradient boosted trees in scikit-learn, XGBoost and PyTorch);
  • how to reduce probabilistic regression to multi-class classification, followed by a cumulative sum of the predict_proba output to recover a continuous conditional CDF;
  • how to implement this approach as a generic scikit-learn meta-estimator;
  • how this approach is used to pretrain foundational tabular models (e.g. TabPFNv2);
  • simple Bayesian models (e.g. Bayesian Ridge and Gaussian Processes);
  • more specialized approaches as implemented in XGBoostLSS.
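
The reduction to multi-class classification mentioned in the list above can be sketched as follows. This is a minimal, dependency-free illustration and not the speakers' implementation: the bin edges, the hard-coded class probabilities (standing in for one row of a classifier's predict_proba output) and the piecewise-linear interpolation inside each bin are all assumptions made for the example.

```python
from itertools import accumulate

def cdf_at_bin_edges(class_probs):
    """Cumulative sum of per-bin probabilities: the CDF at each bin's upper edge."""
    return list(accumulate(class_probs))

def quantile_from_bins(bin_edges, class_probs, level):
    """Invert the piecewise-linear CDF at a quantile level in (0, 1).

    bin_edges holds K + 1 increasing thresholds of the target, and
    class_probs holds the K predicted probabilities of falling in each bin.
    """
    if len(bin_edges) != len(class_probs) + 1:
        raise ValueError("expected one more bin edge than class probabilities")
    cdf = [0.0] + cdf_at_bin_edges(class_probs)
    for k in range(len(class_probs)):
        lo, hi = cdf[k], cdf[k + 1]
        if hi > lo and level <= hi:
            # Linear interpolation inside bin k (an assumption of this sketch).
            frac = (level - lo) / (hi - lo)
            return bin_edges[k] + frac * (bin_edges[k + 1] - bin_edges[k])
    return bin_edges[-1]

# Hypothetical output of a classifier trained on a binned target, for one sample.
bin_edges = [0.0, 10.0, 20.0, 30.0, 40.0]
class_probs = [0.125, 0.375, 0.375, 0.125]

cdf_values = cdf_at_bin_edges(class_probs)
median = quantile_from_bins(bin_edges, class_probs, 0.5)
```

In a real pipeline, class_probs would come from a classifier fitted on the discretized target, and the number and placement of bins would be tuning choices.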

We will also discuss how to evaluate probabilistic predictions via:

  • the pinball loss of quantile regressors,
  • other strictly proper scoring rules such as Continuous Ranked Probability Score (CRPS),
  • coverage measures and width of prediction intervals,
  • reliability diagrams for different quantile levels.
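
Two of the evaluation tools listed above, the pinball loss and the coverage and width of prediction intervals, can be written in a few lines. The sketch below is a plain-Python illustration using made-up numbers; it is not code from the talk. scikit-learn provides a pinball loss as `sklearn.metrics.mean_pinball_loss`, which is the natural choice in practice.

```python
def pinball_loss(y_true, y_pred, level):
    """Mean pinball (quantile) loss of predictions for one quantile level."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by level, over-prediction by (1 - level).
        total += level * diff if diff >= 0 else (level - 1.0) * diff
    return total / len(y_true)

def coverage_and_mean_width(y_true, lower, upper):
    """Fraction of targets inside [lower, upper] and the mean interval width."""
    n = len(y_true)
    covered = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width

# Made-up observations and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
median_pred = [2.0, 2.0, 2.0, 2.0]
lower = [0.5, 1.5, 3.5, 3.0]
upper = [1.5, 2.5, 4.5, 5.0]

loss_at_median = pinball_loss(y_true, median_pred, 0.5)
coverage, width = coverage_and_mean_width(y_true, lower, upper)
```

A well-calibrated 90% interval should reach a coverage close to 0.9 on held-out data; among intervals with the right coverage, narrower ones are more informative.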

We will illustrate those concepts with concrete examples and running code.

Finally, we will illustrate why some applications need such calibrated probabilistic predictions:

  • estimating uncertainty in trip times depending on traffic conditions to help a human decision maker choose among various travel plan options,
  • modeling value at risk for investment decisions,
  • assessing the impact of missing variables for an ML model trained to work in degraded mode,
  • Bayesian optimization for operational parameters of industrial machines from little/costly observations.

If time allows, we will also discuss the usage and limitations of Conformal Quantile Regressors as implemented in MAPIE, and contrast the aleatoric and epistemic uncertainty captured by those models.

Agentic AI demos with Thinkeo

2025-10-01
Face To Face

Discover how AI agents are transforming document creation: - Create complete presentations - Generate reports ...

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

2025-10-01 Watch
talk

Built on top of Software Heritage, the largest public archive of source code, the CodeCommons collaboration is building a large-scale, metadata-rich source code dataset designed to make training AI models on code more transparent, sustainable, and fair. Code will be enriched with contextual information such as issues, pull request discussions, licensing data, and provenance. In this presentation, we will present the goals and structure of both the Software Heritage and CodeCommons projects, and discuss our particular contribution to CodeCommons' big data infrastructure.

Enhancing Machine Learning Workflows with skore

2025-10-01 Watch
talk

Discover how skore, a new open-source Python library, can elevate your machine learning projects by integrating recommended practices and avoiding common pitfalls. This talk will introduce skore's key features and demonstrate how it can streamline your model evaluation and diagnostics processes.

Come and see QDA Miner & WordStat 2025 in action!

2025-10-01
Face To Face

Stay in control of AI: choose your engine (OpenAI, Gemini, etc.) and customize the prompts for full transparency.

Skrub: machine learning for dataframes

2025-10-01 Watch
talk

Skrub is an open-source package that simplifies machine learning with dataframes by providing a variety of tools to explore, prepare and feature-engineer dataframes so they can be integrated into scikit-learn pipelines. Skrub DataOps let you build extensive, multi-table wrangling plans, explore hyperparameter spaces, and export the resulting objects for deployment. The talk showcases various use cases where skrub can simplify the job of a data scientist from data preparation to deployment, through code examples and demonstrations.

Building Data Science Tools for Sustainable Transformation

2025-10-01 Watch
talk

The current AI hype, driven by generative AI and particularly large language models, is creating excitement, fear, and inflated expectations. In this keynote, we'll explore geographic & mobility data science tools (such as GeoPandas and MovingPandas) to transform this hype into sustainable and positive development that empowers users.

Big ideas shaping scientific Python: the quest for performance and usability

2025-09-30 Watch
talk

Behind every technical leap in scientific Python lies a human ecosystem of volunteers, companies, and institutions working in tension and collaboration. This keynote explores how innovation actually happens in open source, through the lens of recent and ongoing initiatives that aim to move the needle on performance and usability - from the ideas that went into NumPy 2.0 and its relatively smooth rollout to the ongoing efforts to leverage the performance GPUs offer without sacrificing maintainability and usability.

Takeaways for the audience: Whether you’re an ML engineer tired of debugging GPU-CPU inconsistencies, a researcher pushing Python to its limits, or an open-source maintainer seeking sustainable funding, this keynote will equip you with both practical solutions and a clear vision of where scientific Python is headed next.

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

2025-09-30 Watch
talk

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, we have been developing ActiveTigger since 2023: a lightweight, open-source Python application (with a web frontend in React) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and outside the social sciences. The tool is already used by an active community in the social sciences, and the stable version is planned for early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embeddings computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained models (BERT-like), prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach for hybrid manual/automatic labeling. Accessible through both a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
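
To make the "human annotation with active learning" step concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy. It is an illustration only: the abstract does not say which selection rule ActiveTigger uses, and the function names, text identifiers and probabilities below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predicted_probs, k):
    """Return the ids of the k texts whose predictions are least certain.

    predicted_probs maps a text id to the class probabilities predicted by
    the current model for that text.
    """
    ranked = sorted(
        predicted_probs,
        key=lambda text_id: entropy(predicted_probs[text_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical model outputs for three unlabeled texts and two classes.
predicted_probs = {
    "text_a": [0.50, 0.50],
    "text_b": [0.90, 0.10],
    "text_c": [0.99, 0.01],
}

to_annotate_next = select_most_uncertain(predicted_probs, k=2)
```

The selected texts are shown to the human annotator first, the model is retrained on the enlarged labeled set, and the loop repeats.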

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

Project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Modern Web Data Extraction: Techniques, Tools, Legal and Ethical Considerations

2025-09-30 Watch
talk

To satisfy the need for data in generative and traditional AI, in a rapidly evolving environment, the ability to efficiently extract data from the web has become indispensable for businesses and developers. This presentation delves into the methodology and tools of web crawling and web scraping, with an overview of the ethical and legal side of the process, including best practices on how to crawl politely and efficiently and how to use the data without violating privacy or intellectual property laws.
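
One basic ingredient of polite crawling is honoring a site's robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` module to check whether a URL may be fetched and which crawl delay is requested. The robots.txt content, the user agent name and the URLs are hypothetical, and the talk may cover different tools.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it from the
# target host before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_policy(robots_txt):
    """Parse robots.txt text into a policy object that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

USER_AGENT = "example-bot"  # hypothetical crawler name
policy = build_policy(ROBOTS_TXT)

allowed_public = policy.can_fetch(USER_AGENT, "https://example.com/public/page.html")
allowed_private = policy.can_fetch(USER_AGENT, "https://example.com/private/data.html")
delay_seconds = policy.crawl_delay(USER_AGENT)
```

A crawler would skip any URL for which `can_fetch` returns False and wait at least `delay_seconds` between requests to the same host. Respecting robots.txt does not by itself settle the privacy or intellectual property questions the talk raises.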

Optimal Transport in Python: A Practical Introduction with POT

2025-09-30 Watch
talk

Optimal Transport (OT) is a powerful mathematical framework with applications in machine learning, statistics, and data science. This talk introduces the Python Optimal Transport toolbox (POT), an open-source library designed to efficiently solve OT problems. Attendees will learn the basics of OT, explore real-world use cases, and gain hands-on experience with POT (https://pythonot.github.io/).
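
As a taste of what an OT problem looks like, the simplest case can be solved by hand: for two one-dimensional samples of equal size with uniform weights, the optimal plan pairs the sorted points, so the 1-Wasserstein distance is the mean absolute difference between sorted values. The sketch below implements only this special case in plain Python. It deliberately does not use POT, which handles general weights, costs and dimensions, and the sample values are made up.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size, uniformly weighted 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    # In one dimension the optimal transport plan matches points in sorted order.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)

# Made-up samples: the second is the first shifted by 5, in shuffled order.
source = [3.0, 0.0, 1.0]
target = [5.0, 8.0, 6.0]

distance = wasserstein_1d(source, target)
```

Here every unit of mass travels a distance of 5, so the transport cost equals the shift between the two samples.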

Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

2025-09-30 Watch
talk

Domain adaptation addresses the challenge of applying ML models to data that differs from the training distribution, a common issue in real-world applications. SKADA is a new Python library that brings domain adaptation tools to the scikit-learn and PyTorch ecosystem. This talk covers SKADA’s design, its integration with standard ML workflows, and how it helps practitioners build models that generalize better across domains.

Unlock the full predictive power of your multi-table data

2025-09-30 Watch
talk

While most machine learning tutorials and challenges focus on single-table datasets, real-world enterprise data is often distributed across multiple tables, such as customer logs, transaction records, or manufacturing logs. In this talk, we address the often-overlooked challenge of building predictive features directly from raw, multi-table data. You will learn how to automate feature engineering using a scalable, supervised, and overfit-resistant approach, grounded in information theory and available as a Python open-source library. The talk is aimed at data scientists and ML engineers working with structured data; basic machine learning knowledge is sufficient to follow.

Browser-based AI workflows in Jupyter

2025-09-30 Watch
talk

JupyterLite brings Python and other programming languages to the browser, removing the need for a server. In this talk, we show how to extend it for AI workflows: connecting to remote models, running smaller models locally in the browser, and leveraging lightweight interfaces like a chat to interact with them.

Navigating the security compliance maze of an ML service

2025-09-30 Watch
talk

While everyone is talking about the m(e/a)ss of bureaucracy, we want to show you hands-on what you may need to do to operate an ML service. We will give an overview of things like ISO-27001 certifications, the Cyber Resilience Act and AIBOMs. We want to highlight their impact and intention, and give advice on how to integrate them into your development workflow.

This talk is written from a practitioner's perspective and will help you set up your project to make your compliance department happy. It isn't meant as a deep dive into the individual standards.

Booth demo

2025-09-01
Face To Face

Try our AI applications (Amplify, Campaign Companion, Score on the fly) at our booth C16 and talk with our Data & AI experts.