talk-data.com

Event

PyData Paris 2025

2025-09-01 – 2025-10-02 PyData

Filtering by: Big Data

Top Speakers

Johan Mabille 2
Romain Clement 2
Tim Paine 1
Christophe Dervieux 1
David Brochart 1
Emanuele Fabbiani 1
Guillaume Lemaitre 1
Ian Thomas 1
Jeremy Tuloup 1
Justine BEL-LETOILE 1
Lex Avstreikh 1
Nicolas M. Thiéry 1

Sessions & talks

Showing 1–5 of 5 · Newest first

Governing data: a cultural and innovative revolution for a successful transformation

2025-10-01

Face To Face

AI/ML Big Data

Data governance is a key cultural transformation in the era of Big Data and AI. This talk explores how to make it a…

How is AI transforming business intelligence?

2025-10-01

Face To Face

AI/ML Big Data

Find out how to take action today at our demo session at Big Data & IA Paris.

From Big Data to instant video

2025-10-01

Face To Face

Big Data

From Big Data to instant video: the customer experience reinvented by PULP'IN. Discover how to generate data-driven experiences at…

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

2025-10-01

talk

Gravin Florent

AI/ML Analytics Big Data Cloud Computing

Every dataset has a story — and when it comes to geospatial data, it’s a story deeply rooted in space and scale. But working with geospatial information is often a hidden challenge: massive file sizes, strange formats, projections, and pipelines that don't scale easily.

In this talk, we'll follow the life of a real-world geospatial dataset, from its raw collection in the field to its transformation into meaningful insights. Along the way, we’ll uncover the key steps of building a robust, scalable open-source geospatial pipeline.

Drawing on years of experience at Camptocamp, we’ll explore:

How raw spatial data is ingested and cleaned
How vector and raster data are efficiently stored and indexed (PostGIS, Cloud Optimized GeoTIFFs, Zarr)
How modern tools like Dask, GeoServer, and STAC (SpatioTemporal Asset Catalogs) help process and serve geospatial data
How to design pipelines that handle both "small data" (local shapefiles) and "big data" (terabytes of satellite imagery)
Common pitfalls and how to avoid them when moving from prototypes to production

This journey will show how the open-source ecosystem has matured to make geospatial big data accessible — and how spatial thinking can enrich almost any data project, whether you are building dashboards, doing analytics, or setting the stage for machine learning later on.

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

2025-10-01

talk

Simeon Carstens, Rania Talbi

AI/ML Big Data

Built on top of Software Heritage, the largest public archive of source code, the CodeCommons collaboration is building a large-scale, metadata-rich source code dataset designed to make training AI models on code more transparent, sustainable, and fair. Code will be enriched with contextual information such as issues, pull request discussions, licensing data, and provenance. In this presentation, we will outline the goals and structure of both the Software Heritage and CodeCommons projects, and discuss our particular contribution to CodeCommons' big data infrastructure.
