talk-data.com

Event

PyData Paris 2025

2025-09-01 – 2025-10-02 PyData

Activities tracked: 5

Filtering by: Big Data

Sessions & talks

Governing Data: A Cultural and Innovative Revolution for a Successful Transformation

2025-10-01
Face To Face

Data governance is a key cultural transformation in the era of Big Data and AI. This talk explores how to make it…

How Is AI Transforming Business Intelligence?

2025-10-01
Face To Face

Find out how to take action today at our demo session at Big Data & IA Paris.

From Big Data to Instant Video

2025-10-01
Face To Face

From Big Data to instant video: the customer experience reinvented by PULP'IN. Discover how to generate data-driven experiences at…

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

2025-10-01
Talk

Every dataset has a story — and when it comes to geospatial data, it’s a story deeply rooted in space and scale. But working with geospatial information is often a hidden challenge: massive file sizes, strange formats, projections, and pipelines that don't scale easily.

In this talk, we'll follow the life of a real-world geospatial dataset, from its raw collection in the field to its transformation into meaningful insights. Along the way, we’ll uncover the key steps of building a robust, scalable open-source geospatial pipeline.

Drawing on years of experience at Camptocamp, we’ll explore:

  • How raw spatial data is ingested and cleaned
  • How vector and raster data are efficiently stored and indexed (PostGIS, Cloud Optimized GeoTIFFs, Zarr)
  • How modern tools like Dask, GeoServer, and STAC (SpatioTemporal Asset Catalogs) help process and serve geospatial data
  • How to design pipelines that handle both "small data" (local shapefiles) and "big data" (terabytes of satellite imagery)
  • Common pitfalls and how to avoid them when moving from prototypes to production

This journey will show how the open-source ecosystem has matured to make geospatial big data accessible — and how spatial thinking can enrich almost any data project, whether you are building dashboards, doing analytics, or setting the stage for machine learning later on.

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

2025-10-01
Talk

Built on top of Software Heritage, the largest public archive of source code, the CodeCommons collaboration is building a large-scale, metadata-rich source code dataset designed to make training AI models on code more transparent, sustainable, and fair. The code will be enriched with contextual information such as issues, pull request discussions, licensing data, and provenance. In this talk, we will present the goals and structure of the Software Heritage and CodeCommons projects and discuss our specific contribution to CodeCommons' big data infrastructure.