
Event

PyData Paris 2025

2025-09-01 – 2025-10-02 · PyData

Activities tracked: 5 · Filtered by: Analytics

Sessions & talks

Showing 1–5 of 5 · Newest first

Reinventing Space Exploration with On-Prem AI and Self-Service Data Analytics

2025-10-02
Face To Face

How a sovereign, fully web-based data platform (Kubernetes, AI, Data Mesh) becomes a lever for transforming space operations

Oracle Analytics: Optimized Decision-Making through Augmented Analytics & AI

2025-10-01
Face To Face

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

2025-10-01
Talk

Every dataset has a story — and when it comes to geospatial data, it’s a story deeply rooted in space and scale. But working with geospatial information is often a hidden challenge: massive file sizes, strange formats, projections, and pipelines that don't scale easily.

In this talk, we'll follow the life of a real-world geospatial dataset, from its raw collection in the field to its transformation into meaningful insights. Along the way, we’ll uncover the key steps of building a robust, scalable open-source geospatial pipeline.

Drawing on years of experience at Camptocamp, we’ll explore:

  • How raw spatial data is ingested and cleaned
  • How vector and raster data are efficiently stored and indexed (PostGIS, Cloud Optimized GeoTIFFs, Zarr)
  • How modern tools like Dask, GeoServer, and STAC (SpatioTemporal Asset Catalogs) help process and serve geospatial data
  • How to design pipelines that handle both "small data" (local shapefiles) and "big data" (terabytes of satellite imagery)
  • Common pitfalls and how to avoid them when moving from prototypes to production

This journey will show how the open-source ecosystem has matured to make geospatial big data accessible — and how spatial thinking can enrich almost any data project, whether you are building dashboards, doing analytics, or setting the stage for machine learning later on.
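
As a rough illustration of the "small data" and "big data" ends of such a pipeline (not the speakers' actual code), the sketch below loads a local shapefile into PostGIS with GeoPandas and lazily opens a Cloud Optimized GeoTIFF with rioxarray and Dask chunks; the paths, table name, and connection string are placeholders.

    # Hypothetical sketch: ingest a vector layer into PostGIS, then open a
    # remote Cloud Optimized GeoTIFF lazily. Paths and credentials are placeholders.
    import geopandas as gpd
    import rioxarray
    from sqlalchemy import create_engine

    # "Small data": read, clean, and reproject a local shapefile
    parcels = gpd.read_file("parcels.shp")
    parcels = parcels[parcels.geometry.notna()].to_crs(epsg=3857)

    # Store it in PostGIS (requires the geoalchemy2 package)
    engine = create_engine("postgresql://user:pass@localhost:5432/gis")
    parcels.to_postgis("parcels", engine, if_exists="replace", index=False)

    # "Big data": open a Cloud Optimized GeoTIFF in tiled Dask chunks
    cog = rioxarray.open_rasterio(
        "https://example.com/scene.tif",   # hypothetical COG URL
        chunks={"x": 2048, "y": 2048},     # lazy, tiled access via Dask
    )
    print(cog.rio.crs, cog.shape)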

PyPI in the face: running jokes that PyPI download stats can play on you

2025-10-01
Talk

We all love to tell stories with data and we all love to listen to them. Wouldn't it be great if we could also draw actionable insights from these nice stories?

As scikit-learn maintainers, we would love to use PyPI download stats and other proxy metrics (website analytics, GitHub repository statistics, etc.) to help inform some of our decisions, such as:

  • How do we increase user awareness of best practices (please use Pipeline and cross-validation)?
  • How do we advertise our recent improvements (use HistGradientBoosting rather than GradientBoosting, TunedThresholdClassifier, PCA and a few other models can run on GPU)?
  • Do users care more about new features from recent releases or consolidation of what already exists?
  • How long should we support older versions of Python, NumPy, or SciPy?

In this talk we will highlight a number of lessons learned while trying to understand the complex reality behind these seemingly simple metrics.

Telling nice stories is not always hard; grasping the reality behind these metrics, however, is often tricky.
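
For readers who want to poke at these numbers themselves, here is a hedged sketch that queries the public pypistats.org JSON API for scikit-learn downloads; the endpoint paths and response fields are assumptions based on that API's documentation, and the mirror/CI-traffic caveats the talk discusses still apply.

    # Hedged sketch: pull scikit-learn download counts from pypistats.org
    # (endpoint paths and field names assumed; mirrors and CI can inflate counts).
    import requests

    BASE = "https://pypistats.org/api/packages/scikit-learn"

    recent = requests.get(f"{BASE}/recent", timeout=30).json()
    print("downloads last month:", recent["data"]["last_month"])

    # Break downloads down by Python minor version
    by_python = requests.get(f"{BASE}/python_minor", timeout=30).json()
    totals = {}
    for row in by_python["data"]:
        totals[row["category"]] = totals.get(row["category"], 0) + row["downloads"]
    print(sorted(totals.items(), key=lambda kv: -kv[1])[:5])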

State of Parquet 2025: Structure, Optimizations, and Recent Innovations

2025-09-30
Talk

If you have worked with large amounts of tabular data, chances are you have dealt with Parquet files. Apache Parquet is an open source, column-oriented data file format designed for efficient storage and retrieval. It employs high-performance compression and encoding schemes to handle complex data at scale and is supported in many programming languages and analytics tools. This talk will give a technical overview of the Parquet file format's structure, explain how data is represented and stored in Parquet, and show why and how some of the available configuration options might better match your specific use case.
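
As a minimal illustration of that structure and of the configuration options mentioned above (the file name, codec, and sizes are arbitrary choices, not recommendations from the talk), the sketch below uses pyarrow to write a small table and then inspects its row-group and column-chunk metadata.

    # Illustrative sketch with pyarrow: write a table with explicit compression,
    # dictionary encoding, and row-group size, then inspect its structure.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({
        "id": list(range(1_000_000)),
        "city": ["Paris"] * 1_000_000,
    })

    pq.write_table(
        table,
        "example.parquet",
        compression="zstd",        # column-chunk compression codec
        use_dictionary=True,       # dictionary-encode repetitive columns like "city"
        row_group_size=128_000,    # rows per row group; tune for your readers' scan pattern
    )

    meta = pq.ParquetFile("example.parquet").metadata
    print(meta.num_row_groups, "row groups,", meta.num_rows, "rows")
    col = meta.row_group(0).column(1)  # the "city" column chunk in row group 0
    print(col.compression, col.encodings, col.total_compressed_size)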

We will also highlight some recent developments and discussions in the Parquet community, including Hugging Face's proposed content-defined chunking, an approach that reduces required storage space by ten percent on realistic training datasets. We will then examine the geometry and geography types added to the Parquet specification in 2025, which enable efficient storage of spatial data and have catalyzed Parquet's growing adoption within the geospatial community.