Topic

Cloud Storage

object_storage file_storage cloud

Activities: 1 tagged

Activity Trend: peak of 5 per quarter, 2020-Q1 to 2026-Q1

Top Events

O'Reilly Data Engineering Books (21)
Google Cloud Next '25 (10)
Google Cloud Next '24 (7)
Databricks DATA + AI Summit 2023 (6)
Microsoft Ignite 2025 (5)
Data + AI Summit 2025 (5)
Data Engineering Podcast (2)
The Analytics Power Hour (1)
SciPy 2025 (1)
DATA MINER Big Data Europe Conference 2020 (1)
Get things in order with GCP Workflows - PART 2 || FREE Community Training (1)
Snowflake World Tour - Stockholm (1)

Top Speakers

Larry Coyne (6)
Joe Hew (5)
Michael Scott (5)
Derek Erdmann (5)
Alberto Barajas Ortiz (5)
Aderson Pacini (4)
Bert Dufrasne (4)
Tomoaki Ogino (4)
Chen Zhu (4)
Taisei Takai (4)
Carlos Villuendas (Microsoft) (3)
Trevor Davis (Microsoft) (3)

Processing Cloud-optimized data in Python with Serverless Functions (Lithops, Dataplug)

2025-07-08 · SciPy 2025

talk

by Universitat Rovira i Virgili (Pedro Garcia Lopez), Enrique Molina Giménez

Cloud Computing Data Management GitHub Python

Cloud-optimized (CO) data formats are designed to store data so that it can be accessed efficiently and directly from cloud storage, without downloading the entire dataset. Because users fetch only the subsets they need, these formats enable faster retrieval, better scalability, and lower cost. They also support efficient parallel processing through on-the-fly partitioning, which can considerably accelerate data management operations. Cloud-optimized data is therefore a good fit for data-parallel jobs running on serverless functions. Function-as-a-Service (FaaS) offers a data-driven, scalable, and cost-efficient experience with practically no management burden. Each serverless function reads and processes a small portion of the cloud-optimized dataset directly from object storage, and because these reads happen in parallel, the overall job can speed up significantly.
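As a rough illustration of on-the-fly partitioning, the sketch below splits an object of known size into byte ranges that independent workers could each fetch with an HTTP ranged read. The function names, the object size, and the chunk size are hypothetical and are not taken from the talk. Real cloud-optimized formats typically partition along record or chunk boundaries derived from index metadata rather than at arbitrary byte offsets.

```python
def partition_byte_ranges(object_size, chunk_size):
    """Split an object of object_size bytes into inclusive (start, end) ranges.

    Hypothetical helper: it only illustrates that partitions can be computed
    from metadata (the object size) without reading the object itself.
    """
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # The last range is clipped to the end of the object.
        end = min(start + chunk_size, object_size) - 1
        ranges.append((start, end))
    return ranges


def range_header(start, end):
    """Build the HTTP Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


# Assumed example values: a 1000-byte object read in 300-byte pieces.
ranges = partition_byte_ranges(object_size=1000, chunk_size=300)
headers = [range_header(start, end) for start, end in ranges]
```

Each header could be attached to a separate ranged GET request, so no worker needs to download more than its own slice.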

In this talk, you will learn how to process cloud-optimized data formats in Python using the Lithops toolkit. Lithops is a serverless data processing toolkit designed specifically to process data from Cloud Object Storage using serverless functions. We will also demonstrate the Dataplug library, which enables cloud-optimized data management in scientific domains such as genomics, metabolomics, and geospatial data. We will show several cloud data processing pipelines that demonstrate the benefits of cloud-optimized data management.
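The map-style pattern described here can be sketched locally with the standard library alone. This is not Lithops or Dataplug code: a thread pool stands in for serverless functions, an in-memory bytes object stands in for an object in cloud storage, and all names (FAKE_OBJECT, read_range, count_records, parallel_count) are hypothetical. Lithops exposes a comparable map-style call on its FunctionExecutor, which would take the place of the pool here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an object held in cloud object storage.
FAKE_OBJECT = b"".join(f"record-{i}\n".encode() for i in range(1000))


def read_range(start, end):
    """Simulate a ranged read: return bytes start..end (inclusive)."""
    return FAKE_OBJECT[start:end + 1]


def count_records(byte_range):
    """Worker body: fetch one slice and count the newline-terminated records in it."""
    start, end = byte_range
    return read_range(start, end).count(b"\n")


def parallel_count(object_size, chunk_size, workers=4):
    """Partition the object into byte ranges, map workers over them, and reduce."""
    ranges = [
        (start, min(start + chunk_size, object_size) - 1)
        for start in range(0, object_size, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_records, ranges))
    return sum(partial_counts)


total = parallel_count(len(FAKE_OBJECT), chunk_size=512)
```

Counting newline bytes works with arbitrary split points because each byte belongs to exactly one slice. Operations that need whole records would instead require partitions aligned to record boundaries, which is the kind of problem cloud-optimized formats and their indexes address.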