Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment.
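The three-tier read path described in the abstract can be illustrated with a small sketch. This is not Hummock's or Foyer's actual implementation: the class names (`LRUTier`, `TieredStore`), the plain LRU policy, and the use of a dictionary in place of S3 are all assumptions made for illustration. In particular, the sketch falls through to the object store on a cold miss and counts those reads; the abstract does not say how the real system refills its caches so that queries avoid S3.

```python
from collections import OrderedDict


class LRUTier:
    """A bounded key-value tier with least-recently-used eviction (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert a pair and return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # drop the least recently used
        return None


class TieredStore:
    """Memory tier -> disk-cache tier -> object store (all simulated in memory)."""

    def __init__(self, memory_capacity, disk_capacity, object_store):
        self.memory = LRUTier(memory_capacity)  # hottest keys
        self.disk = LRUTier(disk_capacity)      # stand-in for a local disk cache
        self.object_store = object_store        # dict standing in for S3
        self.object_store_reads = 0             # how often the cold tier was read

    def get(self, key):
        value = self.memory.get(key)
        if value is not None:
            return value
        value = self.disk.get(key)
        if value is None:
            # Cold miss: assumption of this sketch, not a claim about Hummock.
            self.object_store_reads += 1
            value = self.object_store.get(key)
            if value is None:
                return None
            self.disk.put(key, value)  # disk evictions are dropped; S3 stays durable
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        # Move the key into memory; demote whatever memory evicts to the disk tier.
        evicted = self.memory.put(key, value)
        if evicted is not None:
            self.disk.put(*evicted)


store = TieredStore(memory_capacity=2, disk_capacity=4,
                    object_store={"a": 1, "b": 2, "c": 3})
store.get("a")  # cold read, fills the disk and memory tiers
store.get("a")  # served from memory, no object-store read
```

Entries evicted from the memory tier are demoted to the disk tier rather than discarded, so a key that cools down can still be served locally; only the disk tier's evictions are dropped, since the object store remains the durable copy.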
Speaker
Yingjun Wu
RisingWave Labs
CEO, RisingWave
Event: Data Builders’ Evening: Architecture, Engineering & Beyond