RisingWave Labs

Postgres -> Iceberg: The Pipeline Everyone Thinks Is Boring (But Isn't) (EN)

2025-09-17 · Apache Iceberg Paris Community Meetup #2

talk

Yingjun Wu (Speaker)

postgresql Iceberg

At first, streaming Postgres changes into Iceberg feels like a no-brainer. You spin up Debezium or Kafka Connect, point it at Iceberg, and it all looks boringly straightforward. The surprise comes once the pipeline hits production. Replication slots vanish and start filling up WAL space, LSNs don't line up and cause duplicates or gaps, and Iceberg sinks fail in ways that push back all the way to your primary database. Then you throw in schema changes, backfills, and compaction, and suddenly the "boring" pipeline becomes a source of late-night firefights. In this talk, I'll share real stories from running Postgres to Iceberg CDC pipelines in production. We'll look at the unexpected problems that show up, why they happen, and the strategies that actually helped keep things stable. If you've ever thought of Postgres -> Iceberg as just plumbing, this session will show you why it's not so boring after all.

Achieving Sub-100 ms Real-Time Stream Processing with an S3-Native Architecture

2025-09-16 · Data Builders’ Evening: Architecture, Engineering & Beyond

talk

Yingjun Wu (Speaker)

S3 Data Streaming hummock object storage log-structured state engine

Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment.
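
The three-tier read path described in this abstract can be illustrated with a minimal sketch. This is not Hummock's or Foyer's actual API: the class `TieredStateStore`, its fields, and the plain dictionaries standing in for the disk cache and S3 are all hypothetical, and the sketch shows only a generic read-through lookup with LRU promotion into memory. In particular, it falls back to the cold tier on a cache miss, so it does not model how the real system keeps queries from hitting S3 directly.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredStateStore:
    """Hypothetical read-through store: memory -> disk cache -> object storage."""

    def __init__(self, object_store: Dict[str, bytes], memory_capacity: int = 2):
        self.memory: "OrderedDict[str, bytes]" = OrderedDict()  # hottest keys, LRU order
        self.disk: Dict[str, bytes] = {}          # stand-in for a local disk cache
        self.object_store = object_store          # stand-in for S3 (cold, durable tier)
        self.memory_capacity = memory_capacity
        self.object_store_reads = 0               # counts successful cold-tier reads

    def get(self, key: str) -> Optional[bytes]:
        # Tier 1: in-memory cache. A hit refreshes the key's LRU position.
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]

        # Tier 2: disk cache for warm data.
        value = self.disk.get(key)
        if value is None:
            # Tier 3: object storage, consulted only when both caches miss.
            value = self.object_store.get(key)
            if value is None:
                return None
            self.object_store_reads += 1
            self.disk[key] = value                # keep a warm copy on disk

        self._promote(key, value)
        return value

    def _promote(self, key: str, value: bytes) -> None:
        # Insert into memory and evict least-recently-used keys over capacity.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)


store = TieredStateStore({"a": b"1", "b": b"2", "c": b"3"}, memory_capacity=2)
store.get("a")   # served from the cold tier, then cached on disk and in memory
store.get("a")   # served from memory
store.get("b")
store.get("c")   # memory is full, so "a" is evicted from memory but stays on disk
store.get("a")   # served from the disk cache, no cold-tier read
```

In this sketch, a key evicted from memory is still served from the disk tier, so repeated reads of warm data do not go back to object storage.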

Building a Real-Time Lakehouse on Iceberg

2025-06-19 · Data Streaming Lakehouse Tour (Paris)

talk

Yingjun Wu (Speaker)

Iceberg risingwave

Everyone makes streaming sound simple – until you try bolting it onto your batch pipeline and it blows up. This talk skips the marketing gloss and gets into the real work: how to make batch and streaming actually play nice. I’ll walk through the essentials, then get into the messy parts – compaction, primary key updates, exactly-once delivery, and keeping your compute bill from spiraling. You’ll learn how to plug RisingWave into your existing stack and get real-time results without rewriting everything. It’s based on what we’ve seen in production – real problems, real fixes, no buzzwords.

Building a Real-Time Lakehouse on Iceberg (Without Losing Your Mind)

2025-06-19 · Data Streaming Lakehouse Tour (Paris)

talk

Yingjun Wu (Speaker)

risingwave Iceberg

Everyone makes streaming sound simple - until you try bolting it onto your batch pipeline and it blows up. This talk skips the marketing gloss and gets into the real work: how to make batch and streaming actually play nice. I will walk through the essentials, then get into the messy parts - compaction, primary key updates, exactly-once delivery, and keeping your compute bill from spiraling. You will learn how to plug RisingWave into your existing stack and get real-time results without rewriting everything. It is based on what we have seen in production - real problems, real fixes, no buzzwords.

Making Apache Iceberg Work for Time-Series and IoT

2025-06-18 · The June Meetup

talk

Yingjun Wu (Speaker)

apache iceberg

Talk on optimizing Apache Iceberg for time-series and IoT.

How We Implemented the Iceberg Connector in Rust!

2024-11-05 · Apache Iceberg Bay Area Community Meetup

talk

Yingjun Wu (Speaker)

Rust Iceberg

In this talk, we will discuss how we implemented the Iceberg connector in Rust, replacing the original Java-wrapped version to address performance bottlenecks in serialization and memory usage. By following the Apache Iceberg specification, we built a native Rust connector that supports Iceberg’s advanced features, such as multi-catalog compatibility and streaming updates. We’ve contributed this new version to the apache/iceberg-rust repository, and will share insights into the architectural improvements and best practices for leveraging Iceberg in streaming environments.

On the Journey of Redefining Stream Processing: What We Learned from Building RisingWave?

2023-11-30 · Berlin Open Source Data Infrastructure Meetup - November 2023

talk

Yingjun Wu (Speaker)

Cloud Computing postgresql Rust Snowflake Data Streaming

RisingWave is an open-source streaming database designed from scratch for the cloud. It implements a Snowflake-style storage-compute separation architecture to reduce performance cost, and provides users with a PostgreSQL-like experience for stream processing. Over the last three years, RisingWave has evolved from a one-person project into a rapidly growing product deployed by nearly 100 enterprises and startups. But the journey of building RisingWave has been full of challenges. In this talk, I'd like to share lessons we've learned along four dimensions: 1) the decoupled compute-storage architecture, 2) the balance between stream processing and OLAP, 3) the Rust ecosystem, and 4) product positioning. I will dive deep into technical details and then share my views on the future of stream processing.

Speakers from RisingWave Labs