talk-data.com

Speaker

Zilong Zhou

1 talk

Software Engineer, ByteDance

Zilong is a software engineer at ByteDance, working on data lake development.

Bio from: Data + AI Summit 2025

Talks & appearances

A Unified Solution for Data Management and Model Training With Apache Iceberg and Mosaic Streaming

This session introduces ByteDance’s challenges in data management and model training, and the two systems built to address them: Magnus (an enhanced Apache Iceberg) and Byted Streaming (a customized Mosaic Streaming). Magnus uses Iceberg’s branches and tags to manage massive datasets and checkpoints efficiently. With enhanced metadata and a custom C++ data reader, Magnus achieves optimal sharding, shuffling and data loading. Flexible table migration, detailed metrics and built-in full-text indexes on Iceberg tables further ensure training reliability. When training with ultra-large datasets, ByteDance faced scalability and performance issues. Given Mosaic Streaming's scalability in distributed training and its good code structure, the team chose it and customized it to resolve challenges such as slow startup, high resource consumption, and limited data source compatibility. In this session, we will explore Magnus and Byted Streaming, discuss their enhancements, and demonstrate how they enable efficient and robust distributed training.