In this session we walk through how we created GneissWeb and discuss the tools and techniques we used. We provide code examples that you can try at your leisure.
- More than 2% average improvement in benchmark performance over FineWeb
- Hugging Face page
- Data Prep Kit detailed recipe
- Data Prep Kit Bloom filter for quick reproduction
- Recipe models for reproduction
- Announcement
- Paper
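
To give a feel for how a Bloom filter can support quick reproduction, here is a minimal, generic sketch in Python. It assumes the filter stores identifiers of the documents that were kept, so that a larger source collection can be reduced to the kept subset with a membership test. That usage, the `BloomFilter` class, the `filter_documents` helper, and the `id` field are all illustrative assumptions. None of them is the Data Prep Kit API or the format of the released filter.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas for the number of bits (m) and hashes (k).
        m = -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        self.num_bits = max(8, math.ceil(m))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def filter_documents(documents, kept_ids):
    """Keep only documents whose (hypothetical) 'id' field is in the filter."""
    return [doc for doc in documents if doc["id"] in kept_ids]


if __name__ == "__main__":
    kept = BloomFilter(expected_items=1000)
    for doc_id in ("doc-1", "doc-3"):
        kept.add(doc_id)

    source = [{"id": "doc-1"}, {"id": "doc-2"}, {"id": "doc-3"}]
    print(filter_documents(source, kept))
```

A Bloom filter trades a small false-positive rate for a compact, fixed-size structure: a document that was added is always found, while a document that was never added is occasionally reported as present. For reproduction this means the filtered output can contain a few extra documents but never misses a kept one.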