talk-data.com

Speaker

Shahrokh Daijavad

Shahrokh Daijavad, a distinguished Research Scientist in the Watsonx Data Engineering group at IBM Almaden Research Center, has a rich background in Edge Computing and Data Engineering. He earned his B.Eng. and Ph.D. in electrical engineering from McMaster University and spent years at IBM T. J. Watson Research Center. His recent research focuses on AI@Edge and Data Engineering for IBM Watsonx AI offerings.

Bio from: [AI Alliance] Introducing Gneissweb: A State-Of-The-Art LLM Pre-training Dataset

Talks & appearances

In this session we will go over how we created GneissWeb and discuss tools and techniques used. We will provide code examples that you can try at your leisure.

👉 > 2% avg improvement in benchmark performance over FineWeb
👉 Huggingface page
👉 Data prep kit detailed recipe
👉 Data prep kit bloom filter for quick reproduction
👉 Recipe models for reproduction
👉 announcement
👉 Paper