Reproducibility in Embedding Benchmarks
Reproducibility in embedding benchmarks is no small feat: prompt variability, growing computational demands, and evolving tasks all make fair comparisons a challenge, and the need for robust benchmarking has never been greater. In this talk, we’ll explore the quirks and complexities of benchmarking embedding models, including prompt sensitivity, scaling issues, and emergent behaviors.
We’ll hear straight from the maintainers of the Massive Text Embedding Benchmark (MTEB), who will show how MTEB (and its extensions, such as MMTEB and MIEB) simplifies reproducibility, making it easier for researchers and industry practitioners to measure progress, choose the right models, and push the boundaries of embedding performance.
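To make the reproducibility angle concrete, here is a minimal sketch of what an MTEB evaluation run looks like, assuming a recent release of the open-source mteb Python package; the task and model names are illustrative placeholders rather than specifics from the talk.

```python
# Minimal sketch, assuming a recent `mteb` release; the task and model
# names below are illustrative, not prescribed by the talk.
import mteb

# Pick a benchmark task by name and load an embedding model
# through MTEB's model registry.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Run the evaluation; scores and run metadata are written to the
# output folder, which is what supports reproducible comparisons.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
print(results)
```

Because each run records scores together with task and model metadata, the same results can later be re-checked or compared against other models on equal footing.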