2025-02-20 at 18:00
BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
Description
The BIOSCAN-5M dataset contains five million insect specimens from 47 countries, each with a paired high-resolution image and DNA barcode. Its hierarchical taxonomic labels, geographic data, and long-tailed distribution of rare species make it a valuable resource for both ecological research and AI model training. Made possible by the International Barcode of Life and the BIOSCAN project, BIOSCAN-5M marks a significant advance in biodiversity informatics and is publicly available for download via Hugging Face and PyPI.
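To make "hierarchical taxonomic labels" and "long-tail distribution" concrete, here is a minimal Python sketch on a handful of made-up records. It does not use the actual BIOSCAN-5M loader or schema: the `Specimen` class, its field names, the rank list, the placeholder labels, and the assumption that some specimens lack fine-grained labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

# Ranks from coarse to fine. This list is an assumption for the sketch,
# not the dataset's documented schema.
RANKS = ("order", "family", "genus", "species")


@dataclass
class Specimen:
    """Hypothetical record pairing an image and a DNA barcode with labels."""
    image_path: str
    dna_barcode: str
    country: str
    taxonomy: dict = field(default_factory=dict)  # rank -> label, may be partial


def count_by_rank(specimens, rank):
    """Count specimens per label at one rank, skipping unlabelled specimens."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    return Counter(
        s.taxonomy[rank] for s in specimens if rank in s.taxonomy
    )


def rare_labels(counts, max_count=1):
    """Return the labels seen at most `max_count` times (the long tail)."""
    return sorted(label for label, n in counts.items() if n <= max_count)


# Toy data with placeholder labels; none of these values come from the dataset.
specimens = (
    [Specimen(f"img_{i}.jpg", "ACGT", "country_1",
              {"order": "order_x", "species": "species_a"}) for i in range(4)]
    + [Specimen("img_4.jpg", "ACGA", "country_2",
                {"order": "order_x", "species": "species_b"})]
    + [Specimen("img_5.jpg", "ACGC", "country_2",
                {"order": "order_y", "species": "species_c"})]
    # Labelled only at the order level (an assumed case for this sketch).
    + [Specimen("img_6.jpg", "ACGG", "country_1", {"order": "order_x"})]
)

order_counts = count_by_rank(specimens, "order")
species_counts = count_by_rank(specimens, "species")
tail = rare_labels(species_counts)
```

In this toy set, one species accounts for most labelled specimens while the others appear once each, which is the shape a long-tailed distribution takes at small scale. Counting at a coarser rank such as order still covers specimens that have no species label.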