Nvidia Unveils Granary: A Massive Multilingual Dataset for AI Translation

2025-08-24

Nvidia announced Granary, an open-source multilingual speech dataset containing more than one million hours of audio, designed to improve AI translation for European languages. Developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the dataset covers nearly all official EU languages plus Russian and Ukrainian, with a focus on under-resourced languages. Alongside Granary, Nvidia released two new models: Canary, optimized for accuracy, and Parakeet, optimized for speed. According to Nvidia, Granary significantly reduces the amount of data needed for training, which should enable more inclusive speech technologies.