WAXAL: A Large-Scale Multilingual African Language Speech Corpus

🇳🇬🇪🇹🇨🇩🇰🇪🇺🇬🇬🇭🇲🇬🇿🇼🇨🇬 Hausa • Yoruba • Igbo • Amharic • Oromo • Swahili • Lingala • Akan (Fante, Twi) • Luganda • Kikuyu • Luo • Shona • Malagasy • Fulani (Fula) • Tigrinya • Sidama • Wolaytta • Ewe • Nyankole • Rukiga • Masaaba • Soga • Dagbani • Dagaare • Acholi • Ikposo

This took several years, and I’m so happy it is finally out. We just released an open-source dataset of nearly 2M African speech records for speech recognition and speech synthesis, covering 27 languages. As of today it already has close to 10k downloads, and it is being used to train ASR and TTS models.

Blog: blog.google/intl/en-afri…

Paper: arxiv.org/abs/2602.02734

Dataset: huggingface.co/datasets/goo…

News articles:

https://techcabal.com/2026/02/12/voice-is-africas-gateway-to-ai-and-google-wants-to-lead-it

https://restofworld.org/2026/google-waxal-african-languages-ai-sovereignty

https://techpoint.africa/insight/google-is-expanding-waxal-beyond-21-languages-what-it-means-for-african-researchers