🇳🇬🇪🇹🇨🇩🇰🇪🇺🇬🇬🇭🇲🇬🇿🇼🇨🇬 Hausa • Yoruba • Igbo • Amharic • Oromo • Swahili • Lingala • Akan (Fante, Twi) • Luganda • Kikuyu • Luo • Shona • Malagasy • Fulani (Fula) • Tigrinya • Sidama • Wolaytta • Ewe • Nyankole • Rukiga • Masaaba • Soga • Dagbani • Dagaare • Acholi • Ikposo
This took several years, and I’m so happy it is finally out. We just released an open-source dataset of nearly 2M African speech records for speech recognition and speech synthesis, covering 27 languages. As of today it already has close to 10k downloads and is already being used to train ASR and TTS models.
Blog: blog.google/intl/en-afri…
Paper: arxiv.org/abs/2602.02734
Dataset: huggingface.co/datasets/goo…
News articles:
https://techcabal.com/2026/02/12/voice-is-africas-gateway-to-ai-and-google-wants-to-lead-it
https://restofworld.org/2026/google-waxal-african-languages-ai-sovereignty