Google Translate has taken a significant step towards greater inclusivity by adding 110 new languages to its service, including 31 from Africa. This expansion means that millions of people who previously lacked access to translation services now have a tool to communicate and connect with a wider world.
This achievement is the result of a concerted multi-year effort by Isaac Caswell, the Google Translate Research team, and numerous community collaborators. They faced unique challenges, because building high-quality translation for languages with limited digital resources is complex. To address this, they developed an approach that relies on “monolingual” data (text in a single language) instead of relying solely on translated text. This method, called “zero-shot learning,” allows the system to translate languages it was never explicitly trained to translate, though it is still a developing technology.
What does this mean?
- More Accessibility: People speaking these newly included languages now have a tool to access information, communicate, and break down language barriers.
- Language Preservation: This expansion helps preserve and promote less commonly used languages, which is crucial for maintaining linguistic diversity.
- New Opportunities: The inclusion of languages like Punjabi and Romani opens doors for those communities, aiding in digital navigation and accessing information.
While this is a significant step, Google Translate is not a replacement for the expertise of professional translators, even though it uses powerful AI technology. The app is nonetheless useful for millions of people. This new approach, powered by the PaLM 2 language model, will require ongoing refinement and feedback to improve accuracy.
The languages added are significant. A few notable ones:
- Punjabi (Shahmukhi): This language, written in the Shahmukhi script, is spoken by millions in Pakistan and India. Its inclusion expands communication and access to information for a large community.
- Romani: Romani, the language of Romani communities present in many European countries, has historically been underrepresented in technology. Its inclusion in Google Translate is a step towards recognizing and supporting this community.
- N’Ko: Created in the 1940s, N’Ko uses a unique script to unify Manding languages in West Africa. This addition supports literacy and cultural preservation efforts.
- Tamazight (with Tifinagh script): Tamazight is spoken by millions of Berber people in North Africa. Its inclusion acknowledges their cultural diversity and linguistic heritage.
This expansion is a positive step towards a more inclusive digital world, but it’s important to be aware of its limitations. While AI is an exciting tool, it requires continual development and feedback to refine its capabilities. The addition of these new languages is a testament to the potential of technology to foster communication and understanding, but it’s crucial to remember that it’s a journey, not a destination.
Languages by Region
APAC
- Southern Asia
- Bhutan: Dzongkha
- India: Awadhi, Bodo, Khasi, Kokborok, Marwadi, Santali, Tulu
- Nepal: Nepalbhasa (Newari)
- Pakistan: Baluchi, Punjabi (Shahmukhi)
- Eastern Asia
- China: Cantonese, Tibetan
- Hong Kong: Cantonese
- Tibet: Tibetan
- Southeast Asia
- East Timor: Tetum
- Indonesia: Acehnese, Balinese, Batak Karo, Batak Simalungun, Batak Toba, Betawi, Iban, Madurese, Makassar, Minang
- Malaysia: Malay (Jawi)
- Myanmar: Hakha Chin, Jingpo, Shan
- Philippines: Bikol, Hiligaynon, Kapampangan, Pangasinan, Waray
- Melanesia
- Fiji: Fijian
- Papua New Guinea: Tok Pisin
- Micronesia
- Guam: Chamorro
- Micronesia: Chuukese
- Marshall Islands: Marshallese
- Central Asia
- Mongolia: Buryat
- Polynesia
- Tahiti: Tahitian
- Tonga Islands: Tongan
EMEA
- Western Asia
- Afghanistan: Dari
- Northern Africa
- Algeria: Tamazight
- Morocco: Tamazight, Tamazight (Tifinagh)
- Sudan: Acholi, Dinka, Luo, Nuer
- Eastern Europe
- Austria: Romani
- Bosnia and Herzegovina: Romani
- Denmark: Romani
- Finland: Romani
- Germany: Romani
- Hungary: Romani
- Kosovo: Romani
- Montenegro: Romani
- North Macedonia: Romani
- Poland: Romani, Silesian
- Romania: Romani
- Russia: Avar, Bashkir, Buryat, Chechen, Chuvash, Crimean Tatar, Komi, Meadow Mari, Ossetian, Tuvan, Udmurt, Yakut
- Serbia: Romani
- Slovakia: Romani
- Sweden: Romani
- Ukraine: Crimean Tatar
- Western Africa
- Benin: Fon
- Burkina Faso: Dyula
- Côte d’Ivoire: Baoulé, Dyula
- Gabon: Fon
- Gambia: Wolof
- Ghana: Dyula, Fon, Ga
- Guinea: N’Ko, Susu
- Guinea-Bissau: N’Ko, Susu
- Mali: Dyula, N’Ko
- Mauritania: Wolof
- Nigeria: Fon, Tiv
- Senegal: Wolof
- Sierra Leone: Susu
- Togo: Fon
- Southern Africa
- Botswana: Tswana
- Eswatini: Swati
- Lesotho: Swati
- South Africa: Ndebele (South), Swati, Tswana, Venda
- Eastern Africa
- Burundi: Rundi
- Ethiopia: Afar, Luo, Nuer
- Kenya: Luo
- Malawi: Tumbuka
- Mauritius: Mauritian Creole
- Mozambique: Ndau, Swati, Venda
- Rwanda: Kiga
- Seychelles: Seychellois Creole
- South Sudan: Acholi, Dinka, Luo, Nuer
- Tanzania: Bemba, Luo, Tumbuka
- Uganda: Acholi, Alur, Kiga, Luo
- Zambia: Bemba, Dombe, Tumbuka
- Zimbabwe: Dombe, Ndau, Venda
- Middle Africa
- Central African Republic: Sango
- Chad: Sango
- Congo: Kituba, Kikongo
- DRC: Alur, Bemba, Kituba, Kikongo, Luba, Sango
- Northern Europe
- Faroe Islands: Faroese
- Isle of Man: Manx
- Latvia: Latgalian
- Norway: Sami (North), Romani
- Western Europe
- France: Breton, Occitan
- Netherlands: Limburgish, Papiamento
- Southern Europe
- Italy: Friulian, Ligurian, Lombard, Sicilian, Venetian
- Portugal: Portuguese (Portugal)
LATAM
- Jamaica: Jamaican Patois
- Central America
- Guatemala: Q’eqchi’, Mam
- Mexico: Nahuatl (Eastern Huasteca), Q’eqchi’, Mam, Yucatec Maya, Zapotec
- Belize: Q’eqchi’
- South America
- Brazil: Hunsrik
- Caribbean
- Caribbean Netherlands: Papiamento