A public institution in Singapore 🇸🇬 🚀, as part of a national effort to advance AI capabilities, has just released a speech recognition model and a multimodal large language model (LLM) tailored to Singapore’s multilingual landscape.
The MERaLiON-SpeechEncoder is a speech foundation model designed to support downstream speech applications. Researchers built this model from scratch 🤯 and trained it on a massive amount of speech data, including English audio and the National Speech Corpus, which covers English as spoken in Singapore as well as Singlish. To handle this volume of data, they used supercomputers in both Europe and Singapore.
The MERaLiON-AudioLLM is a multimodal LLM that processes both speech and text inputs, designed specifically for Singapore’s multilingual and multicultural landscape. This first release combines the MERaLiON-Whisper encoder, a fine-tuned version of OpenAI’s Whisper-large-v2 model, with the SEA-LION V3 text decoder, a localized LLM developed by AI Singapore on top of Google’s Gemma 2 9B model.
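The composition described above (a speech encoder feeding a text decoder) follows a common multimodal pattern: audio is encoded into embeddings, projected into the decoder's input space, and decoded alongside a text prompt. Here is a minimal sketch of that data flow; every class and function name here is illustrative, not the actual MERaLiON API, and the real projection layer is learned rather than a pass-through:

```python
class AudioLLM:
    """Toy composition of a speech encoder and a text decoder.

    Illustrative only: MERaLiON-AudioLLM pairs a fine-tuned
    Whisper-large-v2 encoder with the SEA-LION V3 (Gemma 2 9B based)
    decoder; the glue layer in the real model is trained, not a stub.
    """

    def __init__(self, encoder, adapter, decoder):
        self.encoder = encoder    # audio waveform -> speech embeddings
        self.adapter = adapter    # speech embeddings -> decoder's embedding space
        self.decoder = decoder    # (projected embeddings, text prompt) -> text

    def generate(self, audio, prompt):
        speech_emb = self.encoder(audio)
        projected = self.adapter(speech_emb)
        return self.decoder(projected, prompt)


# Toy stand-ins, just to show how the pieces plug together:
model = AudioLLM(
    encoder=lambda audio: [x * 0.5 for x in audio],
    adapter=lambda emb: emb,
    decoder=lambda emb, prompt: f"{prompt}: {len(emb)} frames",
)
```

The point of the pattern is that the decoder never sees raw audio; it only sees embeddings that look like its own token embeddings, which is what lets an existing text LLM such as SEA-LION V3 be reused for speech.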
This is really impressive 🔥 and I hope it can inspire other communities around the world!
Learn more:
MERaLiON-SpeechEncoder research paper: https://arxiv.org/abs/2412.11538
MERaLiON-AudioLLM research paper: https://arxiv.org/abs/2412.09818
Models:
https://huggingface.co/MERaLiON/MERaLiON-SpeechEncoder-v1
https://huggingface.co/MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION
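If you want to feed your own recordings to models like these, note that Whisper-family encoders expect 16 kHz mono audio. A dependency-free sketch of that preprocessing step is below; the naive linear-interpolation resampler is a stand-in for what a real pipeline would do with a library such as librosa or torchaudio:

```python
def to_mono(channels):
    """Average a list of per-channel sample lists into one mono track."""
    return [sum(frame) / len(channels) for frame in zip(*channels)]

def resample_linear(samples, src_rate, dst_rate=16000):
    """Naive linear-interpolation resampler (illustration only).

    A production pipeline would use librosa or torchaudio instead;
    this just shows the 16 kHz contract that Whisper-family
    encoders expect at their input.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

From there, the resampled waveform would be handed to the model's own processor (e.g. via the Hugging Face `transformers` loading utilities on the model pages above).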