Category: AI News & Commentary

  • First Images from ESA Biomass Satellite

    Absolutely stunning images of Gabon and Chad from the European Space Agency’s (ESA) Biomass satellite.

    The first image shows the Ivindo River in Gabon, stretching from the border with the Republic of the Congo all the way to Makokou in the Ogooué-Ivindo province. This region is known for its dense forests. Typically, when we look at forests from above, all we see are the treetops. However, Biomass uses a special kind of radar, called P-band radar, which can penetrate the forest canopy to reveal what lies below. This means it can measure all the woody material (the trunks, branches, and stems), offering a much more complete picture than ever before.
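    To get a feel for why P-band can do this: penetration scales with wavelength. Biomass’s P-band radar operates around 435 MHz, a figure from ESA’s mission documentation rather than this announcement, which works out to a wavelength of roughly 70 cm, far longer than conventional imaging radars. A quick back-of-envelope sketch in Python:

    ```python
    # Back-of-envelope: wavelength = c / frequency.
    # 435 MHz is ESA's published P-band centre frequency for Biomass;
    # Sentinel-1's C-band frequency is shown for comparison.
    C = 299_792_458  # speed of light in m/s

    for band, freq_hz in [("P-band (Biomass)", 435e6), ("C-band (Sentinel-1)", 5.405e9)]:
        print(f"{band}: wavelength = {C / freq_hz * 100:.1f} cm")
    # P-band: ~68.9 cm -- long enough to pass through canopy and dry sand.
    # C-band: ~5.5 cm  -- mostly reflects off the top of the canopy.
    ```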

    The second image features the Tibesti Mountains in northern Chad, and it looks almost otherworldly. Here, the radar demonstrates its ability to see up to five meters beneath dry sand. This opens up fascinating possibilities for mapping and studying hidden features in deserts, such as ancient riverbeds and lakes that have long been buried. Such insights are incredibly valuable for understanding Earth’s past climates and even for locating vital water sources in arid regions.

    It’s an exciting time as our ability to collect information about Earth continues to advance, especially with progress in remote sensing and Artificial Intelligence (AI). The rise of geospatial AI, in particular, is opening up fascinating new avenues for understanding our planet and creating new fields of research.

    If you’re a student considering a career in understanding Earth through technology and AI, this field, in my opinion, presents some interesting opportunities. You can explore more about the amazing Biomass mission on the official ESA website:

    https://www.esa.int/Applications/Observing_the_Earth/FutureEO/Biomass/Biomass_satellite_returns_striking_first_images_of_forests_and_more

    Image credit: ESA

  • Small Language Models: Notes from the past couple of weeks 🤖🤯

    The past few days have brought interesting developments in small language models that could expand applications in mobile computing and low-resource environments.

    Here’s what caught my attention:

    • Microsoft’s Phi-4 was made fully open source (MIT license) and has been further improved by Unsloth AI. 🚀🔓 Blog: https://unsloth.ai/blog/phi4

    • Kyutai Labs, based in Paris 🇫🇷, introduced Helium-1 Preview, a 2B-parameter multilingual base LLM designed for edge and mobile devices (a quick loading sketch appears after this list).

    Model: https://huggingface.co/kyutai/helium-1-preview-2b

    Blog: https://kyutai.org/2025/01/13/helium.html

    • OpenBMB, from China 🇨🇳, released MiniCPM-o 2.6, an 8B-parameter multimodal model that matches the capabilities of several larger models. Model: https://huggingface.co/openbmb/MiniCPM-o-2_6

    • Moondream2 added gaze 👀 detection functionality, with interesting applications in human-computer interaction and market research.

    Blog: https://moondream.ai/blog/announcing-gaze-detection

    • OuteTTS, a series of small text-to-speech models, expanded to support six languages and punctuation for more natural-sounding speech synthesis. 🗣️

    Model: https://huggingface.co/OuteAI/OuteTTS-0.3-1B
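    If you want to experiment with any of these, most are a few lines away via the Hugging Face transformers library. Here is a minimal sketch using the Helium-1 Preview checkpoint linked above, assuming your transformers version supports it; the sampling settings are illustrative, so check each model card for recommended usage:

    ```python
    # Minimal sketch: run a small LM locally with Hugging Face transformers.
    # The model ID comes from the post; generation settings are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kyutai/helium-1-preview-2b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

    prompt = "Small language models are useful because"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```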

    These developments suggest continued progress in making language models more efficient and accessible, and we’re likely to see more of this in 2025.

    Note: Views in this post are my own.

  • Singapore Launches MERaLiON: Speech Recognition and Multimodal LLM for Multilingual Applications

    A public institution in Singapore 🇸🇬 🚀, as part of a national effort to advance AI capabilities, has just released a speech recognition model and a multimodal large language model (LLM) tailored to Singapore’s multilingual landscape.

    The MERaLiON-SpeechEncoder is a speech foundation model designed to support downstream speech applications. Researchers built this model from scratch 🤯 and trained it on a massive speech dataset, including English as well as data from the National Speech Corpus, which covers English as spoken in Singapore, including Singlish. To handle this vast amount of data, they used supercomputers in both Europe and Singapore.

    The MERaLiON-AudioLLM is a multimodal LLM that can process both speech and text inputs and is specifically designed for Singapore’s multilingual and multicultural landscape. This first release was created by combining MERaLiON-Whisper, a fine-tuned encoder based on OpenAI’s Whisper-large-v2 model, with the SEA-LION V3 text decoder. SEA-LION V3 is a localized LLM developed by AI Singapore based on Google’s Gemma 2 9B model.
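    Architecturally, this follows a common audio-LLM pattern: encode the speech, project the audio features into the text decoder’s embedding space, and prepend them to the prompt. Below is a toy sketch of that general pattern; every class, module, and dimension here is an illustrative stand-in, not the actual MERaLiON implementation:

    ```python
    # Toy sketch of the audio-LLM pattern: speech encoder -> projection -> text decoder.
    # All modules and dimensions are illustrative stand-ins, NOT MERaLiON's code.
    import torch
    import torch.nn as nn

    class AudioLLMSketch(nn.Module):
        def __init__(self, mel_bins=80, encoder_dim=512, decoder_dim=1024):
            super().__init__()
            self.speech_encoder = nn.Linear(mel_bins, encoder_dim)  # stand-in for a Whisper-style encoder
            self.projector = nn.Linear(encoder_dim, decoder_dim)    # maps audio features into the decoder's embedding space
            self.text_decoder = nn.TransformerEncoder(              # stand-in for a Gemma-style LM
                nn.TransformerEncoderLayer(d_model=decoder_dim, nhead=8, batch_first=True),
                num_layers=2,
            )

        def forward(self, mel, text_embeds):
            audio_tokens = self.projector(self.speech_encoder(mel))
            # Prepend the projected audio tokens to the text prompt embeddings.
            return self.text_decoder(torch.cat([audio_tokens, text_embeds], dim=1))

    # Toy usage: 100 mel frames (80 bins each) plus 10 text-token embeddings.
    model = AudioLLMSketch()
    out = model(torch.randn(1, 100, 80), torch.randn(1, 10, 1024))
    print(out.shape)  # torch.Size([1, 110, 1024])
    ```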

    This is really impressive 🔥 and I hope it can inspire other communities around the world!

    Learn more:
    MERaLiON-SpeechEncoder research paper: https://arxiv.org/abs/2412.11538
    MERaLiON-AudioLLM research paper: https://arxiv.org/abs/2412.09818
    Models:
    https://huggingface.co/MERaLiON/MERaLiON-SpeechEncoder-v1
    https://huggingface.co/MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION