Category: AI News & Commentary

  • First Images from ESA Biomass Satellite

    Absolutely stunning images of Gabon and Chad from the European Space Agency’s (ESA) Biomass satellite.

    The first image shows the Ivindo River in Gabon, stretching from the border with the Republic of the Congo all the way to Makokou in the Ogooué-Ivindo province. This region is known for its dense forests. Typically, when we look at forests from above, all we see are the treetops. However, Biomass uses a special kind of radar, called P-band radar, which can penetrate the forest canopy to reveal what lies below. This means it can measure all the woody material (the trunks, branches, and stems), offering a much more complete picture than ever before.
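    To get a feel for why P-band can do this: penetration scales with wavelength. Biomass’s P-band radar operates around 435 MHz, a figure from ESA’s mission documentation rather than this announcement, which works out to a wavelength of roughly 70 cm, far longer than conventional imaging radars. A quick back-of-envelope sketch in Python:

    ```python
    # Back-of-envelope: wavelength = c / frequency.
    # 435 MHz is ESA's published P-band centre frequency for Biomass;
    # Sentinel-1's C-band frequency is shown for comparison.
    C = 299_792_458  # speed of light in m/s

    for band, freq_hz in [("P-band (Biomass)", 435e6), ("C-band (Sentinel-1)", 5.405e9)]:
        print(f"{band}: wavelength = {C / freq_hz * 100:.1f} cm")
    # P-band: ~68.9 cm -- long enough to pass through canopy and dry sand.
    # C-band: ~5.5 cm  -- mostly reflects off the top of the canopy.
    ```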

    The second image features the Tibesti Mountains in northern Chad, and it looks almost otherworldly. Here, the radar demonstrates its ability to see up to five meters beneath dry sand. This opens up fascinating possibilities for mapping and studying hidden features in deserts, such as ancient riverbeds and lakes that have long been buried. Such insights are incredibly valuable for understanding Earth’s past climates and even for locating vital water sources in arid regions.

    It’s an exciting time as our ability to collect information about Earth continues to advance, especially with progress in remote sensing and Artificial Intelligence (AI). The rise of geospatial AI, in particular, is opening up fascinating new avenues for understanding our planet and creating new fields of research.

    If you’re a student considering a career in understanding Earth through technology and AI, this field, in my opinion, presents some interesting opportunities. You can explore more about the amazing Biomass mission on the official ESA website:

    https://www.esa.int/Applications/Observing_the_Earth/FutureEO/Biomass/Biomass_satellite_returns_striking_first_images_of_forests_and_more

    Image credit: ESA

  • Small Language Models: Notes from the past couple of weeks 🤖🤯

    The past few days have brought interesting developments in small language models that could expand applications in mobile computing and low-resource environments.

    Here’s what caught my attention:

    • Microsoft’s Phi-4 was made fully open source (MIT license) and has been further improved by Unsloth AI. 🚀🔓 Blog: https://unsloth.ai/blog/phi4

    • Kyutai Labs, based in Paris 🇫🇷, introduced Helium-1 Preview, a 2B-parameter multilingual base LLM designed for edge and mobile devices (a quick loading sketch appears after this list).

    Model: https://huggingface.co/kyutai/helium-1-preview-2b

    Blog: https://kyutai.org/2025/01/13/helium.html

    • OpenBMB, from China 🇨🇳, released MiniCPM-o 2.6, an 8B-parameter multimodal model that matches the capabilities of several larger models. Model: https://huggingface.co/openbmb/MiniCPM-o-2_6

    • Moondream2 added gaze 👀 detection functionality, with interesting applications in human-computer interaction and market research.

    Blog: https://moondream.ai/blog/announcing-gaze-detection

    • OuteTTS, a series of small text-to-speech models, expanded to support six languages and punctuation for more natural-sounding speech synthesis. 🗣️

    Model: https://huggingface.co/OuteAI/OuteTTS-0.3-1B
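    If you want to experiment with any of these, most are a few lines away via the Hugging Face transformers library. Here is a minimal sketch using the Helium-1 Preview checkpoint linked above, assuming your transformers version supports it; the sampling settings are illustrative, so check each model card for recommended usage:

    ```python
    # Minimal sketch: run a small LM locally with Hugging Face transformers.
    # The model ID comes from the post; generation settings are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kyutai/helium-1-preview-2b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

    prompt = "Small language models are useful because"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.7)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```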

    These developments suggest continued progress in making language models more efficient and accessible, and we’re likely to see more of this in 2025.

    Note: Views in this post are my own.

  • Singapore Launches MERaLiON: Speech Recognition and Multimodal LLM for Multilingual Applications

    A public institution in Singapore 🇸🇬 🚀, as part of a national effort to advance AI capabilities, has just released a speech recognition model and a multimodal large language model (LLM) tailored to Singapore’s multilingual landscape.

    The MERaLiON-SpeechEncoder is a speech foundation model designed to support downstream speech applications. Researchers built this model from scratch 🤯 and trained it on a massive speech dataset, including English as well as data from the National Speech Corpus, which covers English as spoken in Singapore, including Singlish. To handle this vast amount of data, they used supercomputers in both Europe and Singapore.

    The MERaLiON-AudioLLM is a multimodal LLM that can process both speech and text inputs and is specifically designed for Singapore’s multilingual and multicultural landscape. This first release was created by combining MERaLiON-Whisper, a fine-tuned encoder based on OpenAI’s Whisper-large-v2 model, with the SEA-LION V3 text decoder. SEA-LION V3 is a localized LLM developed by AI Singapore based on Google’s Gemma 2 9B model.
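    Architecturally, this follows a common audio-LLM pattern: encode the speech, project the audio features into the text decoder’s embedding space, and prepend them to the prompt. Below is a toy sketch of that general pattern; every class, module, and dimension here is an illustrative stand-in, not the actual MERaLiON implementation:

    ```python
    # Toy sketch of the audio-LLM pattern: speech encoder -> projection -> text decoder.
    # All modules and dimensions are illustrative stand-ins, NOT MERaLiON's code.
    import torch
    import torch.nn as nn

    class AudioLLMSketch(nn.Module):
        def __init__(self, mel_bins=80, encoder_dim=512, decoder_dim=1024):
            super().__init__()
            self.speech_encoder = nn.Linear(mel_bins, encoder_dim)  # stand-in for a Whisper-style encoder
            self.projector = nn.Linear(encoder_dim, decoder_dim)    # maps audio features into the decoder's embedding space
            self.text_decoder = nn.TransformerEncoder(              # stand-in for a Gemma-style LM
                nn.TransformerEncoderLayer(d_model=decoder_dim, nhead=8, batch_first=True),
                num_layers=2,
            )

        def forward(self, mel, text_embeds):
            audio_tokens = self.projector(self.speech_encoder(mel))
            # Prepend the projected audio tokens to the text prompt embeddings.
            return self.text_decoder(torch.cat([audio_tokens, text_embeds], dim=1))

    # Toy usage: 100 mel frames (80 bins each) plus 10 text-token embeddings.
    model = AudioLLMSketch()
    out = model(torch.randn(1, 100, 80), torch.randn(1, 10, 1024))
    print(out.shape)  # torch.Size([1, 110, 1024])
    ```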

    This is really impressive 🔥 and I hope it can inspire other communities around the world!

    Learn more:
    MERaLiON-SpeechEncoder research paper: https://arxiv.org/abs/2412.11538
    MERaLiON-AudioLLM research paper: https://arxiv.org/abs/2412.09818
    Models:
    https://huggingface.co/MERaLiON/MERaLiON-SpeechEncoder-v1
    https://huggingface.co/MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION