Category: AI & Machine Learning

  • My Notes on Exploring Google’s Health Foundation Models

    (Note: This post reflects my personal opinions and may not reflect those of my employer)

    Example of the HeAR encoder, which generates a machine learning representation (known as an “embedding”)

    This image is a spectrogram representing my name, “Abdoulaye,” generated from my voice audio by HeAR (Health Acoustic Representations). HeAR is one of the Health AI foundation models recently released by Google. I’ve been captivated by these foundation models lately: digging into them, playing with the demos and notebooks, reading the ML papers behind the models, and learning more about embeddings in general and their usefulness in low-resource environments. All of it started after playing with a couple of the notebooks.

    Embeddings are numerical representations of data. AI models learn to create these compact summaries (vectors) from various inputs like images, sounds, or text, capturing essential features. These information-rich numerical representations are useful because they can serve as a foundation for developing new, specialized AI models, potentially reducing the amount of task-specific data and development time required. This efficiency is especially crucial in settings where large, labeled medical datasets may be scarce.
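
    To make this concrete, here is a minimal Python sketch of the basic idea (the vectors and their dimensionality below are made up for illustration; real encoders produce hundreds or thousands of dimensions): inputs that are semantically similar end up with embeddings that point in similar directions, and that is what downstream models exploit.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine of the angle between two embedding vectors (1.0 = same direction).
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    cough_a = np.array([0.12, -0.48, 0.33, 0.90])   # hypothetical embedding of one cough clip
    cough_b = np.array([0.10, -0.50, 0.30, 0.95])   # hypothetical embedding of a similar cough
    speech = np.array([-0.70, 0.20, 0.60, -0.10])   # hypothetical embedding of a speech clip

    print(cosine_similarity(cough_a, cough_b))  # high: acoustically similar inputs
    print(cosine_similarity(cough_a, speech))   # lower: a different kind of sound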

    • If you would like to read further into what embeddings are, Vicki Boykis’ free essay is a great resource and also an ideal entry point into machine learning; I know many of my former colleagues from the telco and engineering world will love it: https://vickiboykis.com/what_are_embeddings/
    • For a technical perspective on their evolution, check out the word2vec paper: https://arxiv.org/abs/1301.3781

    The HeAR model, which processed my voice audio, was trained on over 300 million audio clips (e.g., coughs, breathing, speech). Its applications can extend to identifying acoustic biomarkers for conditions like TB or COVID-19. It uses a Vision Transformer (ViT) to analyze spectrograms. Below, you can see an example of a sneeze being detected within an audio file, followed by throat clearing detected near the end.
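
    As a rough sketch of that pipeline (audio in, spectrogram out, then an encoder producing an embedding), here is what the preprocessing step might look like in Python. The file name is hypothetical and the encoder call is a placeholder; check the official HeAR model card and notebooks for the real loading and inference code.

    import numpy as np
    import librosa

    audio, sr = librosa.load("my_name_recording.wav", sr=16000)      # hypothetical recording, mono waveform
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)  # mel spectrogram (image-like input)
    log_mel = librosa.power_to_db(mel, ref=np.max)                    # log scale, as typically visualized

    # embedding = hear_encoder(log_mel)  # placeholder: the real model returns a fixed-length vector
    print(log_mel.shape)  # (n_mels, time_frames)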

    Health event detector demo

    This release also includes other open-weight foundation models, each designed to generate high-quality embeddings:

    Derm Foundation (Skin Images) This model processes dermatology images to produce embeddings, aiming to make AI development for skin image analysis more efficient by reducing data and compute needs. It facilitates the development of tools for various tasks, such as classifying clinical conditions or assessing image quality.
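
    The data-efficiency claim boils down to a simple workflow: precompute embeddings once, then fit a small classifier on top of them. Here is a hedged sketch of that pattern; the random arrays stand in for real Derm Foundation embeddings and labels, and the embedding dimension is illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 6144))   # stand-in for precomputed image embeddings
    y_train = rng.integers(0, 2, size=200)   # stand-in binary labels (condition present / absent)

    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_train, y_train)              # the only training you do: a small, cheap model

    X_new = rng.normal(size=(5, 6144))       # embeddings of new, unseen images
    print(probe.predict_proba(X_new)[:, 1])  # predicted probability of the condition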

    Explore the Derm Foundation model site for more information, and use this link to download the model.

    CXR Foundation (Chest X-rays) The CXR Foundation model produces embeddings from chest X-ray images, which can then be used to train models for various chest X-ray related tasks. The models were trained on very large X-ray datasets. What got my attention: some models within the collection, like ELIXR-C, use an approach inspired by CLIP (contrastive language-image pre-training) to link images with text descriptions, enabling powerful zero-shot classification. This means the model might classify an X-ray for a condition it wasn’t specifically trained on, simply by understanding a text description of that condition, which I find fascinating. The embeddings can also be used to train models that detect diseases like tuberculosis with very little data; for instance, “models trained on the embeddings derived from just 45 tuberculosis-positive images were able to achieve diagnostic performance non-inferior to radiologists.” This data efficiency is particularly valuable in regions with limited access to large, labeled datasets. Read the paper for more details.
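
    To illustrate the zero-shot idea, here is a sketch of the CLIP-style mechanism only; this is not the actual ELIXR API, and the embedding functions below are stand-ins that return random vectors. With real paired encoders, the image and each candidate text description are embedded into the same space, and the closest description wins.

    import numpy as np

    rng = np.random.default_rng(0)

    def embed_image(path: str) -> np.ndarray:
        # Stand-in for the image encoder; a real model returns a learned vector.
        return rng.normal(size=128)

    def embed_text(text: str) -> np.ndarray:
        # Stand-in for the paired text encoder trained alongside the image encoder.
        return rng.normal(size=128)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    image_emb = embed_image("chest_xray.png")  # hypothetical file
    labels = [
        "normal chest x-ray",
        "chest x-ray with signs of tuberculosis",
        "chest x-ray with cardiomegaly",
    ]
    scores = [cosine(image_emb, embed_text(t)) for t in labels]
    print(labels[int(np.argmax(scores))])  # with real encoders, the best-matching description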

    Retrieve images by text queries demo

    Path Foundation (Pathology Slides) Google’s Path Foundation model is trained on large-scale digital pathology datasets to produce embeddings from these complex microscopy images. Its primary purpose is to enable more efficient development of AI tools for pathology image analysis. This approach supports tasks like identifying tumor tissue or searching for similar image regions, using significantly less data and compute. See the impressive Path Foundation demos on HuggingFace.
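
    Searching for similar image regions is essentially nearest-neighbor lookup in embedding space. A hedged sketch of that step follows; the random matrix stands in for Path Foundation embeddings of tissue patches, and the dimension is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    patch_embeddings = rng.normal(size=(10_000, 384))   # stand-in: one row per slide patch
    patch_embeddings /= np.linalg.norm(patch_embeddings, axis=1, keepdims=True)  # unit-normalize rows

    query = patch_embeddings[42]              # "find regions that look like this patch"
    similarities = patch_embeddings @ query   # cosine similarity, since rows are unit norm
    top5 = np.argsort(-similarities)[:5]
    print(top5)                               # indices of the most similar patches (includes 42 itself)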

    Path Foundation demos

    Outlier Tissue Detector Demo

    These models are provided as open weights with the goal of enabling developers and researchers to download and adapt them, fostering the creation of localized AI tools. In my opinion, this is particularly exciting for regions like Africa, where such tools could help address unique health challenges and bridge gaps in access to specialist diagnostic capabilities.

    For full acknowledgment of contributions from various institutions, including partners like the Center for Infectious Disease Research in Zambia, please refer to the details in the paper.

    For those interested in the architectural and training methodologies, the papers and concepts linked above (word2vec, ViT, CLIP, and the model-specific papers) are good starting points.

    #AIforHealth #FoundationModels #GlobalHealth #AIinAfrica #ResponsibleAI #MedTech #Innovation #GoogleResearch #Embeddings #MachineLearning #DeepLearning

  • Visualizing equations and functions using Gemini and Three.js (vibe coded)

    Visualizing Machine Learning: An Interactive 3D Guide to Gradient Descent & SVMs

    From Gaussian Curves to the Heat Equation

  • Managing ML Projects: A Guide for Beginners and Professionals

    How do you manage ML projects? 🤔  A question I hear often!
    Working in research over the years, I often got asked about the day-to-day of managing machine learning projects. That’s why I’m excited about Google’s new, FREE “Managing ML Projects” guide, which I can now point to going forward. It’s only about 90 minutes, but it’s a good start!

    It can be useful for:

    * Those entering the ML field 🚀: Providing a clear, structured approach.
    * Professionals seeking to refine their ML project management skills.
    * Individuals preparing for ML-related interviews: Offering practical insights and frameworks.

    This guide covers:

    * ML project lifecycle management.
    * Applying established project management principles to ML.
    * Navigating traditional and generative AI projects.
    * Effective stakeholder collaboration.

    If you’re curious about ML project management, or want to level up your skills, take a look!

    https://developers.google.com/machine-learning/managing-ml-projects

  • SigLIP 2: Multilingual Vision-Language Encoders Released

    Google DeepMind has released SigLIP 2, a family of open-weight (Apache 2.0) vision-language encoders trained on data covering 109 languages, including Swahili. The released models are available in four sizes: ViT-B (86M), L (303M), So400m (400M), and g (1B).



    Why is this important?

    This release offers improved multilingual capabilities, covering 109 languages, which can contribute to more inclusive and accurate AI systems. It also features better image recognition and document understanding. The four model sizes offer flexibility and potentially increased accessibility for resource-constrained environments.
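
    As a quick, hedged illustration of what using such an encoder can look like in practice, here is zero-shot image classification with the Hugging Face transformers pipeline. The model ID and image path below are assumptions; see the links below for the official checkpoints and examples.

    from transformers import pipeline

    # Assumed checkpoint name; confirm on the SigLIP 2 model cards.
    classifier = pipeline(
        task="zero-shot-image-classification",
        model="google/siglip2-base-patch16-224",
    )

    result = classifier(
        "street_scene.jpg",  # hypothetical local image path (a URL also works)
        candidate_labels=["a bus", "a market stall", "a bicycle"],
    )
    print(result)  # list of {'label': ..., 'score': ...}, highest score first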



    Models: https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/README_siglip2.md

    Paper: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

    https://arxiv.org/pdf/2502.14786

    HuggingFace Blog and Demo: https://huggingface.co/blog/siglip2

    Google Colab: https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb

    Credits:  "SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features" by Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, and Xiaohua Zhai (2025).
  • SMOL: New Open-Source Dataset for Low-Resource Language Machine Translation

    🎉  My colleagues and members of the language community have released SMOL, a new open-source dataset (CC-BY-4.0) designed for machine translation research. SMOL includes professionally translated parallel text for over 115 low-resource languages, with significant representation of over 50 African languages. This dataset is intended to provide a valuable resource for researchers working on machine translation for under-represented languages.

    Kindly check the paper for more details, including the limitations of this dataset.

    Paper: https://arxiv.org/pdf/2502.12301
    Dataset: https://huggingface.co/datasets/google/smol
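
    If you want to poke at the data, it is on the Hugging Face Hub. A hedged sketch follows: the config name below is hypothetical (the dataset is organized into subsets per language pair), so check the dataset card for the real configuration names and language codes.

    from datasets import load_dataset

    # Hypothetical English-Hausa subset; see https://huggingface.co/datasets/google/smol
    # for the actual config names.
    ds = load_dataset("google/smol", "smolsent__en_ha")

    split = next(iter(ds.values()))      # take whichever split the subset provides
    for row in split.select(range(3)):   # peek at a few parallel sentence pairs
        print(row)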

    List of languages:

    Afar
    Acoli
    Afrikaans
    Alur
    Amharic
    Bambara
    Baoulé
    Bemba (Zambia)
    Berber
    Chiga
    Dinka
    Dombe
    Dyula
    Efik
    Ewe
    Fon
    Fulfulde
    Ga
    Hausa
    Igbo
    Kikuyu
    Kongo
    Kanuri
    Krio
    Kituba (DRC)
    Lingala
    Luo
    Kiluba (Luba-Katanga)
    Malagasy
    Mossi
    North Ndebele
    Ndau
    Nigerian Pidgin
    Oromo
    Rundi
    Kinyarwanda
    Sepedi
    Shona
    Somali
    South Ndebele
    Susu
    Swati
    Swahili
    Tamazight
    Tigrinya
    Tiv
    Tsonga
    Tumbuka
    Tswana
    Twi
    Venda
    Wolof
    Xhosa
    Yoruba
    Zulu

    Credits:

    Isaac Caswell, Elizabeth Nielsen, Jiaming Luo, Colin Cherry, Geza Kovacs, Hadar Shemtov, Partha Talukdar, Dinesh Tewari, Baba Mamadi Diane, Koulako Moussa Doumbouya, Djibrila Diane, and Solo Farabado Cissé. “SMOL: Professionally translated parallel data for 115 under-represented languages.”
  • Small Language Models: Notes from the past couple of weeks 🤖🤯

    The past few days have brought interesting developments in small language models that could expand applications in mobile computing and low-resource environments.

    Here’s what caught my attention:

    • Microsoft’s Phi-4 was made fully open source (MIT license) and has been improved by Unsloth AI. 🚀🔓 Blog: https://unsloth.ai/blog/phi4

    • Kyutai Labs, based in Paris 🇫🇷, introduced Helium-1 Preview, a 2B-parameter multilingual base LLM designed for edge and mobile devices.

    Model: https://huggingface.co/kyutai/helium-1-preview-2b

    Blog: https://kyutai.org/2025/01/13/helium.html

    • OpenBMB from China 🇨🇳 released MiniCPM-o 2.6, an 8B-parameter multimodal model that matches the capabilities of several larger models. Model: https://huggingface.co/openbmb/MiniCPM-o-2_6

    • Moondream2 added gaze 👀 detection functionality, with interesting applications for human-computer interaction and market research.

    Blog: https://moondream.ai/blog/announcing-gaze-detection

    • OuteTTS, a series of small text-to-speech model variants, expanded to support 6 languages and punctuation handling for more natural-sounding speech synthesis. 🗣️

    Model: https://huggingface.co/OuteAI/OuteTTS-0.3-1B

    These developments suggest continued progress in making language models more efficient and accessible, and we’re likely to see more of this in 2025.

    Note: The views in this post are my own.

  • Pastra – A Practical Guide to the Gemini Multimodal Live API

    Google’s Gemini Multimodal Live API provides developers with tools to build AI applications that process and respond to real-time multimodal input (audio, video, and text). Heiko Hotz, a Gemini expert at Google, has created a project called Pastra, a comprehensive guide to help developers get started with this technology.

    What the guide covers:

    • An introduction to the Gemini Multimodal Live API and its capabilities.
    • Practical code examples and tutorials for building applications.
    • Insights into real-time communication and audio processing techniques used by Gemini, such as low-latency audio chunking and system Voice Activity Detection (VAD).

    Getting started with the guide:

    • Clone the repository: 
    git clone https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide
    • Add your API key: update the index.html files manually in all the subdirectories with your API key, or run the command below after replacing the text “add your key here” with your key: 
    find . -name index.html -exec sed -i '' 's/const apiKey = '\''<YOUR_API_KEY>'\''/const apiKey = '\''add your key here'\''/g' {} \;
    • Start the server: 
    python server.py

    The guide offers a practical starting point for developers interested in exploring the potential of the Gemini Multimodal Live API for building interactive AI applications. Have fun!