SOLVE-Med is a modular medical QA system that routes each question to domain-specialized Small Language Models (SLMs) and then fuses their answers via a lightweight 9B orchestrator.
The stack is self-hostable, GPU-friendly (4-bit quantization), and supports multi-turn chat with automatic history reformulation.
Watch our demo to see SOLVE-Med in action.

---
## 🔑 Checkpoints not included

To comply with model licenses and size limits, we ship neither the fine-tuned SLMs nor the classifier.
Instead, we provide:

- full training scripts in `training/`
- ready-to-run config YAMLs with hyperparameters
- a sample folder layout showing where to drop your own checkpoints (illustrated below)
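
For instance, a hypothetical layout might look like the tree below; the specialist folder names are placeholders, so adapt them to the configs you train. The two paths themselves (`backend/models/` and `backend/app/data/classifiers.yaml`) match the auto-discovery rules described later in this README:

```text
backend/
├── models/
│   ├── slm_dermatology/        # your fine-tuned specialist checkpoint
│   └── slm_cardiology/
└── app/
    └── data/
        └── classifiers.yaml    # lists your classifier checkpoints
```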

---
## 🧠 LLaMA & licensing

The specialist SLMs are fine-tuned from Meta's LLaMA 3.2 models.
We do not distribute any LLaMA weights; users must obtain access from Meta:
https://ai.meta.com/resources/models-and-libraries/llama/

---
## 🐑 Unsloth

Our fine-tuning code and runtime are based on the open‑source Unsloth library (Apache 2.0):
https://github.com/unslothai/unsloth

---
## 🚀 How to train your own models

```bash
# classifier
python training/scripts/train_classifier.py \
    --cfg training/cfg/distilbert_multilingual.yaml \
    --data.root_dir /data/your_forum

# specialist SLM (e.g. Dermatology)
python training/scripts/train_slm.py \
    --cfg training/cfg/slm_dermatology.yaml \
    --data.train_csv /data/dermatology/train.csv
```

The backend will automatically pick up every model placed under `backend/models/<model_name>` and every classifier checkpoint listed in `backend/app/data/classifiers.yaml`.
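
For intuition, here is a minimal sketch of what that auto-discovery could look like. The paths match the ones above, but the function names are illustrative, not the backend's actual API:

```python
# Sketch only: mimics the discovery behaviour described above.
from pathlib import Path

import yaml  # pip install pyyaml


MODELS_DIR = Path("backend/models")
CLASSIFIERS_CFG = Path("backend/app/data/classifiers.yaml")


def discover_specialists() -> list[str]:
    """Treat every sub-folder of backend/models/ as a specialist checkpoint."""
    return sorted(p.name for p in MODELS_DIR.iterdir() if p.is_dir())


def discover_classifiers():
    """Return whatever classifier entries classifiers.yaml lists."""
    with CLASSIFIERS_CFG.open() as f:
        return yaml.safe_load(f) or []


if __name__ == "__main__":
    print("specialists:", discover_specialists())
    print("classifiers:", discover_classifiers())
```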
| Layer | Stack |
|---|---|
| Backend & REST API | FastAPI + Uvicorn |
| LLM runtime | Unsloth (4‑bit quantization, LoRA) |
| Orchestrator | gemma‑2‑9b‑it (quantized) |
| Specialists | 10× Llama‑3.2‑1B‑Instruct fine‑tuned SLMs |
| Router Agent | DistilBERT multilingual (multi‑label) |
| Frontend | React + Vite |
| DevOps | Docker / Makefile |
- Reformulate – the orchestrator rewrites the last user turn using the whole conversation context.
- Route – a classifier selects the most relevant specialties.
- Specialists – selected SLMs generate expert‑level answers.
- Fuse – the orchestrator merges all snippets into a single, evidence-based response (see the sketch after the feature list below).
- Ask-a-specialist – lets you put a question directly to a single specialist, bypassing the fusion stage.
- Plug-and-play specialists – drop in new SLMs just by editing a YAML file.
- Memory-aware – a low-memory mode keeps only the orchestrator model resident, streaming SLMs on demand.
- Language-agnostic prompts – templates live in `backend/app/prompts/<lang>/`.
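
To make the four-step pipeline concrete, here is a minimal sketch; every object and method name in it (`reformulate`, `predict`, `generate`, `fuse`) is illustrative, not the repo's actual interface:

```python
# Sketch of the Reformulate -> Route -> Specialists -> Fuse loop described
# above; orchestrator, router and specialists are placeholder objects.
def answer(question: str, history: list[str],
           orchestrator, router, specialists: dict) -> str:
    # 1. Reformulate: rewrite the last turn using the whole conversation.
    standalone = orchestrator.reformulate(question, history)
    # 2. Route: the multi-label classifier picks the relevant specialties.
    specialties = router.predict(standalone)        # e.g. ["dermatology"]
    # 3. Specialists: each selected SLM drafts an expert-level answer.
    drafts = [specialists[s].generate(standalone) for s in specialties]
    # 4. Fuse: the orchestrator merges the drafts into one response.
    return orchestrator.fuse(standalone, drafts)
```

In low-memory mode, step 3 would additionally load and unload each SLM around its `generate` call, so that only the orchestrator stays resident.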
Requirements:

- Python ≥ 3.10
- GPU with ≥ 14 GB VRAM (e.g. an NVIDIA T4), or set `APP_FORCE_DEVICE=cpu` (slow)
Quickstart:

```bash
git clone https://github.com/PRAISELab-PicusLab/SOLVE-Med
cd SOLVE-Med

# Backend
./scripts/init_project.sh backend
cp backend/.env.example backend/.env   # add your HF token
uvicorn backend.app.api.main:app --reload
```

Swagger UI: http://localhost:8000/docs
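
With the backend running, you can call the REST API from any HTTP client. A minimal sketch is below; the `/chat` route and payload fields are our assumptions, so check the Swagger UI for the actual endpoint names and schema:

```python
# Hypothetical request: the endpoint path and JSON fields are placeholders,
# not taken from the repo; see http://localhost:8000/docs for the real API.
import requests

resp = requests.post(
    "http://localhost:8000/chat",  # placeholder route
    json={"message": "What could cause a persistent rash?", "history": []},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```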
Alternatively, run the full stack with Docker:

```bash
docker compose up -d   # root docker-compose.yml (backend + frontend)
```

---

SOLVE-Med does not replace professional healthcare providers.
The project is for demonstration purposes only.
All outputs must be verified by licensed clinicians.
The models may produce hallucinations and are trained on user-generated content; use at your own risk.
Thanks to the Unsloth team, the open‑source Llama‑3 community, the developers of the Python libraries used, and our research team for their contributions to this project.
👨‍💻 This project was developed by Roberta Di Marino, Giovanni Dioguardi, Antonio Romano, Giuseppe Riccio, Mariano Barone, Marco Postiglione, Flora Amato, and Vincenzo Moscato.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
It includes:
- code adapted from Unsloth, licensed under Apache 2.0
- our own scripts, released under the same CC BY-NC 4.0 license
Note: The LLaMA model family is developed by Meta and distributed under its own license.
Use of Meta’s models must comply with their terms:
https://github.com/facebookresearch/llama/blob/main/LICENSE
