⭐ If you find this project helpful, please consider giving it a star on GitHub!
Khai Le-Duc* 1,2✉, Duy M. H. Nguyen* 3,4,24✉, Phuong T. H. Trinh* 5, Tien-Phat Nguyen* 6, Nghiem T. Diep** 3, An Ngo** 7, Tung Vu** 8, Trinh Vuong9, Anh-Tien Nguyen10,11, Mau Nguyen12, Van Trung Hoang13, Khai-Nguyen Nguyen14, Hy Nguyen15, Chris Ngo2, Anji Liu16, Nhat Ho17, Anne-Christin Hauschild11, Khanh Xuan Nguyen18, Thanh Nguyen-Tang19, Pengtao Xie20,21, Daniel Sonntag3,22, James Zou23, Mathias Niepert4,24, Anh Totti Nguyen25✉
*Co-first authors; order randomized | **Co-second authors
✉ Corresponding Authors
🎓 Affiliations (click to expand)
- University of Toronto, Canada
- Knovel Engineering Lab, Singapore
- German Research Centre for Artificial Intelligence
- University of Stuttgart, Germany
- Chonnam National University, South Korea
- Singapore University of Technology and Design
- Bucknell University, USA
- Concordia University, Canada
- Korea University
- Justus Liebig University Giessen, Germany
- University Medical Center Göttingen, Germany
- Japan Advanced Institute of Science and Technology
- Hue University, Vietnam
- College of William & Mary, USA
- Deakin University, Australia
- National University of Singapore
- University of Texas at Austin, USA
- University of California, Berkeley, USA
- New Jersey Institute of Technology, USA
- University of California San Diego, USA
- MBZUAI, UAE
- Oldenburg University, Germany
- Stanford University, USA
- Max Planck Research School for Intelligent Systems (IMPRS-IS), Germany
- Auburn University, USA
✨ In honor of Hải Thượng Lãn Ông (海上懶翁) – Lê Hữu Trác (黎友晫), the father of Vietnamese traditional medicine ✨
S-Chain is the first large-scale dataset of Structured Visual Chain-of-Thought (SV-CoT): each reasoning step is explicitly linked to visual evidence via bounding boxes. This enables training and evaluating grounded medical VLM reasoning instead of hallucinated justifications.
- 12,000 medical images with expert bounding boxes.
- 700k+ VQA / rationale pairs across 16 languages.
- Each sample: image, question, answer, stepwise SV-CoT, and per-step visual regions.
We show that supervising VLMs with SV-CoT:
- Improves interpretability
- Improves grounding fidelity (reasoning actually points to the right region)
- Improves robustness across models and languages
- [Oct 2025] Released experiment scripts and checkpoints for two state-of-the-art medical MLLMs, ExGra-Med and LLaVA-Med.
- [Oct 2025] Dataset and project site released.
- `architectures/`: adapters for each backbone (ExGra-Med, LLaVA-Med, InternVL, MedGemma, ...). Each model has its own installation and usage instructions.
- `medrag_integration/`: Retrieval-Augmented Generation (RAG) setup for medical evidence.
- `data/`: dataset download scripts and directory conventions.
Example Usage (Python) from Hugging Face
👉 https://huggingface.co/datasets/leduckhai/S-Chain
from datasets import load_dataset
dataset = load_dataset("leduckhai/S-Chain")
print(dataset)
Or using Bash:
cd data
bash download_english.sh # English-only SV-CoT split
bash download_multilingual.sh # All 16 languages
This will populate:
data/
s_chain_en/
train.jsonl
val.jsonl
test.jsonl
images/
annotations/
s_chain_multilingual/
...
Each *.jsonl record contains:
{
"image_path": "images/img_000123.png",
"question": "...",
"answer": "...",
"sv_cot": [
{
"step_text": "First, identify the left costophrenic angle...",
"evidence_bbox": [x, y, w, h]
},
{
"step_text": "Blunting indicates pleural effusion...",
"evidence_bbox": [x, y, w, h]
}
],
"language": "en"
}
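When loading from Hugging Face, the same fields should be available on each sample. Below is a minimal sketch for walking through the grounded steps, assuming a `train` split and that the Hub columns mirror the JSONL schema above:

```python
from datasets import load_dataset

# Assumes the Hub columns mirror the JSONL schema above; inspect `print(dataset)`
# to confirm the actual splits and field names.
dataset = load_dataset("leduckhai/S-Chain")
sample = dataset["train"][0]

print("Q:", sample["question"])
print("A:", sample["answer"])

# Each SV-CoT step pairs a textual rationale with the bounding box that grounds it.
for i, step in enumerate(sample["sv_cot"], start=1):
    x, y, w, h = step["evidence_bbox"]
    print(f"Step {i}: {step['step_text']}  (bbox: x={x}, y={y}, w={w}, h={h})")
```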
| Model | Description | 🤗 Download Link |
|---|---|---|
| `llava-med-base` | LLaVA-Med trained with base settings (Q4 only) | Link |
| `llava-med-gpt-cot` | LLaVA-Med trained with GPT-synthetic visual CoT | Link |
| `llava-med-gpt-schain` | LLaVA-Med trained with our S-Chain dataset | Link |
| `llava-med-gpt-medrag-only` | LLaVA-Med with medical retrieval-augmented generation and Q4 only | Link |
| `llava-med-gpt-medrag-schain` | LLaVA-Med with medical retrieval-augmented generation and S-Chain | Link |
| `exgra-med-base` | ExGra-Med trained with base settings (Q4 only) | Link |
| `exgra-med-gpt-cot` | ExGra-Med trained with GPT-synthetic visual CoT | Link |
| `exgra-med-gpt-schain` | ExGra-Med trained with our S-Chain dataset | Link |
| `exgra-med-gpt-medrag-only` | ExGra-Med with medical retrieval-augmented generation and Q4 only | Link |
| `exgra-med-gpt-medrag-schain` | ExGra-Med with medical retrieval-augmented generation and S-Chain | Link |
Before starting fine-tuning, inference, or evaluation, download our fine-tuned checkpoints. For example, download the `exgra-med-gpt-schain` folder at this link and put it inside `architectures/Exgra-Med/checkpoints`.
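The checkpoint folder can also be fetched programmatically with `huggingface_hub`. A sketch follows; the repo ID is a placeholder and should be replaced with the repository linked in the table above:

```python
from huggingface_hub import snapshot_download

# Placeholder: substitute the repository linked for `exgra-med-gpt-schain`
# in the checkpoint table above.
REPO_ID = "<org>/exgra-med-gpt-schain"

# Download the whole checkpoint folder to the path the scripts expect.
snapshot_download(
    repo_id=REPO_ID,
    local_dir="architectures/Exgra-Med/checkpoints/exgra-med-gpt-schain",
)
```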
Below, we load ExGra-Med fine-tuned on SV-CoT from Hugging Face and generate an answer together with its grounded rationale.
cd architectures/Exgra-Med-CoT
# Then choose one of the two ways below:
bash bashscript/run_infer_demo.py
# or
python llava/eval/run_med_datasets_eval_batch_CoT.py \
--num-chunks 2 \
--conv-mode ${prompt_mode} \
--use_rag ${use_rag} \
--model-name ${output_dir} \
--mm_dense_connector_type none \
--num_l 6 \
--question-file ${test_file_json} \
--image-folder ${image_folder} \
--answers-file ${answers_file}
python llava/eval/run_eval_CoT.py \
--gt ${test_file_json} \
--pred ${answers_file}
Outputs include (a) the predicted answer, (b) the stepwise visual chain-of-thought, and (c) the bounding boxes for each step (overlays saved in outputs/viz/).
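To overlay the per-step boxes yourself (rather than relying on the saved visualizations in outputs/viz/), here is a minimal Pillow sketch, assuming the [x, y, w, h] pixel-coordinate convention from the JSONL schema:

```python
from PIL import Image, ImageDraw

def draw_sv_cot(image_path, sv_cot, out_path):
    """Draw one rectangle (and step index) per reasoning step on the image."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for i, step in enumerate(sv_cot, start=1):
        x, y, w, h = step["evidence_bbox"]  # assumed [x, y, w, h] in pixels
        draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
        draw.text((x, max(y - 12, 0)), f"step {i}", fill="red")
    img.save(out_path)

# Example call (paths are illustrative):
# draw_sv_cot("images/img_000123.png", record["sv_cot"], "outputs/viz/overlay_000123.png")
```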
We evaluate the following training regimes for each backbone:
- Baseline (Q4 only): Supervise the model on the input image, question, and final answer, without any chain-of-thought.
- GPT-Synthetic CoT: Supervise on GPT-generated synthetic visual chains-of-thought.
- SV-CoT (Ours): Supervise on our Structured Visual CoT, where each reasoning step is linked to image regions.
- Medical RAG-only: Fine-tune with medical Retrieval-Augmented Generation context. We follow MIRIAD to generate additional context in the input prompts and train the models without our SV-CoT supervision.
- SV-CoT + RAG (Joint): Fine-tune with both visual step grounding from S-Chain and retrieved evidence from MIRIAD (see the illustrative sketch after this list).
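To make the regimes concrete, the sketch below shows one way a (prompt, target) pair could be assembled per setting. It is purely illustrative and not the exact prompt format used by our training scripts; the field names follow the JSONL schema above, and the step serialization is an assumption.

```python
def build_example(sample, regime, retrieved_context=None):
    """Illustrative only: assemble a (prompt, target) pair for one sample."""
    prompt = sample["question"]
    if regime in ("rag_only", "sv_cot_rag") and retrieved_context:
        # RAG settings prepend retrieved medical evidence to the input prompt.
        prompt = f"Context: {retrieved_context}\n{prompt}"

    if regime in ("sv_cot", "sv_cot_rag"):
        # SV-CoT settings supervise on the grounded steps plus the final answer;
        # each step is serialized here as "<step text> [x, y, w, h]" (an assumption).
        steps = "\n".join(
            f"{s['step_text']} {s['evidence_bbox']}" for s in sample["sv_cot"]
        )
        target = f"{steps}\nAnswer: {sample['answer']}"
    else:
        # Baseline and RAG-only settings supervise on the final answer alone (Q4 only).
        target = sample["answer"]
    return prompt, target
```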
To train a provided model under any of these settings, first move into the corresponding folder in ./architectures and follow its README carefully.
1. ExGra-Med & LLaVA-Med
To train:
cd architectures/Exgra-Med-CoT
bash bashscript/llava1-5_stage2_noval_CoT.sh
To evaluate:
cd architectures/Exgra-Med-CoT
python llava/eval/run_med_datasets_eval_batch_CoT.py \
--num-chunks 2 \
--conv-mode ${prompt_mode} \
--use_rag ${use_rag} \
--model-name ${output_dir} \
--mm_dense_connector_type none \
--num_l 6 \
--question-file ${test_file_json} \
--image-folder ${image_folder} \
--answers-file ${answers_file}
python llava/eval/run_eval_CoT.py \
--gt ${test_file_json} \
--pred ${answers_file}
Please find more details in Exgra-Med & LLaVA-Med.
More models coming soon...
If you find this work useful, please cite our paper: https://arxiv.org/abs/2510.22728
@article{leduc2025schain,
title={S-Chain: Structured Visual Chain-of-Thought For Medicine},
author={Le-Duc, Khai and Trinh, Phuong T. H. and Nguyen, Duy M. H. and Nguyen, Tien-Phat and Diep, Nghiem T. and Ngo, An and Vu, Tung and Vuong, Trinh and Nguyen, Anh-Tien and Nguyen, Mau and Hoang, Van Trung and Nguyen, Khai-Nguyen and Nguyen, Hy and Ngo, Chris and Liu, Anji and Ho, Nhat and Hauschild, Anne-Christin and Nguyen, Khanh Xuan and Nguyen-Tang, Thanh and Xie, Pengtao and Sonntag, Daniel and Zou, James and Niepert, Mathias and Nguyen, Anh Totti},
journal={arXiv preprint},
eprint={2510.22728},
url={https://arxiv.org/abs/2510.22728},
year={2025}
}
The S-Chain dataset is provided solely for research and educational purposes. It may contain human or machine annotation errors, as well as potential biases or inconsistencies inherent to medical data. Users are expected to exercise appropriate caution in interpretation and ensure ethical and non-commercial use.
