S-Chain: Structured Visual Chain-of-Thought for Medicine



If you find this project helpful, please consider giving it a star on GitHub!


Khai Le-Duc* 1,2✉, Duy M. H. Nguyen* 3,4,24✉, Phuong T. H. Trinh* 5, Tien-Phat Nguyen* 6, Nghiem T. Diep** 3, An Ngo** 7, Tung Vu** 8, Trinh Vuong9, Anh-Tien Nguyen10,11, Mau Nguyen12, Van Trung Hoang13, Khai-Nguyen Nguyen14, Hy Nguyen15, Chris Ngo2, Anji Liu16, Nhat Ho17, Anne-Christin Hauschild11, Khanh Xuan Nguyen18, Thanh Nguyen-Tang19, Pengtao Xie20,21, Daniel Sonntag3,22, James Zou23, Mathias Niepert4,24, Anh Totti Nguyen25✉

*Co-first authors; order randomized   |   **Co-second authors
✉ Corresponding Authors

🎓 Affiliations
  1. University of Toronto, Canada
  2. Knovel Engineering Lab, Singapore
  3. German Research Centre for Artificial Intelligence
  4. University of Stuttgart, Germany
  5. Chonnam National University, South Korea
  6. Singapore University of Technology and Design
  7. Bucknell University, USA
  8. Concordia University, Canada
  9. Korea University
  10. Justus Liebig University Giessen, Germany
  11. University Medical Center Göttingen, Germany
  12. Japan Advanced Institute of Science and Technology
  13. Hue University, Vietnam
  14. College of William & Mary, USA
  15. Deakin University, Australia
  16. National University of Singapore
  17. University of Texas at Austin, USA
  18. University of California, Berkeley, USA
  19. New Jersey Institute of Technology, USA
  20. University of California San Diego, USA
  21. MBZUAI, UAE
  22. Oldenburg University, Germany
  23. Stanford University, USA
  24. Max Planck Research School for Intelligent Systems (IMPRS-IS), Germany
  25. Auburn University, USA

✨ In honor of Hải Thượng Lãn Ông (海上懶翁) – Lê Hữu Trác (黎友晫), the father of Vietnamese traditional medicine ✨


🔍 What is S-Chain?

S-Chain is the first large-scale dataset of Structured Visual Chain-of-Thought (SV-CoT): each reasoning step is explicitly linked to visual evidence via bounding boxes. This enables training and evaluating grounded medical VLM reasoning instead of hallucinated justifications.

  • 12,000 medical images with expert bounding boxes.
  • 700k+ VQA / rationale pairs across 16 languages.
  • Each sample: image, question, answer, stepwise SV-CoT, and per-step visual regions.

We show that supervising VLMs with SV-CoT:

  • Improves interpretability
  • Improves grounding fidelity (reasoning actually points to the right region)
  • Improves robustness across models and languages


📣 News

  • [Oct 2025] Released experiment scripts and checkpoints for two state-of-the-art medical MLLMs, ExGra-Med and LLaVA-Med.
  • [Oct 2025] Dataset and project site released.

🗂 Repo layout

  • architectures/ — adapters for each backbone (ExGra-Med, LLaVA-Med, InternVL, MedGemma, ...). Each model has its own installation and usage instructions.
  • medrag_integration/ — Retrieval-Augmented Generation (RAG) setup for medical evidence.
  • data/ — dataset download scripts and directory conventions.

I. Quickstart

1. 📥 Download the S-Chain dataset

Example Usage (Python) from Hugging Face

👉 https://huggingface.co/datasets/leduckhai/S-Chain

from datasets import load_dataset
dataset = load_dataset("leduckhai/S-Chain")
print(dataset)
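
Once loaded, you can peek at one record's grounded reasoning steps. The snippet below is a minimal sketch that assumes a "train" split and the field names from the record schema shown further down (sv_cot, step_text, evidence_bbox); adjust it if the Hub version stores them differently.

# Inspect the stepwise SV-CoT of the first training sample
# (assumes the split name and field names from the record schema below).
sample = dataset["train"][0]
print(sample["question"])
for step in sample["sv_cot"]:
    print(step["step_text"], step["evidence_bbox"])
print(sample["answer"])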

Alternatively, download the raw files using Bash:

cd data
bash download_english.sh        # English-only SV-CoT split
bash download_multilingual.sh   # All 16 languages

This will populate:

data/
  s_chain_en/
    train.jsonl
    val.jsonl
    test.jsonl
    images/
    annotations/
  s_chain_multilingual/
    ...

Each *.jsonl record contains:

{
  "image_path": "images/img_000123.png",
  "question": "...",
  "answer": "...",
  "sv_cot": [
    {
      "step_text": "First, identify the left costophrenic angle...",
      "evidence_bbox": [x, y, w, h]
    },
    {
      "step_text": "Blunting indicates pleural effusion...",
      "evidence_bbox": [x, y, w, h]
    }
  ],
  "language": "en"
}
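
As a minimal sketch of how these records can be consumed, the snippet below reads the first line of the English train.jsonl and overlays each step's evidence_bbox on the image with Pillow. It assumes the boxes are pixel-coordinate [x, y, w, h] and that image_path is relative to the split directory; neither assumption is guaranteed by this README, so adjust as needed.

import json
from PIL import Image, ImageDraw

split_dir = "data/s_chain_en"

# Read the first record from the English training split.
with open(f"{split_dir}/train.jsonl") as f:
    record = json.loads(f.readline())

# Open the referenced image and draw one rectangle per reasoning step.
image = Image.open(f"{split_dir}/{record['image_path']}").convert("RGB")
draw = ImageDraw.Draw(image)
for i, step in enumerate(record["sv_cot"], start=1):
    x, y, w, h = step["evidence_bbox"]
    draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
    draw.text((x, max(0, y - 12)), f"step {i}", fill="red")
    print(f"step {i}: {step['step_text']}")

image.save("sv_cot_overlay.png")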

2. 📦 Choose Model Checkpoints

Model | Description | 🤗 Download Link
llava-med-base | LLaVA-Med trained with base settings (Q4 only) | Link
llava-med-gpt-cot | LLaVA-Med trained with GPT-synthetic visual CoT | Link
llava-med-gpt-schain | LLaVA-Med trained with our S-Chain dataset | Link
llava-med-gpt-medrag-only | LLaVA-Med with medical retrieval-augmented generation and Q4 only | Link
llava-med-gpt-medrag-schain | LLaVA-Med with medical retrieval-augmented generation and S-Chain | Link
exgra-med-base | ExGra-Med trained with base settings (Q4 only) | Link
exgra-med-gpt-cot | ExGra-Med trained with GPT-synthetic visual CoT | Link
exgra-med-gpt-schain | ExGra-Med trained with our S-Chain dataset | Link
exgra-med-gpt-medrag-only | ExGra-Med with medical retrieval-augmented generation and Q4 only | Link
exgra-med-gpt-medrag-schain | ExGra-Med with medical retrieval-augmented generation and S-Chain | Link

Before starting fine-tuning, inference, or evaluation, download our fine-tuned checkpoints, e.g., download the exgra-med-gpt-schain folder from the corresponding link above and put it inside architectures/Exgra-Med/checkpoints.

3. Run inference with a pretrained checkpoint

Below, we load ExGra-Med fine-tuned on SV-CoT from Hugging Face and generate an answer with its grounded rationale.

cd architectures/Exgra-Med-CoT

# Then choose one of the two options below:
bash bashscript/run_infer_demo.py 
# or
python llava/eval/run_med_datasets_eval_batch_CoT.py \
    --num-chunks 2 \
    --conv-mode ${prompt_mode} \
    --use_rag ${use_rag} \
    --model-name ${output_dir} \
    --mm_dense_connector_type none \
    --num_l 6 \
    --question-file ${test_file_json} \
    --image-folder ${image_folder} \
    --answers-file ${answers_file}

python llava/eval/run_eval_CoT.py \
    --gt ${test_file_json} \
    --pred ${answers_file}

Outputs will include (a) the predicted answer, (b) the stepwise visual chain-of-thought, and (c) bounding boxes per step (overlays saved in outputs/viz/).

II. 🧪 Reproducing experiments

We evaluate the following training regimes for each backbone:

  • Baseline (Q4 only): Supervise the model on the input image, the question, and the final prediction.

  • GPT-Synthetic CoT: Supervise on GPT-based synthetic visual chain-of-thought.

  • SV-CoT (Ours): Supervise on our Structured Visual CoT, where each reasoning step is linked to image regions (see the sketch after this list).

  • Medical RAG-only: Fine-tune with medical Retrieval-Augmented Generation (RAG) context. We follow the technique from MIRIAD to generate additional context in the input prompts and train the models without our SV-CoT supervision.

  • SV-CoT + RAG (Joint): Fine-tune using both visual step grounding from S-Chain and retrieved evidence from MIRIAD.
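
To make the SV-CoT regime concrete, here is a hypothetical sketch of how one S-Chain record could be serialized into a supervision target, with each reasoning step followed by its evidence box. The actual prompt template lives in the training scripts under architectures/; this is an illustration, not a verbatim copy of it.

def build_sv_cot_target(record):
    """Serialize one S-Chain record into a hypothetical SV-CoT training target:
    one line per reasoning step with its evidence box, then the final answer."""
    lines = []
    for i, step in enumerate(record["sv_cot"], start=1):
        x, y, w, h = step["evidence_bbox"]
        lines.append(f"Step {i}: {step['step_text']} <box>[{x}, {y}, {w}, {h}]</box>")
    lines.append(f"Answer: {record['answer']}")
    return "\n".join(lines)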

Currently Available Models

To train a provided model under any of these settings, first move into the corresponding folder in ./architectures and follow its README carefully.

1. ExGra-Med & LLaVA-Med

To train:

cd architectures/Exgra-Med-CoT
bash bashscript/llava1-5_stage2_noval_CoT.sh

To evaluate:

cd architectures/Exgra-Med-CoT

python llava/eval/run_med_datasets_eval_batch_CoT.py \
    --num-chunks 2 \
    --conv-mode ${prompt_mode} \
    --use_rag ${use_rag} \
    --model-name ${output_dir} \
    --mm_dense_connector_type none \
    --num_l 6 \
    --question-file ${test_file_json} \
    --image-folder ${image_folder} \
    --answers-file ${answers_file}

python llava/eval/run_eval_CoT.py \
    --gt ${test_file_json} \
    --pred ${answers_file}
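
Besides run_eval_CoT.py, a quick per-step grounding sanity check is to compute the IoU between a predicted evidence box and the ground-truth one. The sketch below assumes [x, y, w, h] boxes as in the record schema above; it is not the metric implementation used by the official evaluation script.

def bbox_iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# A perfectly matching box scores 1.0; a disjoint box scores 0.0.
print(bbox_iou([10, 10, 50, 40], [10, 10, 50, 40]))   # 1.0
print(bbox_iou([10, 10, 50, 40], [200, 200, 20, 20])) # 0.0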

Please find more details in Exgra-Med & LLaVA-Med.

More models coming soon...

Citation

If you find this work useful, please cite our paper: https://arxiv.org/abs/2510.22728

@article{leduc2025schain,
  title={S-Chain: Structured Visual Chain-of-Thought For Medicine},
  author={Le-Duc, Khai and Trinh, Phuong T. H. and Nguyen, Duy M. H. and Nguyen, Tien-Phat and Diep, Nghiem T. and Ngo, An and Vu, Tung and Vuong, Trinh and Nguyen, Anh-Tien and Nguyen, Mau and Hoang, Van Trung and Nguyen, Khai-Nguyen and Nguyen, Hy and Ngo, Chris and Liu, Anji and Ho, Nhat and Hauschild, Anne-Christin and Nguyen, Khanh Xuan and Nguyen-Tang, Thanh and Xie, Pengtao and Sonntag, Daniel and Zou, James and Niepert, Mathias and Nguyen, Anh Totti},
  journal={arXiv preprint},
  eprint={2510.22728},
  url={https://arxiv.org/abs/2510.22728},
  year={2025}
}

⚖️ Important Notice on Dataset Usage

The S-Chain dataset is provided solely for research and educational purposes. It may contain human or machine annotation errors, as well as potential biases or inconsistencies inherent to medical data. Users are expected to exercise appropriate caution in interpretation and ensure ethical and non-commercial use.
