About the Project | Folder Structure | Installation | How to Run | Additional Documentation | References
This project explores a new paradigm in medical image analysis by moving beyond traditional task-specific models towards generalist models capable of handling a wide variety of clinical tasks with minimal supervision.
While conventional models in retinal Optical Coherence Tomography (OCT) have shown strong performance, they are often limited by their narrow scope and the high cost of development and adaptation for each new task. This project investigates the use of Visual In-Context Learning (VICL), a technique that allows models to adapt to new tasks at inference time by simply observing a few annotated examples, eliminating the need for retraining or fine-tuning.
We introduce:
- A framework for generalist model training in retinal OCT, based on the Neuralizer approach.
- A comprehensive evaluation protocol tailored to VICL in the OCT domain.
- Extensive benchmarks across multiple OCT datasets using a state-of-the-art VICL method.
This work aims to establish a strong baseline and uncover the strengths and limitations of VICL in the context of retinal imaging, paving the way for more flexible and scalable AI solutions in ophthalmology.
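To make the in-context inference setting concrete, here is a minimal sketch. The function signature and tensor shapes are illustrative assumptions, not the actual Neuralizer/Retinalizer API: the model receives a small context set of annotated examples that defines the task, plus a query image, and predicts the query's output without any weight updates.

```python
# Minimal sketch of visual in-context learning (VICL) inference.
# All names and shapes are illustrative, not the repository's API.
import torch

def vicl_predict(model: torch.nn.Module,
                 context_images: torch.Tensor,  # (N, 1, H, W) annotated examples
                 context_labels: torch.Tensor,  # (N, 1, H, W) their targets
                 query_image: torch.Tensor      # (1, 1, H, W) new input
                 ) -> torch.Tensor:
    """Adapt to a task purely from the context set; no retraining."""
    model.eval()
    with torch.no_grad():  # inference only, no fine-tuning
        return model(context_images, context_labels, query_image)
```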
📦thesis (root)
┣ 📂assets <-- Contains saved figures, screenshots, ...
┣ 📂configs <-- Configuration files for the pipeline
┃ ┗ ⚙️config.yaml <-- Configuration file for SSH development (default)
┣ 📂data <-- Provided data (.csv files, training images, ...)
┣ 📂docs <-- Project documentation and PowerPoint slides
┣ 📂logs <-- Contains logs from the training, e.g. tensorboard logs
┣ 📂models <-- Models saved during development
┃ ┣ 📂neuralizer_oct <-- Checkpoints of trained Neuralizer models (OCT domain)
┃ ┣ 📂retinalizer <-- Checkpoints of trained Retinalizer model
┃ ┗ 🧠neuralizer_base.ckpt <-- Checkpoint of original Neuralizer paper (MRI domain)
┣ 📂notebooks <-- Jupyter Notebooks used for experimentations
┃ ┣ 📂data <-- EDA Notebooks, Slicing experimentation notebooks, ...
┃ ┣ 📂eval <-- Evaluation notebooks (ablation study, multitask capabilities, domain generalization)
┃ ┣ 📂neuralizer <-- Inference notebook for neuralizer
┃ ┣ 📂retinalizer <-- Inference notebook for retinalizer
┃ ┣ 📂tasks <-- Task visualization notebook
┃ ┗ 📂unet <-- UNet experimentation notebook (not part of the Retinalizer project)
┣ 📂scripts <-- Standalone scripts (evaluation scripts, enrichment script, bg conversion)
┃ ┣ 📂slicing <-- Data preprocessing for the DUKE and UMN datasets (slicing)
┃ ┣ 📂utils <-- Utility scripts (e.g. background color conversion)
┃ ┣ 📜evaluation.py <-- Evaluation protocol
┃ ┣ 📜evaluation_domain_generalization.py <-- Evaluation protocol for the domain generalization scenario
┃ ┗ 📜setup_enriched_semantic_dataset.py <-- Script for semantically enriching segmentation data
┣ 📂src <-- Source code / modules / classes
┃ ┣ 📂data <-- Data-related functionality for collecting and preprocessing images
┃ ┣ 📂dataset <-- Contains PyTorch Dataset, DataLoader and custom Sampler
┃ ┣ 📂eval <-- Contains logic for evaluation objects
┃ ┣ 📂neuralizer <-- Contains source code of the Neuralizer model
┃ ┣ 📂retinalizer <-- Contains source code for retinalizer (architecture, distribution alignment)
┃ ┣ 📂tasks <-- Contains implementation of all tasks (e.g. Gaussian Denoising)
┃ ┣ 📂train <-- Contains logic for the actual fitting of the models
┃ ┣ 📂unet <-- Contains a basic UNet for binary fluid segmentation (used to get familiar with PyTorch Lightning)
┃ ┣ 📂utils <-- Contains all sorts of utility functions
┃ ┗ 📂visualizations <-- Contains plotting functions using matplotlib + seaborn
┣ 📂tests <-- Unit tests for the source code (not complete)
┣ 📜.pre-commit-config.yaml <-- Pre-commit hooks (see Installation)
┣ 🕹️main.py <-- Entry point of the pipeline
┣ 📜README.md <-- The top-level README for developers using this project
┗ 📜requirements.txt <-- The requirements file for reproducing the environment
1. Clone the repository by running the following command:

   ```bash
   git clone git@github.com:negralessio/thesis-visual-in-context-learning.git
   ```

2. Navigate to the project root directory by running the following command in your terminal:

   ```bash
   cd thesis-visual-in-context-learning
   ```

3. Create a virtual environment and activate it:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

4. Install the required packages by running the following commands in your terminal:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

5. (Optional) Install pre-commit to help adhere to code styles and catch minor issues:

   ```bash
   pre-commit install
   pre-commit run --all-files
   ```
In the following, you will find the necessary steps to run this pipeline.
- First, put the data into `data/raw/x`, where `x` specifies the dataset (e.g. KERMANY).
- If you put the data somewhere else, specify `search_dir` in `configs/config.yaml` accordingly (a sanity-check sketch follows this list).
- The `main.py` module will then create `metadata.csv`, if it does not exist yet. The dataframe will contain essential meta information about all images.
- For extracting the slices for the different datasets, see the sections below.
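After placing the data, the configured location can be sanity-checked. The snippet below is a hypothetical helper: it assumes the path lives under a `dataloader.search_dir` key in `configs/config.yaml`, which may differ from the actual config layout.

```python
# Hypothetical sanity check for the configured data directory.
# Assumes configs/config.yaml nests the path under dataloader.search_dir.
import pathlib
import yaml

with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)

search_dir = pathlib.Path(cfg["dataloader"]["search_dir"])
for dataset in ("KERMANY", "DUKE", "UMN"):
    status = "found" if (search_dir / dataset).exists() else "missing"
    print(f"{dataset}: {status}")
```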
- Put the DUKE dataset into the `search_dir` folder, e.g. `<search_dir>/DUKE`.
- Then execute the following standalone scripts to get the 2D fluid slices from the .mat files (a conceptual sketch of the extraction follows this list):

  ```bash
  python3 scripts/DUKE_fluid_segmentation_extraction.py --config "configs/config.yaml"
  ```

- And for the DUKE layer segmentations:

  ```bash
  python3 scripts/DUKE_layer_segmentation_extraction.py --config "configs/config.yaml"
  ```

- The script will put the images and labels inside the DUKE folder, i.e. in `DUKE/fluid/images/` and `DUKE/fluid/labels/`.
- (Analogously for the layer extraction: `DUKE/layers/images/` and `DUKE/layers/labels/`.)
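For orientation, this is roughly what such an extraction script does: load a `.mat` volume, keep the annotated B-scans, and write image/label pairs as 2D files. The field names (`images`, `manualFluid1`) are assumptions based on the public DUKE DME release and may differ from what the repository's scripts actually use.

```python
# Conceptual sketch of DUKE fluid-slice extraction (not the repo's script).
# Field names ("images", "manualFluid1") are assumptions based on the
# public DUKE DME .mat release; adjust them to the actual files.
import os
import numpy as np
from PIL import Image
from scipy.io import loadmat

mat = loadmat("Subject_01.mat")
volume = mat["images"]                     # (H, W, num_slices) B-scans
fluid = mat["manualFluid1"].astype(float)  # matching fluid annotations

os.makedirs("DUKE/fluid/images", exist_ok=True)
os.makedirs("DUKE/fluid/labels", exist_ok=True)

for i in range(volume.shape[-1]):
    label = fluid[..., i]
    if np.isnan(label).all():              # skip slices without annotations
        continue
    Image.fromarray(volume[..., i].astype(np.uint8)).save(f"DUKE/fluid/images/{i:03d}.png")
    mask = ((np.nan_to_num(label) > 0) * 255).astype(np.uint8)
    Image.fromarray(mask).save(f"DUKE/fluid/labels/{i:03d}.png")
```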
- Put the UMN dataset into the `search_dir` folder, e.g. `<search_dir>/UMN`.
- Then execute the following standalone script to get the 2D fluid slices from the .mat files:

  ```bash
  python3 scripts/UMN_fluid_segmentation_extraction.py --config "configs/config.yaml"
  ```

- The script will put the images and labels inside the UMN folder, i.e. in `UMN/images/` and `UMN/labels/`.
- To obtain the semantically enriched dataset, the line `dataloader.load_data(CFG)` (`main.py`) needs to be executed once to obtain `metadata.csv`, which is crucial for this framework.
- Afterwards, execute `./scripts/setup_enriched_semantic_dataset.py`. This script will create hulls, skeletons, points, etc. from the given semantic images (an illustrative sketch follows this list).
- Lastly, execute the notebook `./notebooks/data/_semantic_enrichment_EDA.ipynb` to combine the resulting enriched images from the previous script.
- As a result, you will get the .csv file `enriched_images_combined.csv`. This file needs to be specified in the `config.yaml` file under `dataloader.enrichment_location`.
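As an illustration of what such enrichments can look like, the sketch below derives a convex hull, a skeleton, and sparse points from a toy binary mask using scikit-image; the actual operations in `setup_enriched_semantic_dataset.py` may differ.

```python
# Toy example of geometric enrichments derived from a binary mask,
# in the spirit of hulls / skeletons / points. Not the repo's implementation.
import numpy as np
from skimage.morphology import convex_hull_image, skeletonize

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 15:45] = True                 # toy "segmentation" region

hull = convex_hull_image(mask)            # convex hull enrichment
skeleton = skeletonize(mask)              # skeleton enrichment
points = np.argwhere(mask)[::50]          # sparse point samples (row, col)

print(hull.sum(), skeleton.sum(), len(points))
```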
For training a specific model, please refer to the `main.py` module documentation. An example of fitting Retinalizer on the GPU with ID 3 and the given config file located in `./configs`:

```bash
CUDA_VISIBLE_DEVICES=3 python3 main.py --fit-retinalizer --config "configs/config.yaml"
```

The resulting model is saved in the directory specified by the `training/tensorboard_logger/save_dir` entry in the `configs/config.yaml` file.
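To verify that a run produced a usable checkpoint, one can peek into the saved `.ckpt` file. The sketch below assumes the standard PyTorch Lightning checkpoint layout (a dict with a `state_dict` entry); the path is just an example.

```python
# Inspect a saved checkpoint, assuming the standard PyTorch Lightning
# layout (a dict containing "state_dict"). Path is an example.
import torch

ckpt = torch.load("models/neuralizer_base.ckpt",
                  map_location="cpu", weights_only=False)
print(ckpt.keys())  # typically "state_dict", "hyper_parameters", ...
for name, tensor in list(ckpt["state_dict"].items())[:5]:
    print(name, tuple(tensor.shape))
```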
Please refer to the module documentation of the standalone evaluation scripts found in `./scripts`. Example of evaluating models (eval_objects) on GPU 3 regarding their multitask capabilities and generalization towards unseen tasks:

```bash
CUDA_VISIBLE_DEVICES=3 python3 scripts/evaluation.py --run-eval
```

This produces .csv files in `./data/evaluation-results/` containing the scores.
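For a quick look at the scores, the result files can be loaded with pandas; the column layout is not documented here, so this is just a generic inspection snippet.

```python
# Generic inspection of the evaluation output .csv files.
import glob
import pandas as pd

for path in sorted(glob.glob("data/evaluation-results/*.csv")):
    df = pd.read_csv(path)
    print(path, df.shape)
    print(df.head(), "\n")
```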
Trained model weights can be found in the models/ folder.
Neuralizer models trained on OCT data (Retinalizer in manuscript):
- Vanilla version with our training strategy and tasks: checkpoint
- Random Recoloring Augmentation version with our training strategy and tasks: checkpoint
Neuralizer models trained on OCT data with adversarial training:
- Adversarial training strategy with our tasks: checkpoint
- Adversarial training strategy with Random Recoloring Augmentation and our tasks: checkpoint
This project is based on the following work:
- Czolbe, Steffen, and Adrian V. Dalca. "Neuralizer: General neuroimage analysis without re-training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Implementation of the Neuralizer architecture
(For datasets used within this project, see Datasets Documentation)
Note: What the manuscript "Conquering the Retina: Bringing Visual in-Context Learning to OCT" refers to as Retinalizer is not the adversarial training implemented here under the `retinalizer` folder. In this repository, the manuscript's results can be reproduced by training a Neuralizer model with our training strategy and the enriched OCT data, as well as by using the color augmentation strategy.
```bibtex
@article{negrini2025conquering,
  title={Conquering the Retina: Bringing Visual in-Context Learning to OCT},
  author={Negrini, Alessio and Rei{\ss}, Simon},
  journal={arXiv preprint arXiv:2506.15200},
  year={2025}
}

@inproceedings{czolbe2023neuralizer,
  title={Neuralizer: General neuroimage analysis without re-training},
  author={Czolbe, Steffen and Dalca, Adrian V},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6217--6230},
  year={2023}
}
```