About the Project | Folder Structure | Installation | How to Run | Additional Documentation | References
This project explores a new paradigm in medical image analysis by moving beyond traditional task-specific models towards generalist models capable of handling a wide variety of clinical tasks with minimal supervision.
While conventional models in retinal Optical Coherence Tomography (OCT) have shown strong performance, they are often limited by their narrow scope and the high cost of development and adaptation for each new task. This project investigates the use of Visual In-Context Learning (VICL), a technique that allows models to adapt to new tasks at inference time by simply observing a few annotated examples, eliminating the need for retraining or fine-tuning.
We introduce:
- A framework for generalist model training in retinal OCT, based on the Neuralizer approach.
- A comprehensive evaluation protocol tailored to VICL in the OCT domain.
- Extensive benchmarks across multiple OCT datasets using a state-of-the-art VICL method.
This work aims to establish a strong baseline and uncover the strengths and limitations of VICL in the context of retinal imaging, paving the way for more flexible and scalable AI solutions in ophthalmology.
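To make the in-context inference setting concrete, here is a minimal sketch. The function signature and tensor shapes are illustrative assumptions, not the actual Neuralizer/Retinalizer API: the model receives a small context set of annotated examples that defines the task, plus a query image, and predicts the query's output without any weight updates.

```python
# Minimal sketch of visual in-context learning (VICL) inference.
# All names and shapes are illustrative, not the repository's API.
import torch

def vicl_predict(model: torch.nn.Module,
                 context_images: torch.Tensor,  # (N, 1, H, W) annotated examples
                 context_labels: torch.Tensor,  # (N, 1, H, W) their targets
                 query_image: torch.Tensor      # (1, 1, H, W) new input
                 ) -> torch.Tensor:
    """Adapt to a task purely from the context set; no retraining."""
    model.eval()
    with torch.no_grad():  # inference only, no fine-tuning
        return model(context_images, context_labels, query_image)
```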
📦thesis (root)
┣ 📂assets <-- Contains saved figures, screenshots, ...
┣ 📂configs <-- Configuration files for the pipeline
┃ ┗ ⚙️config.yaml <-- Configuration file for SSH development (default)
┣ 📂data <-- Provided data (.csv files, training images, ...)
┣ 📂docs <-- Project documentation and PowerPoint slides
┣ 📂logs <-- Contains logs from the training, e.g. tensorboard logs
┣ 📂models <-- Models saved during development
┃ ┣ 📂neuralizer_oct <-- Checkpoints of trained Neuralizer models (OCT domain)
┃ ┣ 📂retinalizer <-- Checkpoints of trained Retinalizer model
┃ ┗ 🧠neuralizer_base.ckpt <-- Checkpoint of original Neuralizer paper (MRI domain)
┣ 📂notebooks <-- Jupyter Notebooks used for experimentations
┃ ┣ 📂data <-- EDA Notebooks, Slicing experimentation notebooks, ...
┃ ┣ 📂eval <-- Evaluation notebooks (ablation study, multitask capabilities, domain generalization)
┃ ┣ 📂neuralizer <-- Inference notebook for neuralizer
┃ ┣ 📂retinalizer <-- Inference notebook for retinalizer
┃ ┣ 📂tasks <-- Task visualization notebook
┃ ┗ 📂unet <-- UNet experimentation notebook (not part of the Retinalizer project)
┣ 📂scripts <-- Standalone scripts (evaluation scripts, enrichment script, bg conversion)
┃ ┣ 📂slicing <-- Data preprocessing for the DUKE and UMN datasets (slicing)
┃ ┣ 📂utils <-- Utility scripts (e.g. background color conversion)
┃ ┣ 📜evaluation.py <-- Evaluation protocol
┃ ┣ 📜evaluation_domain_generalization.py <-- Evaluation protocol for the domain generalization scenario
┃ ┗ 📜setup_enriched_semantic_dataset.py <-- Script for semantically enriching segmentation data
┣ 📂src <-- Source code / modules / classes
┃ ┣ 📂data <-- Data-related functionality for collecting and preprocessing images
┃ ┣ 📂dataset <-- Contains PyTorch Dataset, DataLoader and custom Sampler
┃ ┣ 📂eval <-- Contains logic for evaluation objects
┃ ┣ 📂neuralizer <-- Contains source code of the Neuralizer model
┃ ┣ 📂retinalizer <-- Contains source code for retinalizer (architecture, distribution alignment)
┃ ┣ 📂tasks <-- Contains implementation of all tasks (e.g. Gaussian Denoising)
┃ ┣ 📂train <-- Contains logic for the actual fitting of the models
┃ ┣ 📂unet <-- Contains a basic UNet for binary fluid segmentation (used to get familiar with PyTorch Lightning)
┃ ┣ 📂utils <-- Contains all sorts of utility functions
┃ ┗ 📂visualizations <-- Contains plotting functions using matplotlib + seaborn
┣ 📂tests <-- Unit tests for the source code (not complete)
┣ 📜.pre-commit-config.yaml <-- Pre-commit hooks (see Installation)
┣ 🕹️main.py <-- Entry point of the pipeline
┣ 📜README.md <-- The top-level README for developers using this project
┗ 📜requirements.txt <-- The requirements file for reproducing the environment
1. Clone the repository by running the following command:

   ```bash
   git clone git@github.com:negralessio/thesis-visual-in-context-learning.git
   ```

2. Navigate to the project root directory by running the following command in your terminal:

   ```bash
   cd thesis-visual-in-context-learning
   ```

3. Create a virtual environment and activate it:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

4. Install the required packages by running the following commands in your terminal:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

5. (Optional) Install pre-commit to help adhere to code styles and catch minor issues:

   ```bash
   pre-commit install
   pre-commit run --all-files
   ```
In the following, you will find the necessary steps to run this pipeline.
- First, put the data into `data/raw/x`, where `x` specifies the dataset (e.g. KERMANY).
- If you put the data somewhere else, specify `search_dir` in `configs/config.yaml` accordingly (a sanity-check sketch follows this list).
- The `main.py` module will then create `metadata.csv`, if it does not exist yet. The dataframe will contain essential meta information about all images.
- For extracting the slices for the different datasets, see the sections below.
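After placing the data, the configured location can be sanity-checked. The snippet below is a hypothetical helper: it assumes the path lives under a `dataloader.search_dir` key in `configs/config.yaml`, which may differ from the actual config layout.

```python
# Hypothetical sanity check for the configured data directory.
# Assumes configs/config.yaml nests the path under dataloader.search_dir.
import pathlib
import yaml

with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)

search_dir = pathlib.Path(cfg["dataloader"]["search_dir"])
for dataset in ("KERMANY", "DUKE", "UMN"):
    status = "found" if (search_dir / dataset).exists() else "missing"
    print(f"{dataset}: {status}")
```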
- Put the DUKE dataset into the `search_dir` folder, e.g. `<search_dir>/DUKE`.
- Then execute the following standalone scripts to get the 2D fluid slices from the .mat files (a conceptual sketch of the extraction follows this list):

  ```bash
  python3 scripts/DUKE_fluid_segmentation_extraction.py --config "configs/config.yaml"
  ```

- And for the DUKE layer segmentations:

  ```bash
  python3 scripts/DUKE_layer_segmentation_extraction.py --config "configs/config.yaml"
  ```

- The script will put the images and labels inside the DUKE folder, i.e. in `DUKE/fluid/images/` and `DUKE/fluid/labels/`.
- (Analogously for the layer extraction: `DUKE/layers/images/` and `DUKE/layers/labels/`.)
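For orientation, this is roughly what such an extraction script does: load a `.mat` volume, keep the annotated B-scans, and write image/label pairs as 2D files. The field names (`images`, `manualFluid1`) are assumptions based on the public DUKE DME release and may differ from what the repository's scripts actually use.

```python
# Conceptual sketch of DUKE fluid-slice extraction (not the repo's script).
# Field names ("images", "manualFluid1") are assumptions based on the
# public DUKE DME .mat release; adjust them to the actual files.
import os
import numpy as np
from PIL import Image
from scipy.io import loadmat

mat = loadmat("Subject_01.mat")
volume = mat["images"]                     # (H, W, num_slices) B-scans
fluid = mat["manualFluid1"].astype(float)  # matching fluid annotations

os.makedirs("DUKE/fluid/images", exist_ok=True)
os.makedirs("DUKE/fluid/labels", exist_ok=True)

for i in range(volume.shape[-1]):
    label = fluid[..., i]
    if np.isnan(label).all():              # skip slices without annotations
        continue
    Image.fromarray(volume[..., i].astype(np.uint8)).save(f"DUKE/fluid/images/{i:03d}.png")
    mask = ((np.nan_to_num(label) > 0) * 255).astype(np.uint8)
    Image.fromarray(mask).save(f"DUKE/fluid/labels/{i:03d}.png")
```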
- Put the UMN dataset into the `search_dir` folder, e.g. `<search_dir>/UMN`.
- Then execute the following standalone script to get the 2D fluid slices from the .mat files:

  ```bash
  python3 scripts/UMN_fluid_segmentation_extraction.py --config "configs/config.yaml"
  ```

- The script will put the images and labels inside the UMN folder, i.e. in `UMN/images/` and `UMN/labels/`.
- To obtain the semantically enriched dataset, the line `dataloader.load_data(CFG)` (`main.py`) needs to be executed once to obtain `metadata.csv`, which is crucial for this framework.
- Afterwards, execute `./scripts/setup_enriched_semantic_dataset.py`. This script will create hulls, skeletons, points, etc. from the given semantic images (an illustrative sketch follows this list).
- Lastly, execute the notebook `./notebooks/data/_semantic_enrichment_EDA.ipynb` to combine the resulting enriched images from the previous script.
- As a result, you will get the .csv file `enriched_images_combined.csv`. This file needs to be specified in the `config.yaml` file under `dataloader.enrichment_location`.
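As an illustration of what such enrichments can look like, the sketch below derives a convex hull, a skeleton, and sparse points from a toy binary mask using scikit-image; the actual operations in `setup_enriched_semantic_dataset.py` may differ.

```python
# Toy example of geometric enrichments derived from a binary mask,
# in the spirit of hulls / skeletons / points. Not the repo's implementation.
import numpy as np
from skimage.morphology import convex_hull_image, skeletonize

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 15:45] = True                 # toy "segmentation" region

hull = convex_hull_image(mask)            # convex hull enrichment
skeleton = skeletonize(mask)              # skeleton enrichment
points = np.argwhere(mask)[::50]          # sparse point samples (row, col)

print(hull.sum(), skeleton.sum(), len(points))
```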
For training a specific model, please refer to the `main.py` module documentation. An example of fitting Retinalizer on the GPU with ID 3 and the given config file located in `./configs`:

```bash
CUDA_VISIBLE_DEVICES=3 python3 main.py --fit-retinalizer --config "configs/config.yaml"
```

The resulting model is saved in the directory specified by the `training/tensorboard_logger/save_dir` entry in the `configs/config.yaml` file.
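To verify that a run produced a usable checkpoint, one can peek into the saved `.ckpt` file. The sketch below assumes the standard PyTorch Lightning checkpoint layout (a dict with a `state_dict` entry); the path is just an example.

```python
# Inspect a saved checkpoint, assuming the standard PyTorch Lightning
# layout (a dict containing "state_dict"). Path is an example.
import torch

ckpt = torch.load("models/neuralizer_base.ckpt",
                  map_location="cpu", weights_only=False)
print(ckpt.keys())  # typically "state_dict", "hyper_parameters", ...
for name, tensor in list(ckpt["state_dict"].items())[:5]:
    print(name, tuple(tensor.shape))
```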
Please refer to the module documentation of the standalone evaluation scripts found in `./scripts`. Example of evaluating models (eval_objects) on GPU 3 regarding their multitask capabilities and generalization towards unseen tasks:

```bash
CUDA_VISIBLE_DEVICES=3 python3 scripts/evaluation.py --run-eval
```

This produces .csv files in `./data/evaluation-results/` containing the scores.
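For a quick look at the scores, the result files can be loaded with pandas; the column layout is not documented here, so this is just a generic inspection snippet.

```python
# Generic inspection of the evaluation output .csv files.
import glob
import pandas as pd

for path in sorted(glob.glob("data/evaluation-results/*.csv")):
    df = pd.read_csv(path)
    print(path, df.shape)
    print(df.head(), "\n")
```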
Trained model weights can be found in the models/ folder.
Neuralizer models trained on OCT data (Retinalizer in manuscript):
- Vanilla version with our training strategy and tasks: checkpoint
- Random Recoloring Augmentation version with our training strategy and tasks: checkpoint
Neuralizer models trained on OCT data with adversarial training:
- Adversarial training strategy with our tasks: checkpoint
- Adversarial training strategy with Random Recoloring Augmentation and our tasks: checkpoint
This project is based on the following work:
- Czolbe, Steffen, and Adrian V. Dalca. "Neuralizer: General neuroimage analysis without re-training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Implementation of the Neuralizer architecture
(For datasets used within this project, see Datasets Documentation)
Note: What the manuscript "Conquering the Retina: Bringing Visual in-Context Learning to OCT" refers to as Retinalizer is not the adversarial training implemented here under the `retinalizer` folder. In this repository, the manuscript's results can be reproduced by training a Neuralizer model with our training strategy and the enriched OCT data, as well as by using the color augmentation strategy.
```bibtex
@article{negrini2025conquering,
  title={Conquering the Retina: Bringing Visual in-Context Learning to OCT},
  author={Negrini, Alessio and Rei{\ss}, Simon},
  journal={arXiv preprint arXiv:2506.15200},
  year={2025}
}

@inproceedings{czolbe2023neuralizer,
  title={Neuralizer: General neuroimage analysis without re-training},
  author={Czolbe, Steffen and Dalca, Adrian V},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6217--6230},
  year={2023}
}
```