Dynamic‑Vision is a benchmarking framework for testing computer vision models on dynamic stimuli and quantifying how well they align with neural and behavioural data. It combines model activation extraction, simple decoders and evaluation metrics into a coherent pipeline. At a high level, Dynamic‑Vision loads a model, computes its spatiotemporal activations on a given dataset, and uses linear decoders to predict behavioural choices, fMRI signals or performance on standard computer‑vision tasks.
The code builds on BrainScore‑Vision tools (please use this fork for now) for loading models and brain datasets. Static image architectures are optionally converted into temporal models, and a handful of decoders (ridge or logistic regression) are used to report alignment scores. Complete dataset‑loading support is still in progress and will be updated.
The top‑level src package contains all implementation details, together with four scoring scripts. Key components include:
- analysis – statistical helpers for permutation tests, cross‑validation and false discovery control.
- data – dataset loaders and definitions. src/data/__init__.py enumerates fMRI datasets, behavioural experiments, electrode recordings and standard computer‑vision tasks, and maps each to an appropriate decoder.
- evaluate – decoders and evaluation functions for behaviour, fMRI, electrodes and classification tasks.
- models – model loaders and definitions. groups.py lists a wide array of image and video models, from ResNets and EfficientNets to R3D, SlowFast and masked autoencoders; the loader in loading.py converts static models to temporal ones and configures inference.
- utils.py, store.py and ceiling.py – utilities for caching results and computing noise ceilings.
- Scoring scripts – score_behaviour.py, score_electrodes.py, score_fmri.py and score_task.py. Each accepts a model identifier plus one or more tasks/datasets and outputs alignment scores.
Running a script loads the requested model, computes activations,
fits a linear decoder and reports test/validation scores. Results
and activations are cached on disk (cache/.store by default).
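To make that flow concrete, here is a minimal sketch of the decode-and-score step, using random arrays in place of real activations and recordings (the actual extraction, splitting and decoding logic lives in src/evaluate and src/analysis):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Stand-ins for real data: model activations (stimuli x features)
# and recorded responses (stimuli x voxels).
rng = np.random.default_rng(0)
activations = rng.standard_normal((200, 512))
responses = rng.standard_normal((200, 10))

X_train, X_test, y_train, y_test = train_test_split(
    activations, responses, test_size=0.2, random_state=0)

# Fit a ridge decoder on the training split, report held-out alignment.
decoder = RidgeCV(alphas=np.logspace(-3, 3, 7))
decoder.fit(X_train, y_train)
print("held-out alignment (R^2):", decoder.score(X_test, y_test))
```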
The scripts/figures directory contains code to reproduce the results in the paper once scoring has been run.
Please use Python 3.11 for this repo; any operating system should be compatible.
Install the following packages (typically takes ~20 mins):
```bash
# install core dependencies
pip install numpy scipy pandas scikit-learn torch torchvision h5py matplotlib opencv-python
pip install brainscore-core brainio nilearn
pip install git+https://github.com/YingtianDt/vision.git
pip install git+https://github.com/YingtianDt/neuroparc.git
```

Set RESULTCACHING_HOME, BRAINIO_HOME, BRAINSCORE_HOME, TORCH_HOME,
HF_HOME and MMAP_HOME to directories on your machine where the
code can store intermediate results and locate stimuli. See
src/config.py for examples.
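For instance, a minimal configuration sketch (the paths and subdirectory names below are placeholders; adapt them to your machine and see src/config.py for the actual handling):

```python
import os

# Placeholder root; point this at a large, writable location.
CACHE_ROOT = "/data/dynamic-vision"

os.environ.setdefault("RESULTCACHING_HOME", os.path.join(CACHE_ROOT, "resultcaching"))
os.environ.setdefault("BRAINIO_HOME", os.path.join(CACHE_ROOT, "brainio"))
os.environ.setdefault("BRAINSCORE_HOME", os.path.join(CACHE_ROOT, "brainscore"))
os.environ.setdefault("TORCH_HOME", os.path.join(CACHE_ROOT, "torch"))
os.environ.setdefault("HF_HOME", os.path.join(CACHE_ROOT, "huggingface"))
os.environ.setdefault("MMAP_HOME", os.path.join(CACHE_ROOT, "mmap"))
```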
In the Usage section below, we describe how to score models on different benchmarks. Results are stored in the cache at cache/.store by default. After scoring, reproduce all main figures by running bash make_figures.sh (typically takes ~15 mins), or generate individual plots, e.g. python -m scripts.figures.f3.a for Figure 3(a). The reproduced figures are stored under scripts/figures/cache.
Note: The full scoring procedure is extremely computationally intensive and generates ~34TB of data. To facilitate quick reproduction, we provide a pre-computed cache containing only the essential data needed for figure generation.
You can get the cache by running bash download_cache.sh. The required storage is about 60GB.
The cache will be downloaded and extracted to cache/.store. To use a different cache location, modify the "store" path in src/config.py.
Note: Currently, due to copyright restrictions, the following benchmarks (processed brain recordings and stimuli) are not publicly available; however, they can be provided upon reasonable request.
Activate your environment, ensure the cache variables are set and run
one of the scoring scripts. Each takes a --model argument plus
dataset/task names. Examples:
- fMRI – evaluate on one or more fMRI datasets (optionally convolving with an HRF):
```bash
python -m src.data.fmri.cache
# to reproduce the results
python score_fmri.py --model r3d_18 --datasets all
# or a subset of datasets
python score_fmri.py --model r3d_18 --datasets mcmahon2023-fmri keles2024-fmri
```
- Behaviour – compare a model to human choices:
```bash
python -m src.data.behaviour.cache
python score_behaviour.py --model alexnet --tasks rajalingham2018
```
- Tasks – run standard vision benchmarks using logistic or ridge decoding:
```bash
python -m src.data.tasks.cache
python score_task.py --model resnet50_imagenet_full --tasks imagenet2012 selfmotion
```
- Electrodes – (not yet supported) fit the model to neuronal recordings:
```bash
python -m src.data.electrodes.cache
python score_electrodes.py --model resnet50_imagenet_full \
    --datasets freemanziemba2013-V4 crcns-pvc1
```
The data module defines several groups of inputs:
- fMRI datasets – the DATASETS list includes multiple recordings such as savasegal2023, keles2024 and mcmahon2023.
- Behavioural tasks – experiments like ilic2022–ucf5 and rajalingham2018 are listed in BEHAVIOURS and come with built‑in metrics.
- Electrode recordings – recordings from areas V1, V2, V4, IT and CRCNS datasets are enumerated in ELECTRODES.
- Standard tasks – classification or action‑recognition tasks (ImageNet‑100, Kinetics‑400, self‑motion and others) are defined in TASKS along with the appropriate decoder type. A separate STATIC list identifies tasks that involve single images rather than video frames.
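Since these groups are plain Python lists, you can inspect them directly, e.g. (assuming the names are importable at package level, as src/data/__init__.py suggests):

```python
# List the available dataset/task groups (names as documented above).
from src.data import DATASETS, BEHAVIOURS, ELECTRODES, TASKS, STATIC

print("fMRI datasets:", DATASETS)
print("behavioural tasks:", BEHAVIOURS)
print("electrode recordings:", ELECTRODES)
print("standard tasks:", TASKS)
print("static (single-image) tasks:", STATIC)
```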
Dozens of models are available out‑of‑the‑box. Static image networks
(e.g. ResNet, EfficientNet, DeiT and ConvNeXt variants)
can be transformed into temporal models. Video‑centric architectures
include 3D CNNs and transformers such as R3D‑18, SlowFast, Video Swin
and TimeSformer, as well as masked autoencoders,
audio‑video networks and recurrent predictors. See
src/models/groups.py for the full list. The loader handles
downsampling, temporal conversion and random initialization.
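As a conceptual illustration of the static-to-temporal conversion, a minimal sketch is to run an image model frame by frame and stack the outputs along time (the repo's actual conversion in src/models/loading.py is more involved, since it also configures inference):

```python
import torch
import torchvision.models as tvm

# Toy illustration: apply an image model per frame and stack outputs over time.
image_model = tvm.resnet18(weights=None).eval()
video = torch.rand(1, 16, 3, 224, 224)  # (batch, time, channels, height, width)

b, t = video.shape[:2]
with torch.no_grad():
    features = image_model(video.flatten(0, 1))  # one forward pass per frame
temporal = features.view(b, t, -1)               # (batch, time, features)
print(temporal.shape)                            # torch.Size([1, 16, 1000])
```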
Intermediate activations and scores are cached on disk via
resultcaching and pickle_store. Caches live under the
RESULTCACHING_HOME directory and make repeated evaluations faster.
Use --rerun-activation and related flags when you want to force
recomputation.
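As an illustration, result_caching-style disk caching typically follows the decorator pattern sketched below (an assumption based on the result_caching package; the repo's own pickle_store wrappers and the dummy score_model body here are stand-ins):

```python
from result_caching import store

@store()  # pickles the return value under $RESULTCACHING_HOME, keyed by the call arguments
def score_model(model_identifier, dataset):
    # stand-in for the expensive part: extract activations, fit decoder, evaluate
    return {"model": model_identifier, "dataset": dataset, "score": 0.0}

# the first call computes and caches; identical later calls load from disk
result = score_model(model_identifier="r3d_18", dataset="mcmahon2023-fmri")
```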