Seq2Vec model code + analysis utilities, with an emphasis on model–neuronal comparison (RSA, regression, CCA, UMAP, time-resolved metrics, etc.).
The Python source lives under src/python/. Most scripts are intended to be run from the repo root (see “Running” below).
- src/: source code
  - src/python/: Python modules and scripts (primary code)
  - src/ref_matlab/: MATLAB reference extraction scripts + a small reference .mat
- data/: datasets and intermediate data products (mostly gitignored; one example dataset is tracked)
- models/: model checkpoints (mostly gitignored; one example checkpoint is tracked)
- views/: analysis figures and inference cache (gitignored; generated)
- MLspike/: external/embedded code folder (present locally; ignored by git per .gitignore)
- .cursor/, .venv/, __pycache__/: editor/venv/Python cache (not part of runtime artifacts)
For a more detailed breakdown of src/python/, see src/python/README.md.
Python code is organized by workflow:
- src/python/paths.py: path “single source of truth”. DATA_HOME points at the directory that contains data/, models/, views/.
- src/python/model/: model training + model–neural comparison.
  - src/python/model/train/: Seq2Vec/autoencoder training, dataset construction, probes, and tests.
  - src/python/model/analysis/: model–neuronal comparison analyses and plotting.
    - run_all.py: orchestrates RSA/CCA/regression/UMAP/etc. from a single config block.
    - preprocess.py: loads dataset + checkpoint, aligns trials, extracts representations, writes inference cache.
- src/python/analysis/: “neural-only” analyses and plots (correlation-over-time, dynamics GIF utilities, block direction significance, etc.).
- src/python/behavior/: behavioral metrics and plots (success rate, expert days, movement-event plots).
- src/python/data_subject/ and src/python/subject_data/: utilities for working with subject/session data and trial extraction/alignment (loading pickles, neuron selection, time normalization).
- src/python/deconvolution/: scripts for examining spike deconvolution quality.
src/ref_matlab/ holds the reference MATLAB extraction code and a small reference dataset:
- extract_data_seq2vec_v5.m: produces data_seq2vec_v5_m06.mat (event-code normalization + time compression + train/val selection).
- data_seq2vec_v5_m06.mat: reference .mat output (see “Data/file formats” below).
data/ is the intended storage for datasets and subject data.
- Tracked example: data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz
- Common (not tracked): data/subject_mark/ (subject pickles and/or data_mark.mat), data/raw/, etc. (see .gitignore)
models/ is the intended storage for checkpoints.
- Tracked example: models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth
views/ holds generated outputs only.
- Figures typically go under views/model_neuronal_comparison/ when running src/python/model/analysis/run_all.py.
- Cached inference (model forward pass outputs) goes under views/model_neuronal_comparison/inference_cache/.
- Python: 3.10+ (see pyproject.toml)
- Core deps: see requirements.txt
From the repo root:
Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt

macOS/Linux:

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -r requirements.txt

This repo uses a single source of truth for disk locations in src/python/paths.py.
Edit DATA_HOME so it points at the directory that contains your:
- data/
- models/
- views/
By default it is set to a hard-coded, machine-specific repo path:
DATA_HOME = Path(r"D:\Projects\seq2vec")
If you move the project or keep large artifacts elsewhere, update DATA_HOME accordingly.
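A minimal sketch of how code might resolve artifact locations against DATA_HOME and check that the expected directories exist. The helper names `resolve` and `missing_dirs` are hypothetical and are not taken from src/python/paths.py; only DATA_HOME and the three directory names come from the text above, and the path value is a placeholder.

```python
from pathlib import Path

# Assumption: in the real repo this value is defined in src/python/paths.py.
DATA_HOME = Path("/path/to/seq2vec")  # placeholder, not the repo default

EXPECTED_DIRS = ("data", "models", "views")


def resolve(relative: str, home: Path = DATA_HOME) -> Path:
    """Join a path given relative to DATA_HOME (hypothetical helper)."""
    return home / relative


def missing_dirs(home: Path = DATA_HOME) -> list[str]:
    """Return the expected top-level directories that do not exist under home."""
    return [name for name in EXPECTED_DIRS if not (home / name).is_dir()]
```

Calling `missing_dirs()` before a long run gives an early, readable failure when DATA_HOME points at the wrong place.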
Run commands from the repo root with PYTHONPATH=. so src.python... is importable.
Windows (PowerShell):

$env:PYTHONPATH="."
python -m src.python.model.analysis.run_all

macOS/Linux:

export PYTHONPATH=.
python -m src.python.model.analysis.run_all

run_all.py also supports being executed as a file path (it inserts the project root into sys.path):

python src/python/model/analysis/run_all.py

Open src/python/model/analysis/run_all.py and set the config block near the top, especially:
- CHECKPOINT_PATH: relative to DATA_HOME (example in file: models/autoenc_... .pth)
- DATASET_PATH: relative to DATA_HOME (example in file: data/dataset/... .npz)
- SUBJECT, DAY, REGION (e.g. "cbl" or "ctx")
- OUTPUT_DIR: relative to DATA_HOME (e.g. views/model_neuronal_comparison) or None to show plots without saving
- RUN_ANALYSES: list of analysis IDs to run (e.g. [4] for RSA only)
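For orientation, a config block using these names might look like the sketch below. The values are illustrative only. The two paths are the tracked example files listed elsewhere in this README, which may differ from the examples in run_all.py. The SUBJECT and DAY values and their types are placeholders and assumptions. Check run_all.py for the actual defaults and for the mapping from analysis IDs to analyses.

```python
# Illustrative values only; the real config block lives near the top of run_all.py.
CHECKPOINT_PATH = "models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth"  # relative to DATA_HOME
DATASET_PATH = "data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz"  # relative to DATA_HOME
SUBJECT = "<subject-id>"  # placeholder; valid IDs are not listed in this README
DAY = 1  # placeholder; the expected type is an assumption
REGION = "cbl"  # or "ctx"
OUTPUT_DIR = "views/model_neuronal_comparison"  # or None to show plots without saving
RUN_ANALYSES = [4]  # RSA only, per the example above
```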
- Figures / results: typically under views/model_neuronal_comparison/ when OUTPUT_DIR is set in run_all.py.
- Inference cache: model forward-pass results are cached as .npz files under views/model_neuronal_comparison/inference_cache/.
Re-running the same analysis configuration should reuse cached inference and skip expensive forward passes.
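The reuse behavior described above can be sketched as a load-or-compute pattern. This is an illustration under stated assumptions, not the logic in preprocess.py: the function name `cached_inference`, the hashing of a config dict into a file name, and the `compute` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

import numpy as np


def cached_inference(
    cache_dir: Path,
    config: dict,
    compute: Callable[[], dict[str, np.ndarray]],
) -> dict[str, np.ndarray]:
    """Load cached arrays for this config, or compute and cache them."""
    # Hypothetical cache key: a short hash of the JSON-serialised config.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    cache_file = cache_dir / f"{digest[:16]}.npz"

    if cache_file.exists():
        # Cache hit: read every array back into memory and skip the forward pass.
        with np.load(cache_file) as archive:
            return {name: archive[name] for name in archive.files}

    outputs = compute()  # the expensive forward pass
    cache_dir.mkdir(parents=True, exist_ok=True)
    np.savez(cache_file, **outputs)
    return outputs
```

With this pattern, any change to the config produces a different file name, so stale results are not reused by accident.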
Datasets are stored as NumPy .npz archives. Code paths that read them include:
- src/python/model/train/seq2vec_data.py (training/validation splits)
- src/python/model/analysis/preprocess.py (analysis bundles; also uses “full trial” arrays for some analyses)
Common keys you will see (depending on how the dataset was generated):
- Windowed / shifted-window data:
  - X_train, E_train, X_val, E_val (and optionally X_test, E_test)
  - train_idx, val_idx (and optionally test_idx)
- Full-trial strips (used for “full trial” and shifted-window analysis bundles):
  - X_full_train, E_full_train, X_full_val, E_full_val, X_full_test, E_full_test
- Offsets:
  - per_trial_offsets (optional; otherwise code may derive default offsets)
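A quick way to see which of these keys a given archive contains is to open it with NumPy and list its entries. The sketch below is self-contained: it writes a tiny synthetic archive using a few of the key names above and then inspects it. The array shapes are made up for illustration, and the helper name `summarize_npz` is hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np

OPTIONAL_KEYS = ("X_test", "E_test", "test_idx", "per_trial_offsets")


def summarize_npz(path: Path) -> dict[str, tuple[int, ...]]:
    """Map each array name in the archive to its shape."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway archive; shapes are arbitrary placeholders, not real data.
    demo_path = Path(tmp) / "demo_dataset.npz"
    np.savez(
        demo_path,
        X_train=np.zeros((4, 10, 3)),
        E_train=np.zeros((4, 10), dtype=int),
        X_val=np.zeros((2, 10, 3)),
        E_val=np.zeros((2, 10), dtype=int),
    )
    summary = summarize_npz(demo_path)

absent = [key for key in OPTIONAL_KEYS if key not in summary]
print(summary)
print("optional keys absent:", absent)
```

Pointing `summarize_npz` at a real dataset file under DATA_HOME shows which optional keys that particular archive was generated with.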
Cached forward-pass outputs produced by src/python/model/analysis/preprocess.py contain:
- vec: (n_samples, hidden_dim)
- rnn_out: (n_samples, T, hidden_dim)
- slot_logits: (n_samples, T, 6)
- logits: (n_samples, T, 6)
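The shapes above imply a few consistency constraints that can be checked when a cache file is loaded. A minimal sketch, assuming all four keys are present; the function name `check_cache_shapes` and the constant name `LOGIT_DIM` are hypothetical, and the value 6 is taken from the shapes listed above without assuming what it represents.

```python
import numpy as np

LOGIT_DIM = 6  # last dimension of slot_logits and logits, per the list above


def check_cache_shapes(cache: dict[str, np.ndarray]) -> None:
    """Raise ValueError if the arrays disagree on n_samples, T, or hidden_dim."""
    n_samples, hidden_dim = cache["vec"].shape
    _, n_steps, _ = cache["rnn_out"].shape

    expected = {
        "vec": (n_samples, hidden_dim),
        "rnn_out": (n_samples, n_steps, hidden_dim),
        "slot_logits": (n_samples, n_steps, LOGIT_DIM),
        "logits": (n_samples, n_steps, LOGIT_DIM),
    }
    for name, shape in expected.items():
        if cache[name].shape != shape:
            raise ValueError(f"{name}: expected {shape}, got {cache[name].shape}")
```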
Model checkpoints (.pth) are PyTorch checkpoints used to load model weights for inference/training (e.g. autoencoder checkpoints used by run_all.py).
src/ref_matlab/data_seq2vec_v5_m06.mat (generated by the MATLAB scripts here) is saved with variables like:
ses, cbl_vec, seq_b, seq_btype, seq_b2, seq_b2_2, mask, sbj_m, learning
This repo intentionally gitignores most large artifacts, but does include a few small/representative references:
- Dataset example: data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz
- Checkpoint example: models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth
- MATLAB reference: src/ref_matlab/data_seq2vec_v5_m06.mat + extraction scripts in src/ref_matlab/
- Probe/params JSON:
  - glm_params_probe.json, glm_params_probe_quick.json (repo root)
  - src/python/model/analysis/regression_params.json, src/python/model/analysis/glm_params.json
- Misc reference text: email_from_Mark.txt
- Run from the repo root and set PYTHONPATH=. (otherwise from src.python... imports will fail).
- Many large artifacts are intentionally not tracked by git (see .gitignore). You’re expected to have the relevant datasets/checkpoints present under your configured DATA_HOME.