seq2vec

Seq2Vec model code + analysis utilities, with an emphasis on model–neuronal comparison (RSA, regression, CCA, UMAP, time-resolved metrics, etc.).

The Python source lives under src/python/. Most scripts are intended to be run from the repo root (see “Running” below).

Repo layout (high level)

src/: source code
- src/python/: Python modules and scripts (primary code)
- src/ref_matlab/: MATLAB reference extraction scripts + a small reference .mat
data/: datasets and intermediate data products (mostly gitignored; one example dataset is tracked)
models/: model checkpoints (mostly gitignored; one example checkpoint is tracked)
views/: analysis figures and inference cache (gitignored; generated)
MLspike/: external/embedded code folder (present locally; ignored by git per .gitignore)
.cursor/, .venv/, __pycache__/: editor settings, virtual environment, and Python bytecode cache (not runtime artifacts)

For a more detailed breakdown of src/python/, see src/python/README.md.

Folder-by-folder details (what goes where)

`src/python/`

Python code is organized by workflow:

src/python/paths.py: the single source of truth for paths. DATA_HOME points at the directory that contains data/, models/, views/.
src/python/model/: model training + model–neural comparison.
- src/python/model/train/: Seq2Vec/autoencoder training, dataset construction, probes, and tests.
- src/python/model/analysis/: model–neuronal comparison analyses and plotting.
  - run_all.py: orchestrates RSA/CCA/regression/UMAP/etc. from a single config block.
  - preprocess.py: loads dataset + checkpoint, aligns trials, extracts representations, writes inference cache.
src/python/analysis/: “neural-only” analyses and plots (correlation-over-time, dynamics GIF utilities, block direction significance, etc.).
src/python/behavior/: behavioral metrics and plots (success rate, expert days, movement-event plots).
src/python/data_subject/ and src/python/subject_data/: utilities for working with subject/session data and trial extraction/alignment (loading pickles, neuron selection, time normalization).
src/python/deconvolution/: scripts for examining spike deconvolution quality.

`src/ref_matlab/`

Reference MATLAB extraction code and a small reference dataset:

extract_data_seq2vec_v5.m: produces data_seq2vec_v5_m06.mat (event-code normalization + time compression + train/val selection).
data_seq2vec_v5_m06.mat: reference .mat output (see “Data/file formats” below).

`data/` (under `DATA_HOME`)

Intended storage for datasets and subject data.

Tracked example: data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz
Common (not tracked): data/subject_mark/ (subject pickles and/or data_mark.mat), data/raw/, etc. (see .gitignore)

`models/` (under `DATA_HOME`)

Intended storage for checkpoints.

Tracked example: models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth

`views/` (under `DATA_HOME`)

Generated outputs only.

Figures typically go under views/model_neuronal_comparison/ when running src/python/model/analysis/run_all.py.
Cached inference (model forward pass outputs) goes under views/model_neuronal_comparison/inference_cache/.

Requirements

Python: 3.10+ (see pyproject.toml)
Core deps: see requirements.txt

Setup (virtualenv)

From the repo root:

PowerShell

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -r requirements.txt

bash

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -r requirements.txt

Configure `DATA_HOME` (required)

This repo uses a single source of truth for disk locations in src/python/paths.py.

Edit DATA_HOME so it points at the directory that contains your:

data/
models/
views/

By default it is hard-coded to a local Windows path for the repo, so on most machines it has to be changed:

DATA_HOME = Path(r"D:\Projects\seq2vec")

If you move the project or keep large artifacts elsewhere, update DATA_HOME accordingly.
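
As a rough illustration of how paths relative to DATA_HOME can be resolved and checked, consider the sketch below. DATA_HOME itself is defined in src/python/paths.py; the helper names `resolve_under_data_home` and `missing_subdirs` are hypothetical, and a temporary directory stands in for the real location so the example runs anywhere.

```python
import tempfile
from pathlib import Path

# Assumption: in the real repo DATA_HOME is defined in src/python/paths.py.
# A temporary directory stands in for it here so the sketch is self-contained.
DATA_HOME = Path(tempfile.mkdtemp())

# The three top-level folders this README expects under DATA_HOME.
EXPECTED_SUBDIRS = ("data", "models", "views")


def resolve_under_data_home(relative_path: str, data_home: Path = DATA_HOME) -> Path:
    """Hypothetical helper: join a DATA_HOME-relative path onto data_home."""
    return data_home / relative_path


def missing_subdirs(data_home: Path = DATA_HOME) -> list[str]:
    """Return the expected subfolders that do not exist under data_home."""
    return [name for name in EXPECTED_SUBDIRS if not (data_home / name).is_dir()]


if __name__ == "__main__":
    print("missing:", missing_subdirs())
    print(resolve_under_data_home("data/dataset"))
```

A check like `missing_subdirs()` at startup can surface a misconfigured DATA_HOME before any dataset or checkpoint is loaded.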

Running

Run commands from the repo root with PYTHONPATH=. so that the src.python... package is importable.

PowerShell

$env:PYTHONPATH="."
python -m src.python.model.analysis.run_all

bash

export PYTHONPATH=.
python -m src.python.model.analysis.run_all

Alternative (direct script invocation)

run_all.py also supports being executed as a file path (it inserts the project root into sys.path):

python src/python/model/analysis/run_all.py
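
This README states only that run_all.py inserts the project root into sys.path; the exact code is not reproduced here. One common way to do it, sketched under that assumption, is to walk up from the script's location. The function names and the `levels_up` default are illustrative, not taken from the repo.

```python
import sys
from pathlib import Path


def project_root_for(script_path: str, levels_up: int = 4) -> Path:
    """Return the ancestor directory at index `levels_up` of the script path.

    For src/python/model/analysis/run_all.py the ancestors are
    analysis (0), model (1), python (2), src (3), repo root (4).
    """
    return Path(script_path).resolve().parents[levels_up]


def ensure_on_sys_path(root: Path) -> bool:
    """Prepend root to sys.path if absent; return True if it was added."""
    root_str = str(root)
    if root_str in sys.path:
        return False
    sys.path.insert(0, root_str)
    return True


if __name__ == "__main__":
    # In a real script this would be called with __file__.
    root = project_root_for("src/python/model/analysis/run_all.py")
    ensure_on_sys_path(root)
    print(root)
```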

What `run_all.py` expects

Open src/python/model/analysis/run_all.py and set the config block near the top, especially:

CHECKPOINT_PATH: relative to DATA_HOME (example in file: models/autoenc_... .pth)
DATASET_PATH: relative to DATA_HOME (example in file: data/dataset/... .npz)
SUBJECT, DAY, REGION (e.g. "cbl" or "ctx")
OUTPUT_DIR: relative to DATA_HOME (e.g. views/model_neuronal_comparison) or None to show plots without saving
RUN_ANALYSES: list of analysis IDs to run (e.g. [4] for RSA only)
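
For orientation, a config block along these lines might look like the sketch below. The variable names come from the list above and the two paths are the tracked examples named in this README. The SUBJECT and DAY values are placeholders rather than real identifiers, and `validate_config` is a hypothetical pre-flight helper, not part of run_all.py.

```python
from pathlib import Path

# Paths are relative to DATA_HOME (tracked examples from this README).
CHECKPOINT_PATH = "models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth"
DATASET_PATH = "data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz"

SUBJECT = "subject_placeholder"  # placeholder; replace with a real subject ID
DAY = 1                          # placeholder; the expected type is an assumption
REGION = "cbl"                   # e.g. "cbl" or "ctx"

OUTPUT_DIR = "views/model_neuronal_comparison"  # or None to show plots without saving
RUN_ANALYSES = [4]               # e.g. [4] runs RSA only


def validate_config(data_home: Path) -> list[str]:
    """Hypothetical check: return human-readable problems with the config."""
    problems = []
    for label, rel in (("CHECKPOINT_PATH", CHECKPOINT_PATH),
                       ("DATASET_PATH", DATASET_PATH)):
        if not (data_home / rel).is_file():
            problems.append(f"{label} not found under {data_home}: {rel}")
    if not RUN_ANALYSES:
        problems.append("RUN_ANALYSES is empty; nothing would run")
    return problems
```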

Outputs and caching

Figures / results: typically under views/model_neuronal_comparison/ when OUTPUT_DIR is set in run_all.py.
Inference cache: model forward-pass results are cached as .npz files under:
- views/model_neuronal_comparison/inference_cache/

Re-running the same analysis configuration should reuse cached inference and skip expensive forward passes.
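
The reuse behaviour described above follows the usual compute-or-load pattern. The sketch below illustrates that pattern with NumPy only. The cache-key scheme, the function names, and the stand-in `compute` callable are all assumptions and may differ from what preprocess.py actually does.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

import numpy as np


def cache_key(config: dict) -> str:
    """Derive a stable short key from a JSON-serialisable config (assumed scheme)."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def load_or_compute(cache_dir: Path, config: dict,
                    compute: Callable[[], dict]) -> tuple[dict, bool]:
    """Return (arrays, from_cache); compute() runs only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{cache_key(config)}.npz"
    if cache_file.exists():
        # Copy arrays out of the archive before it is closed.
        with np.load(cache_file) as archive:
            return {name: archive[name] for name in archive.files}, True
    arrays = compute()  # stands in for the expensive forward pass
    np.savez(cache_file, **arrays)
    return arrays, False
```

With a scheme like this, changing any field in the config (checkpoint, dataset, subject, and so on) yields a different key and therefore a fresh forward pass, while an unchanged config hits the cache.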

Data/file formats (what’s inside)

Dataset `.npz` (under `data/dataset/`)

Datasets are stored as NumPy .npz archives. Code paths that read them include:

src/python/model/train/seq2vec_data.py (training/validation splits)
src/python/model/analysis/preprocess.py (analysis bundles; also uses “full trial” arrays for some analyses)

Common keys you will see (depending on how the dataset was generated):

Windowed / shifted-window data:
- X_train, E_train, X_val, E_val (and optionally X_test, E_test)
- train_idx, val_idx (and optionally test_idx)
Full-trial strips (used for “full trial” and shifted-window analysis bundles):
- X_full_train, E_full_train, X_full_val, E_full_val, X_full_test, E_full_test
Offsets:
- per_trial_offsets (optional; otherwise code may derive default offsets)
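
Because the set of keys depends on how a dataset was generated, it can help to inspect an archive before using it. The following is a small sketch using only NumPy. The key names are those listed above; `summarize_npz`, `missing_keys`, and the choice to treat the windowed keys as required are illustrative assumptions.

```python
from pathlib import Path

import numpy as np

# Windowed-data keys listed in this README; treated as "expected" here
# purely for illustration.
WINDOWED_KEYS = ("X_train", "E_train", "X_val", "E_val")


def summarize_npz(path: Path) -> dict:
    """Map each array name in the archive to its shape.

    np.load defaults to allow_pickle=False, so an archive containing
    object arrays would raise here rather than unpickle silently.
    """
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def missing_keys(summary: dict, keys=WINDOWED_KEYS) -> list:
    """Return the expected keys that are absent from the archive summary."""
    return [key for key in keys if key not in summary]
```

For example, `summarize_npz(DATA_HOME / "data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz")` would list what the tracked example dataset contains.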

Inference cache `.npz` (under `views/.../inference_cache/`)

Cached forward-pass outputs produced by src/python/model/analysis/preprocess.py contain:

vec: (n_samples, hidden_dim)
rnn_out: (n_samples, T, hidden_dim)
slot_logits: (n_samples, T, 6)
logits: (n_samples, T, 6)
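
The shapes listed above imply a few consistency constraints: a shared n_samples, T, and hidden_dim, and a last dimension of 6 for both logit arrays. A hedged sketch of a checker follows; `check_cache_shapes` and `LOGIT_DIM` are hypothetical names, not functions from the repo.

```python
import numpy as np

LOGIT_DIM = 6  # last dimension of slot_logits and logits, per the list above


def check_cache_shapes(arrays: dict) -> list:
    """Return a list of mismatches against the documented cache shapes."""
    vec, rnn_out = arrays["vec"], arrays["rnn_out"]
    if vec.ndim != 2 or rnn_out.ndim != 3:
        return [f"unexpected ranks: vec {vec.shape}, rnn_out {rnn_out.shape}"]

    errors = []
    n_samples, hidden_dim = vec.shape
    n_steps = rnn_out.shape[1]  # T in the list above

    # rnn_out must agree with vec on n_samples and hidden_dim.
    if rnn_out.shape != (n_samples, n_steps, hidden_dim):
        errors.append(f"rnn_out: got {rnn_out.shape}, vec is {vec.shape}")

    # Both logit arrays share (n_samples, T, 6).
    expected = (n_samples, n_steps, LOGIT_DIM)
    for name in ("slot_logits", "logits"):
        if arrays[name].shape != expected:
            errors.append(f"{name}: expected {expected}, got {arrays[name].shape}")
    return errors
```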

Checkpoint `.pth` (under `models/`)

PyTorch checkpoint used to load model weights for inference/training (e.g. autoencoder checkpoints used by run_all.py).

MATLAB reference `.mat` (under `src/ref_matlab/`)

src/ref_matlab/data_seq2vec_v5_m06.mat (generated by the MATLAB scripts here) is saved with variables like:

ses, cbl_vec, seq_b, seq_btype, seq_b2, seq_b2_2, mask, sbj_m, learning

Reference + data artifacts tracked in git (not exhaustive outputs)

This repo intentionally gitignores most large artifacts, but does include a few small/representative references:

Dataset example: data/dataset/seq2vec_dataset_6_8_r0.8_s42_rew.npz
Checkpoint example: models/autoenc_6_8_conv_lr3e-3_a0_pk0_nll_pmse_pmm01_flip.pth
MATLAB reference: src/ref_matlab/data_seq2vec_v5_m06.mat + extraction scripts in src/ref_matlab/
Probe/params JSON:
- glm_params_probe.json, glm_params_probe_quick.json (repo root)
- src/python/model/analysis/regression_params.json, src/python/model/analysis/glm_params.json
Misc reference text: email_from_Mark.txt

Common gotchas (internal)

Run from the repo root and set PYTHONPATH=. (otherwise imports of the form from src.python... will fail).
Many large artifacts are intentionally not tracked by git (see .gitignore). You’re expected to have the relevant datasets/checkpoints present under your configured DATA_HOME.
