BraIn-to-Text (BIT) is an end-to-end speech BCI framework that connects a transformer-based neural encoder with audio LLMs to directly generate sentences from neural activity.
Create the conda environment:

```bash
conda env create -f env.yaml
```

Download data from DRYAD and rename it to brandman_2024_text.
Download competitionData.tar.gz from DRYAD and rename it to willett_2023_text.
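Before training, it can help to confirm that both dataset folders carry the names this README expects. The sketch below is a hypothetical helper, not part of the repository, and it assumes that both folders sit side by side under the directory you will later set as data_dir:

```python
from __future__ import annotations

import sys
from pathlib import Path

# Folder names this README asks for after downloading from DRYAD.
EXPECTED_DATASETS = ("brandman_2024_text", "willett_2023_text")


def missing_datasets(data_dir: str | Path) -> list[str]:
    """Return the expected dataset folders that do not exist under data_dir.

    Assumption (hypothetical layout): both datasets live side by side in the
    directory that the trainer YAML's data_dir entry points to.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Check the directory given on the command line, or the current one.
    missing = missing_datasets(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing dataset folders: " + ", ".join(missing))
    else:
        print("All dataset folders found.")
```

Pass your data directory as the only argument; any folder that is still missing or misnamed is listed.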
Update the trainer YAML to use your own data, checkpoint, and log directories. For example, change the following entries in configs/finetune/phoneme/ndt/trainer.yaml:

```yaml
dirs:
  data_dir: YOUR_DATA_DIR
  checkpoint_dir: YOUR_CHECKPOINT_DIR
  log_dir: YOUR_LOG_DIR
```

Run the following command to train a model:
```bash
python train.py --training_mode MODE \
  --dataset DATASET \
  --features FEATURES \
  --encoder ENCODER \
  --task TASK \
  [--ft_ckpt CKPT] \
  [--ds_config DS_CONFIG] \
  [--kwargs KEY=VALUE ...]
```

Arguments (square brackets mark optional ones):

- --training_mode: train_from_scratch, finetune
- --encoder: ndt
- --task: phoneme, sentence
- --dataset: willett_2023_text, brandman_2024_text
- --features: all, tx1, spikePow
- --ft_ckpt: path to the model checkpoint to fine-tune from (optional, e.g. with --training_mode finetune)
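To script several runs, the documented arguments can be assembled and checked in Python before launching train.py. This is a hypothetical helper (build_train_cmd is not part of the repository): it only validates against the values listed above, and it assumes that --kwargs accepts space-separated KEY=VALUE tokens, as the usage line suggests.

```python
from __future__ import annotations

import shlex

# Allowed values, copied from the argument list in this README.
CHOICES = {
    "training_mode": {"train_from_scratch", "finetune"},
    "dataset": {"willett_2023_text", "brandman_2024_text"},
    "features": {"all", "tx1", "spikePow"},
    "encoder": {"ndt"},
    "task": {"phoneme", "sentence"},
}


def build_train_cmd(
    training_mode: str,
    dataset: str,
    features: str,
    encoder: str,
    task: str,
    ft_ckpt: str | None = None,
    ds_config: str | None = None,
    kwargs: dict[str, object] | None = None,
) -> list[str]:
    """Return an argv list for train.py; raise ValueError on undocumented values."""
    required = {
        "training_mode": training_mode,
        "dataset": dataset,
        "features": features,
        "encoder": encoder,
        "task": task,
    }
    cmd = ["python", "train.py"]
    for name, value in required.items():
        if value not in CHOICES[name]:
            raise ValueError(f"--{name} {value!r} not in {sorted(CHOICES[name])}")
        cmd += [f"--{name}", value]
    if ft_ckpt is not None:
        cmd += ["--ft_ckpt", ft_ckpt]
    if ds_config is not None:
        cmd += ["--ds_config", ds_config]
    if kwargs:
        # Assumption: --kwargs takes space-separated KEY=VALUE tokens.
        cmd += ["--kwargs", *(f"{key}={value}" for key, value in kwargs.items())]
    return cmd


if __name__ == "__main__":
    # Mirrors this README's "fine-tune for sentence decoding" example.
    cmd = build_train_cmd(
        "finetune", "brandman_2024_text", "all", "ndt", "sentence",
        ft_ckpt="YOUR_MODEL_PATH",
    )
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would launch the run; keeping the command as a list avoids shell-quoting problems with paths that contain spaces.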
Examples:

- Train from scratch for phoneme decoding:

  ```bash
  python train.py --training_mode train_from_scratch \
    --dataset brandman_2024_text \
    --features all \
    --encoder ndt \
    --task phoneme
  ```

- Fine-tune the above model for sentence decoding:
  ```bash
  python train.py --training_mode finetune \
    --dataset brandman_2024_text \
    --features all \
    --encoder ndt \
    --task sentence \
    --ft_ckpt YOUR_MODEL_PATH
  ```

Once you have the fine-tuned model, you can generate sentence predictions in two stages:
- Run the following command to predict phonemes:

  ```bash
  python eval_phoneme.py --model_path YOUR_MODEL_PATH --eval_split val
  ```

  - --eval_split: val, test, or holdout. "val" corresponds to the benchmark test set; use "holdout" for the competition's holdout set.
  - The phoneme logits are saved as {eval_split}_phoneme_logits.pt for later language model rescoring.
- Run the following command to predict sentences using an LLM:

  ```bash
  python eval_llm.py --model_path YOUR_MODEL_PATH --eval_split val
  ```

  - --nbest: number of candidate sentences to generate with nucleus sampling (optional)
  - --phoneme_logits_path: path to the phoneme logits saved in the previous step (optional)
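This README does not describe how eval_llm.py samples internally, so the sketch below only illustrates the standard nucleus (top-p) sampling idea behind --nbest: at every decoding step, keep the smallest set of most probable words whose total probability reaches top_p, renormalize, sample one word, and repeat the whole generation nbest times to get independent candidates. The toy_lm function, the top_p value, and all probabilities are made up for illustration and stand in for the audio LLM.

```python
from __future__ import annotations

import random
from typing import Callable


def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most probable words whose mass reaches top_p, renormalized."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    kept: dict[str, float] = {}
    total = 0.0
    for word, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[word] = p
        total += p
        if total >= top_p:
            break
    return {word: p / total for word, p in kept.items()}


def toy_lm(prefix: tuple[str, ...]) -> dict[str, float]:
    """Hypothetical next-word distribution standing in for the audio LLM."""
    if len(prefix) == 0:
        return {"i": 0.5, "eye": 0.25, "aye": 0.25}
    if len(prefix) == 1:
        return {"want": 0.5, "won't": 0.25, "wand": 0.25}
    if len(prefix) == 2:
        return {"water": 0.75, "walter": 0.25}
    return {"</s>": 1.0}  # end-of-sentence token


def sample_nbest(
    next_word_probs: Callable[[tuple[str, ...]], dict[str, float]],
    nbest: int,
    top_p: float = 0.75,
    max_len: int = 20,
    seed: int = 0,
) -> list[str]:
    """Draw nbest independent sentences, applying the nucleus filter at every step."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(nbest):
        words: list[str] = []
        for _ in range(max_len):
            dist = nucleus_filter(next_word_probs(tuple(words)), top_p)
            word = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
            if word == "</s>":
                break
            words.append(word)
        candidates.append(" ".join(words))
    return candidates


if __name__ == "__main__":
    for sentence in sample_nbest(toy_lm, nbest=5):
        print(sentence)
```

With top_p = 0.75, low-probability continuations such as "aye", "wand", and "walter" are pruned before sampling, so every candidate has the form "i/eye want/won't water". A larger nbest yields more diverse candidates for the language model to choose from, at the cost of more generation passes.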
Please cite our paper if you use this code in your own work:
```bibtex
@inproceedings{zhangcross,
  title={A cross-species neural foundation model for end-to-end speech decoding},
  author={Zhang, Yizi and He, Linyang and Fan, Chaofei and Liu, Tingkai and Yu, Han and Le, Trung and Li, Jingyuan and Linderman, Scott and Duncker, Lea and Willett, Francis R and others},
  booktitle={The Fourteenth International Conference on Learning Representations}
}
```