FairEMG is a highly extensible platform for research on sEMG and other biosignal modalities. It can currently train models with various featurizers (spectrogram, convolutional) and encoders (transformer, TDS) on the emg2qwerty transcription task. It supports generic supervised training with CTC loss, as well as logit- and feature-based knowledge distillation losses.
Official open source release for the "Scaling and Distilling Transformer Models for sEMG" paper, accepted at TMLR.
Clone this repo and set up the environment:
```bash
cd ~
git clone git@github.com:facebookresearch/fairemg.git && cd fairemg
conda env create -f environment.yml
conda activate fairemg
pip install -e .
```
Test the environment installation on randomly generated data:
```bash
cd ~/fairemg/src && conda activate fairemg
./fairemg/scripts/run_dummy_qwerty_local.sh # cpu
./fairemg/scripts/run_dummy_qwerty_distill_local.sh # cpu
```
Download the raw data (instructions taken from the emg2qwerty codebase):
```bash
# Download the dataset, extract, and symlink to the fairemg repo
cd ~ && wget https://fb-ctrl-oss.s3.amazonaws.com/emg2qwerty/emg2qwerty-data-2021-08.tar.gz
tar -xvzf emg2qwerty-data-2021-08.tar.gz
mkdir ~/fairemg/data
ln -s ~/emg2qwerty-data-2021-08 ~/fairemg/data/emg2qwerty
```
Process the dataset into a sharded dataset for training. The following will preprocess the data and produce 'shards' containing training and evaluation samples:
```bash
cd ~/fairemg && conda activate fairemg
python scripts/process_qwerty.py
```
You can find these sharded datasets (aka partitions) in `~/fairemg/data/sharded`; they are identified by a hash created from the parameters used to create them (see `PartitionConfig` in `./src/fairemg/config/partition.py` for more details).
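As a rough illustration of how such a hash can be derived from a configuration, here is a minimal sketch; the fields, serialization, and digest used by the actual `PartitionConfig` may differ:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class PartitionConfig:
    # Placeholder fields -- the real PartitionConfig defines its own parameters.
    window_length_s: float = 4.0
    sampling_rate: int = 2000


def partition_hash(cfg: PartitionConfig) -> str:
    # Serialize the config deterministically, then hash it.
    payload = json.dumps(asdict(cfg), sort_keys=True).encode("utf-8")
    return hashlib.sha224(payload).hexdigest()


print(partition_hash(PartitionConfig()))
```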
Finally, test the installation with the following scripts, which will train models on the data partition you have just created:
```bash
cd ~/fairemg/src && conda activate fairemg
./fairemg/scripts/run_qwerty_local.sh # cpu
./fairemg/scripts/run_qwerty_distill_local.sh # cpu
```
We provide scripts to recreate the main results of the "Scaling and Distilling Transformer Models for sEMG" paper. To explore the code with local training, follow these instructions:
```bash
cd ~/fairemg/src/ && conda activate fairemg
# Create the callable `.sh` scripts
python fairemg/scripts/generate_scripts.py --job-name figure_3_supervised --task qwerty --interactive
# Run the generated scripts. Each script represents a hyperparameter configuration within the sweep
bash ./fairemg/scripts/figure_3_supervised/figure_3_supervised-interactive-1.sh
bash ./fairemg/scripts/figure_3_supervised/figure_3_supervised-interactive-2.sh
...
```
To run the full-scale (multi-node) version of the experiment, we provide code to run it on a slurm cluster. `sweep-parallelism`, `time`, and `partition` are parameters that will be passed to slurm.
```bash
cd ~/fairemg/src/ && conda activate fairemg
# Create the callable `.sh` scripts
python fairemg/scripts/generate_scripts.py \
    --job-name figure_3_supervised \
    --task qwerty \
    --sweep-parallelism -1 \
    --time 4320 \
    --partition scavenge \
    --monitor-jobs
# Run the generated sweep script. This will launch a slurm job array for all hyperparameter configurations.
bash fairemg/scripts/figure_3_supervised/figure_3_supervised-sweep.sh
```
If you do not have access to a slurm cluster, you will have to change the launching mechanism in `./src/fairemg/hf/app/torchrun_slurm.py` to fit your computing platform.
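If you need a starting point, a single-node replacement could look like the sketch below. This is only an illustration, not the repository's launcher: the entrypoint and arguments are placeholders, and the interface expected by the rest of the codebase is defined in `torchrun_slurm.py` itself.

```python
import subprocess


def launch_local(entrypoint: str, config_args: list[str], nproc_per_node: int = 1) -> None:
    """Launch a training entrypoint with torchrun on a single machine.

    `entrypoint` and `config_args` are placeholders; adapt them to whatever
    torchrun_slurm.py actually invokes on your platform.
    """
    cmd = [
        "torchrun",
        "--standalone",  # single-node rendezvous, no scheduler required
        f"--nproc_per_node={nproc_per_node}",
        entrypoint,
        *config_args,
    ]
    subprocess.run(cmd, check=True)
```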
We also provide scripts for the other experiments in the "Scaling and Distilling Transformer Models for sEMG" paper. The procedure is the same as above; you just have to change the flags of the `generate_scripts.py` script:

- `figure_3_supervised`: Red curve in Figure 3
  - `--job-name figure_3_supervised --task qwerty`
- `figure_3_distilled`: Blue curve in Figure 3
  - `--job-name figure_3_distilled --task qwerty-distill`
  - Note: You will need to provide the path to a trained model `model.safetensors` checkpoint file. Follow the instructions from the 'Initializing models from checkpoint weights' section.
- `table_3_personalization_from_supervised`: 'Standard' column of the personalization 'SMALL' row in Table 3
  - `--job-name table_3_personalization_from_supervised --task qwerty-personalization`
  - Note: You will need to provide the path to a trained model `model.safetensors` checkpoint file. Follow the instructions from the 'Initializing models from checkpoint weights' section.
- `table_3_personalization_from_distilled`: 'Distilled' column of the personalization 'SMALL' row in Table 3
  - `--job-name table_3_personalization_from_distilled --task qwerty-personalization`
  - Note: You will need to provide the path to a trained model `model.safetensors` checkpoint file. Follow the instructions from the 'Initializing models from checkpoint weights' section.
- `table_4_tds`: 'TDS' row in Table 4
  - `--job-name table_4_tds --task qwerty`
- `table_4_transformer`: 'Transformer' row in Table 4
  - `--job-name table_4_transformer --task qwerty`
- `table_5_augment`: All of Table 5
  - `--job-name table_5_augment --task qwerty`
This section details how to initialize a model (e.g., for the distillation teacher or the generic model for emg2qwerty personalization).
- After running a supervised training job (e.g., the `figure_3_supervised` experiment, or a subset of it), use the `./notebooks/qwerty.ipynb` notebook to find the `jobid` of a model.
  - Note: Each individual training run has a unique identifier following the template `user_<USER>_name_<job_name>_seed_<seed>_id_<hash>`, where `<hash>` is created from the run configuration using `hashlib.sha224`.
- You will find the logs and checkpoints of your runs in the `~/fairemg/logs/<id>` folder.
- Locate the checkpoint file `~/fairemg/logs/<id>/checkpoint-<n>/model.safetensors`, where `<n>` is the training step at which the checkpoint was created.
- Put the absolute path of the safetensors file in the `path_to_initialize_weights` fields of the configuration to load the weights into the submodules you want to initialize from the pre-trained model.
  - Note: You will have to specify a `key_to_select_weights` alongside the path to the model weights in order to select the subset of the state dict keys associated with the module (see the sketch after this list).
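For intuition, below is a minimal sketch of what loading a subset of checkpoint weights into a submodule could look like. It is not the actual fairemg loading code: it assumes `key_to_select_weights` behaves like a simple key prefix on the checkpoint state dict.

```python
import torch
from safetensors.torch import load_file


def load_submodule_weights(
    module: torch.nn.Module,
    path_to_initialize_weights: str,
    key_to_select_weights: str,
) -> None:
    # Read the full checkpoint state dict from the safetensors file.
    state_dict = load_file(path_to_initialize_weights)
    # Keep only the entries belonging to the selected submodule and strip the
    # prefix so the keys match the submodule's own parameter names.
    prefix = key_to_select_weights + "."
    selected = {
        key[len(prefix):]: tensor
        for key, tensor in state_dict.items()
        if key.startswith(prefix)
    }
    module.load_state_dict(selected)
```

Under this assumption, a `key_to_select_weights` of `encoder` would load only the encoder weights from `model.safetensors`.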
To create your own experiment:

- Create a name for your experiment (e.g., `custom_experiment`).
- All scripts are executed from the `src` folder: `cd src`.
- Create a new folder inside `./fairemg/scripts` with the name of your experiment (e.g., `mkdir ./fairemg/scripts/custom_experiment`).
- Create an `hps.json` file in this folder (`touch ./fairemg/scripts/custom_experiment/hps.json`); it acts as the hyperparameter overwrite of the default parameters found in `./fairemg/scripts/default_hps/<task>` (see the example after this list), where `<task>` can be:
  - `qwerty` for generic emg2qwerty training
  - `qwerty-distill` for generic emg2qwerty training with a distillation signal
  - `qwerty-personalization` for personalization emg2qwerty training
- Follow the procedure from the 'Launch an experiment from the paper' section to launch the experiment.
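For illustration, an `hps.json` override could look like the snippet below. The keys shown here are hypothetical placeholders; the valid keys and their nesting are defined by the default files in `./fairemg/scripts/default_hps/<task>`, so copy the entries you want to change from there.

```json
{
  "optimizer": {
    "lr": 0.0003
  },
  "trainer": {
    "max_steps": 10000
  }
}
```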
- Set up a Jupyter notebook:
  ```bash
  cd ~/fairemg && conda activate fairemg
  PYTHONPATH=. jupyter lab --port 8200
  ```
- Open the `./notebooks/qwerty.ipynb` notebook.
- In the 3rd cell, add the experiment names whose results you want to fetch to the `job_names_to_fetch` list (e.g., `figure_3_supervised`).
- Run the notebook; it will show basic loss and CER plots throughout training.
  - Note: The experiment logs take the shape of a big pandas table where each row represents a logging call during training (usually once per training step for the training split, and once per epoch for the evaluation splits).
  - Tip: You can group the table by the `id` column to group unique training runs together (see the sketch after this list).
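For example, a minimal, self-contained sketch of that grouping (the miniature table and the `step`/`cer` column names are only stand-ins for the real logs fetched by the notebook):

```python
import pandas as pd

# Hypothetical miniature version of the experiment-log table produced by the
# notebook; the real table has many more columns and rows.
logs = pd.DataFrame(
    {
        "id": ["run_a", "run_a", "run_b", "run_b"],
        "step": [100, 200, 100, 200],
        "cer": [0.45, 0.30, 0.50, 0.35],
    }
)

# Group rows by the `id` column so each unique training run can be analyzed
# (or plotted) on its own.
for run_id, run_logs in logs.groupby("id"):
    print(run_id, run_logs["cer"].min())
```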
See the CONTRIBUTING file for how to help out.
If you found this work useful, please consider citing:
TODO
fairemg is CC-BY-NC-4.0 licensed, as found in the LICENSE file.