This repository contains the implementation for the paper [Vocabulary-level Memory Efficiency for Language Model Fine-tuning](https://arxiv.org/abs/2309.08708).
Partial Embedding Matrix Adaptation is a simple technique that can reduce the memory footprint of language model fine-tuning without impacting performance.
The package can be installed directly from GitHub:

```bash
pip install git+https://github.com/mlsw/partial-embedding-matrix-adaptation.git
```
There is a high-level API for Hugging Face Transformers PyTorch models via the `HFEmbeddingPruner` class:
```python
from partial_embedding_matrix_adaptation import HFEmbeddingPruner

embedding_pruner = HFEmbeddingPruner(model)
dataset, _ = embedding_pruner.prepare_model(tokenizer, dataset)
```

Please see `examples/distilbert_sst2.py` for a complete example. Additionally, the scripts in the `utils` directory show how to use this API with the Hugging Face Transformers `Trainer`.
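As a rough end-to-end sketch, the snippet below shows how this might slot into a typical Hugging Face fine-tuning setup. The checkpoint name, dataset, and surrounding setup are illustrative assumptions (loosely mirroring `examples/distilbert_sst2.py`); only the `HFEmbeddingPruner` calls are taken from the API above.

```python
# Rough sketch only: the checkpoint, dataset, and setup below are illustrative
# assumptions; see examples/distilbert_sst2.py for the actual workflow.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from partial_embedding_matrix_adaptation import HFEmbeddingPruner

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = load_dataset("glue", "sst2")

# Prune the embedding matrix down to the vocabulary used by the dataset,
# then fine-tune with the pruned model and returned dataset as usual.
embedding_pruner = HFEmbeddingPruner(model)
dataset, _ = embedding_pruner.prepare_model(tokenizer, dataset)
```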
Alternatively, the `EmbeddingPruner` class can be used directly for PyTorch models. Please see `HFEmbeddingPruner` for an example of how to use this.
The following scripts can be used to reproduce the results from the paper. They are adapted from the Hugging Face Transformers PyTorch examples, with support added for Partial Embedding Matrix Adaptation.
| Task | Script | Documentation |
|------|--------|---------------|
| GLUE | `run_glue_pema.py` | Here |
| XNLI | `run_xnli_pema.py` | Here |
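For reference, an invocation might look like the sketch below, assuming `run_glue_pema.py` retains the command-line interface of the upstream `run_glue.py` example; any options specific to Partial Embedding Matrix Adaptation are not shown here and are documented in the script itself.

```bash
# Sketch: arguments mirror the upstream run_glue.py example and are assumptions
# for this repository; PEMA-specific options are documented in the script.
python run_glue_pema.py \
  --model_name_or_path distilbert-base-uncased \
  --task_name sst2 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output/sst2
```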
This project is licensed under the terms of the MIT license. Please see LICENSE for more details.
If you found this work useful, please consider citing our paper:
```bibtex
@misc{williams-aletras-2025-vocabulary,
  title={Vocabulary-level Memory Efficiency for Language Model Fine-tuning},
  author={Miles Williams and Nikolaos Aletras},
  year={2025},
  eprint={2309.08708},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2309.08708}
}
```