TartuNLP/gec-llm

To Err Is Human, but Llamas Can Learn It Too

Fine-tuning Llama for GEC

This repository contains the fine-tuning, inference, and data formatting scripts used for fine-tuning and continued pretraining of Llama-2 for grammatical error correction (GEC).

See scripts/gec for example scripts.
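
As a rough illustration of what data formatting for this kind of fine-tuning can look like, the sketch below turns (erroneous, corrected) sentence pairs into prompt/completion records, and reverses the direction for error generation. The prompt templates, field names, and function names are assumptions made for this example, not the format used by the scripts in this repository; check scripts/gec for the actual format.

```python
import json

# Hypothetical prompt templates: the repository's real templates may differ.
GEC_TEMPLATE = "### Input:\n{text}\n\n### Corrected:\n"
AEG_TEMPLATE = "### Input:\n{text}\n\n### With errors:\n"


def format_gec_example(erroneous: str, corrected: str) -> dict:
    """Build one GEC training record: erroneous text in, corrected text out."""
    return {
        "prompt": GEC_TEMPLATE.format(text=erroneous.strip()),
        "completion": corrected.strip(),
    }


def format_aeg_example(erroneous: str, corrected: str) -> dict:
    """Build one error-generation record: corrected text in, erroneous text out."""
    return {
        "prompt": AEG_TEMPLATE.format(text=corrected.strip()),
        "completion": erroneous.strip(),
    }


def to_jsonl(pairs, formatter=format_gec_example) -> str:
    """Serialize (erroneous, corrected) pairs as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(formatter(err, cor), ensure_ascii=False) for err, cor in pairs
    )


if __name__ == "__main__":
    example_pairs = [("She go to school .", "She goes to school .")]
    print(to_jsonl(example_pairs))
    print(to_jsonl(example_pairs, formatter=format_aeg_example))
```

Keeping both directions behind the same pair format means a single parallel corpus can serve both the correction and the error generation setups.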

Models

Models for GEC trained on 1M Llama-generated errors, then gold errors:

Models for AEG (artificial error generation):

Synthetic data generated with AEG models: tartuNLP/aeg-data.

You can also find all the models in our HuggingFace collection.

Citation

@misc{luhtaru2024errhumanllamaslearn,
      title={To Err Is Human, but Llamas Can Learn It Too}, 
      author={Agnes Luhtaru and Taido Purason and Martin Vainikko and Maksym Del and Mark Fishel},
      year={2024},
      eprint={2403.05493},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.05493}, 
}

Acknowledgements

Code originally based on github.com/TartuNLP/llammas.
