Project for CS 386L: Programming Languages. Based on "Language Models Can Teach Themselves to Program Better" (Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai)
- Install dependencies for pycoq
- Use conda to install the "coq" and "train" environments from the yml files (for interacting with Coq / testing the model and for training the model, respectively); a setup sketch is given after this list
- Gather theorem files from Coq and for the test set (not provided, as it contains class material and assignments)
- Run `coq_parser.py` to extract theorems from the `*.v` files (this and the following steps are sketched as shell commands after this list)
- Run `test_proof.py` to filter out theorems that don't compile
- Fill in the OpenAI API keys in `codex.py`
- Run `training_scripts/splits.py` to create a train/validation/test split for the training data
- Run `train.sh` to train the model; fill in the checkpoint path for the `model_name_or_path` parameter in the training script for subsequent training sessions and for the `model_path` variable in `gpt_neo.py`
- Use `codex.py eval` to evaluate the test set using Codex (evaluation and generation commands are also sketched below)
- Use `gpt_neo.py eval` to evaluate the test set using GPT-Neo
- Use `codex.py generate` to generate more theorems using Codex
- Use `gpt_neo.py generate` to generate more theorems using GPT-Neo
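
For reference, environment setup might look roughly like this. The file names `coq.yml` and `train.yml` are assumptions (the steps above only say "the yml files"), so substitute the actual environment files in the repository:

```bash
# Create the two conda environments from their yml files.
# NOTE: coq.yml and train.yml are assumed names; use the actual files in this repo.
conda env create -f coq.yml     # "coq" environment: interacting with Coq / testing the model
conda env create -f train.yml   # "train" environment: training the model

# Activate whichever environment the current step needs, e.g.:
conda activate coq
```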
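
A rough sketch of the data preparation and training pipeline, assuming each script runs directly under `python` with no extra arguments; which environment each step needs is inferred from the descriptions above, and the individual scripts may expect flags or paths not shown here:

```bash
# Coq-facing steps (run inside the "coq" environment)
conda activate coq
python coq_parser.py                 # extract theorems from the *.v files
python test_proof.py                 # filter out theorems that don't compile

# Training steps (run inside the "train" environment)
conda activate train
python training_scripts/splits.py    # create the train/validation/test split
bash train.sh                        # train the model
# For subsequent sessions, point model_name_or_path in the training script
# (and model_path in gpt_neo.py) at the checkpoint from the previous run.
```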
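
Evaluation and generation, again assuming plain `python` invocations of the `eval`/`generate` modes named above; make sure the OpenAI API keys are filled in before running the Codex commands:

```bash
python codex.py eval        # evaluate the test set with Codex
python gpt_neo.py eval      # evaluate the test set with GPT-Neo
python codex.py generate    # generate more theorems with Codex
python gpt_neo.py generate  # generate more theorems with GPT-Neo
```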