Project for CS 386L: Programming Languages. Based on "Language Models Can Teach Themselves to Program Better" (Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai)
- Install dependencies for pycoq
- Use conda to install the "coq" and "train" environments from the yml files (for interacting with Coq / testing the model and for training the model, respectively); a setup sketch is given after this list
- Gather theorem files from Coq and for the test set (not provided, as it contains class material and assignments)
- Run `coq_parser.py` to extract theorems from the `*.v` files (this and the following steps are sketched as shell commands after this list)
- Run `test_proof.py` to filter out theorems that don't compile
- Fill in the OpenAI API keys in `codex.py`
- Run `training_scripts/splits.py` to create a train/validation/test split for the training data
- Run `train.sh` to train the model; fill in the checkpoint path for the `model_name_or_path` parameter in the training script for subsequent training sessions and for the `model_path` variable in `gpt_neo.py`
- Use `codex.py eval` to evaluate the test set using Codex (evaluation and generation commands are also sketched below)
- Use `gpt_neo.py eval` to evaluate the test set using GPT-Neo
- Use `codex.py generate` to generate more theorems using Codex
- Use `gpt_neo.py generate` to generate more theorems using GPT-Neo
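
For reference, environment setup might look roughly like this. The file names `coq.yml` and `train.yml` are assumptions (the steps above only say "the yml files"), so substitute the actual environment files in the repository:

```bash
# Create the two conda environments from their yml files.
# NOTE: coq.yml and train.yml are assumed names; use the actual files in this repo.
conda env create -f coq.yml     # "coq" environment: interacting with Coq / testing the model
conda env create -f train.yml   # "train" environment: training the model

# Activate whichever environment the current step needs, e.g.:
conda activate coq
```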
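
A rough sketch of the data preparation and training pipeline, assuming each script runs directly under `python` with no extra arguments; which environment each step needs is inferred from the descriptions above, and the individual scripts may expect flags or paths not shown here:

```bash
# Coq-facing steps (run inside the "coq" environment)
conda activate coq
python coq_parser.py                 # extract theorems from the *.v files
python test_proof.py                 # filter out theorems that don't compile

# Training steps (run inside the "train" environment)
conda activate train
python training_scripts/splits.py    # create the train/validation/test split
bash train.sh                        # train the model
# For subsequent sessions, point model_name_or_path in the training script
# (and model_path in gpt_neo.py) at the checkpoint from the previous run.
```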
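
Evaluation and generation, again assuming plain `python` invocations of the `eval`/`generate` modes named above; make sure the OpenAI API keys are filled in before running the Codex commands:

```bash
python codex.py eval        # evaluate the test set with Codex
python gpt_neo.py eval      # evaluate the test set with GPT-Neo
python codex.py generate    # generate more theorems with Codex
python gpt_neo.py generate  # generate more theorems with GPT-Neo
```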