## Requirements
- Torch, Cutorch (http://torch.ch/docs/getting-started.html)
- Python packages `unidiff` and `pygments`:
  `pip install unidiff pygments`
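If Torch and Cutorch are not installed yet, the commands below are a sketch that follows the official getting-started guide linked above; check that guide for the current steps.

```bash
# Install the Torch distribution; Cutorch is built as part of the install
# when a CUDA toolkit is available
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch && bash install-deps && ./install.sh
```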
## Setup environment
- Clone this repository:
  `cd ~ && git clone https://github.com/epochx/commitgen-dev.git`
- Create the data path:
  `mkdir -p ~/data/preprocessing`
- Export the environment variable:
  `export WORK_DIR=~/data` (without trailing slash!)
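Put together, the setup steps above look like this (`mkdir -p` also creates `~/data` if it does not exist yet):

```bash
# Clone the repository into the home directory
cd ~
git clone https://github.com/epochx/commitgen-dev.git

# Create the data and preprocessing directories
mkdir -p ~/data/preprocessing

# Point WORK_DIR at the data directory (no trailing slash)
export WORK_DIR=~/data
```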
## Download our paper data
- Get the raw commit data used in our paper from https://osf.io/67kyc/?view_only=ad588fe5d1a14dd795553fb4951b5bf9 (click on "OSF Storage" and then on "Download as zip"). Unzip the file where convenient.
- Unzip the desired dataset zip and move the resulting folder to `~/data`.
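For example (the archive and folder names below are placeholders; use the names of the files you actually downloaded):

```bash
# Unzip the OSF download somewhere convenient
unzip commit_data.zip -d ~/Downloads/commit_data   # placeholder archive name

# Unzip the dataset you want and move the resulting folder to ~/data
cd ~/Downloads/commit_data
unzip DATASET_NAME.zip
mv DATASET_NAME ~/data/
```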
## Pre-process data
- Parse and filter commits and messages (a worked example follows this list):
  `cd ~/commitgen && python ./preprocess.py FOLDER_NAME --language LANGUAGE`, where `FOLDER_NAME` is the name of the folder from the previous step. Add the `--atomic` flag to keep only atomic commits. This will generate a pre-processed version of the dataset as a pickle file in `~/data/preprocessing`. Try `python ./preprocess.py --help` for more details on additional pre-processing parameters.
- Generate training data:
  `cd ~/commitgen && ./buildData.sh PICKLE_FILE_NAME LANGUAGE` (`PICKLE_FILE_NAME` with no `.pickle`).
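A minimal sketch of both steps, assuming a hypothetical dataset folder named `python_commits` with `python` as the language (substitute your own folder and language):

```bash
cd ~/commitgen

# Parse and filter commits; --atomic keeps only atomic commits
python ./preprocess.py python_commits --language python --atomic

# Check ~/data/preprocessing for the generated pickle, then build the training
# data, passing the pickle's file name without the .pickle extension
./buildData.sh PICKLE_FILE_NAME python
```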
## Train the model

1. Run the model:
   `cd ~/commitgen && ./run.sh PICKLE_FILE_NAME LANGUAGE` (`PICKLE_FILE_NAME` with no `.pickle`).
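For example, reusing the placeholder names from the pre-processing step above:

```bash
cd ~/commitgen
# Same pickle name used for buildData.sh, without the .pickle extension
./run.sh PICKLE_FILE_NAME python
```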
You can also download additional GitHub project data using our crawler: `cd ~/commitgen` and run `python crawl_commits.py --help` for more details on how to do it.
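For instance, to list the crawler's options:

```bash
cd ~/commitgen
python crawl_commits.py --help
```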