PyTorch implementation of Deep Code Search.
Tested on macOS 10.12 and Ubuntu 16.04.
- Python 3.6
- PyTorch
- tqdm
pip install -r requirements.txt
- models: neural network models for code/desc representation and similarity measure.
- modules.py: basic modules for model construction.
- train.py: train and validate code/desc representation models.
- repr_code.py: encode code into vectors and store them in a file.
- search.py: perform code search.
- configs.py: configurations for the models defined in the models folder. Each function defines the hyper-parameters for the corresponding model.
- data_loader.py: a PyTorch dataset loader.
- utils.py: utilities for models and training.
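As a rough illustration of the configs.py convention described above (one function per model, each returning that model's hyper-parameters), a configuration function might look like the sketch below. The key names, values, and the get_config helper are hypothetical placeholders chosen for illustration, not the repository's actual settings.

```python
def config_JointEmbeder():
    """Hypothetical hyper-parameters for the JointEmbeder model."""
    return {
        "dataset_name": "github",  # assumed dataset label
        "batch_size": 64,          # assumed value
        "emb_size": 512,           # assumed embedding dimension
        "margin": 0.05,            # assumed ranking-loss margin
        "learning_rate": 1e-3,     # assumed value
    }


# Map each model name to the function that defines its hyper-parameters.
CONFIGS = {"JointEmbeder": config_JointEmbeder}


def get_config(model_name):
    """Return the hyper-parameter dict for the given model name."""
    if model_name not in CONFIGS:
        raise ValueError(f"no configuration defined for model {model_name!r}")
    return CONFIGS[model_name]()


config = get_config("JointEmbeder")
print(config["batch_size"])
```

Keeping one function per model means a new model only needs a new function and a registry entry, while the training script can stay unchanged.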
If you want a quick test, here is a pretrained model. Put it in ./output/JointEmbeder/github/202106140524/models/ and run:
python repr_code.py -t 202106140524 --reload_from 4000000
python search.py -t 202106140524 --reload_from 4000000
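The path above suggests that the value passed to -t selects a timestamped run directory under ./output. The sketch below builds that directory path, assuming the layout output/<model>/<dataset>/<timestamp>/models generalizes from the one example given; the model_dir helper is hypothetical and is not part of the repository.

```python
from pathlib import Path


def model_dir(model, dataset, timestamp, root="./output"):
    """Build the directory where a run's model checkpoints are assumed to live."""
    return Path(root) / model / dataset / timestamp / "models"


target = model_dir("JointEmbeder", "github", "202106140524")
print(target.as_posix())
# Create the directory before copying the pretrained model into it:
# target.mkdir(parents=True, exist_ok=True)
```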
The /data folder provides a small dummy dataset for quick deployment.
To train and test our model:
- Download and unzip the real dataset from Google Drive, or from Baidu Pan for Chinese users.
- Replace each file in the /data folder with the corresponding real file.
Edit the hyper-parameters and settings in configs.py.
python train.py --model JointEmbeder -v
python repr_code.py --model JointEmbeder -t XXX --reload_from YYY
where XXX stands for the timestamp, and YYY represents the iteration with the best model.
python search.py --model JointEmbeder -t XXX --reload_from YYY
where XXX stands for the timestamp, and YYY represents the iteration with the best model.
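Conceptually, repr_code.py precomputes one vector per code snippet and search.py ranks those vectors against the embedded query. The sketch below illustrates only that ranking step in plain Python, assuming cosine similarity as the similarity measure. The function names and toy vectors are hypothetical; the actual scripts work on vectors produced by the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, code_vecs, top_k=3):
    """Return (index, score) pairs for the top_k code vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(code_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy stand-ins for precomputed code vectors and an embedded query.
code_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
results = search([1.0, 0.1], code_vecs, top_k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

Precomputing the code vectors once means each query costs a single encoding pass plus a similarity scan over the stored vectors.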
Here is a screenshot of code search:
If you find it useful and would like to cite it, the following would be appropriate:
@inproceedings{gu2018deepcs,
title={Deep Code Search},
author={Gu, Xiaodong and Zhang, Hongyu and Kim, Sunghun},
booktitle={Proceedings of the 2018 40th International Conference on Software Engineering (ICSE 2018)},
year={2018},
organization={ACM}
}