Deep Learning Type Inference of Python Function Signatures using their Natural Language Context
DLTPy makes type predictions based on comments, on the semantic elements of the function name and argument names, and on the semantic elements of identifiers in the return expressions. Using the natural language of these different elements, we have trained a classifier that predicts types. We use a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) architecture.
Read our paper for the full details.
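To make the idea of "natural language context" more concrete, the following is a minimal sketch of how such elements could be pulled out of a function using only the standard library. It is not DLTPy's actual extraction code: the helper names `split_identifier` and `extract_context`, the record layout, and the identifier-splitting rule are all assumptions made for illustration.

```python
import ast
import re


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    words = []
    for part in name.split("_"):
        # Within each underscore-separated part, break at case boundaries.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def extract_context(source):
    """Return one record per function found in `source` (hypothetical layout)."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        return_words = []
        # Collect identifiers used inside return expressions.
        # Note: this also visits returns of nested functions.
        for child in ast.walk(node):
            if isinstance(child, ast.Return) and child.value is not None:
                for sub in ast.walk(child.value):
                    if isinstance(sub, ast.Name):
                        return_words.extend(split_identifier(sub.id))
        records.append({
            "name": split_identifier(node.name),
            "args": [split_identifier(a.arg) for a in node.args.args],
            "comment": ast.get_docstring(node) or "",
            "return_expr": return_words,
        })
    return records


example = '''
def compute_total_price(item_list, taxRate):
    """Return the price of all items including tax."""
    total_price = sum(item_list) * (1 + taxRate)
    return total_price
'''

records = extract_context(example)
print(records[0])
```

In a setup like this, the word lists and the comment would then be embedded and fed to the classifier; how DLTPy does that exactly is described in the paper.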
The preprocessing pipeline downloads projects, extracts comments and types, and produces a csv file per project containing all of its functions.
Start using:
$ python preprocessing/pipeline.py
Optional arguments:
  -h, --help            show this help message and exit
  --projects_file PROJECTS_FILE
                        json file containing GitHub projects
  --limit LIMIT         limit the number of projects for which the pipeline
                        should run
  --jobs JOBS           number of jobs to use for pipeline.
  --output_dir OUTPUT_DIR
                        output dir for the pipeline
  --start START         start position within projects list
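As a rough illustration of how these options fit together, the sketch below rebuilds an equivalent argument parser and parses a sample invocation. The flag names and help strings come from the help output above; the integer types for `--limit`, `--jobs` and `--start`, the function name `build_parser`, and the sample values (`projects.json`, `output`) are assumptions and may not match `preprocessing/pipeline.py`.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--projects_file",
                        help="json file containing GitHub projects")
    parser.add_argument("--limit", type=int,
                        help="limit the number of projects for which the pipeline should run")
    parser.add_argument("--jobs", type=int,
                        help="number of jobs to use for pipeline")
    parser.add_argument("--output_dir",
                        help="output dir for the pipeline")
    parser.add_argument("--start", type=int,
                        help="start position within projects list")
    return parser


# Hypothetical invocation: process 10 projects, starting at the first one,
# with 4 parallel jobs.
args = build_parser().parse_args([
    "--projects_file", "projects.json",
    "--limit", "10",
    "--jobs", "4",
    "--output_dir", "output",
    "--start", "0",
])
print(args)
```

The same values would be passed on the command line as `$ python preprocessing/pipeline.py --projects_file projects.json --limit 10 --jobs 4 --output_dir output --start 0`.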
input-preparation/generate_df.py can be used to combine all the separate csv files per project into one big file while applying filtering.
input-preparation/df_to_vec.py can be used to convert this generated csv to vectors.
input-preparation/embedder.py can be used to train word embeddings for input-preparation/df_to_vec.py.
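The combine-and-filter step can be pictured with the sketch below, which concatenates per-project csv files into one file while dropping rows that fail a filter. This is not the code in `input-preparation/generate_df.py`: the function `combine_project_csvs`, the column names `name` and `return_type`, and the filter rule (skip functions without a return type) are hypothetical, and the sketch assumes all input files share the same columns.

```python
import csv
import glob
import os
import tempfile


def combine_project_csvs(input_dir, output_path, keep_row):
    """Write all rows from input_dir/*.csv that satisfy keep_row to output_path."""
    writer = None
    kept = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
            # Never read the output file back in as an input.
            if os.path.abspath(path) == os.path.abspath(output_path):
                continue
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                for row in reader:
                    if not keep_row(row):
                        continue
                    if writer is None:
                        # Reuse the header of the first file that yields a row.
                        writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                        writer.writeheader()
                    writer.writerow(row)
                    kept += 1
    return kept


# Small demonstration with two made-up project files.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"a": [("f", "int"), ("g", "")], "b": [("h", "str")]}
    for project, rows in sample.items():
        with open(os.path.join(tmp, project + ".csv"), "w", newline="", encoding="utf-8") as f:
            w = csv.writer(f)
            w.writerow(["name", "return_type"])
            w.writerows(rows)

    combined_path = os.path.join(tmp, "combined", "all.csv")
    os.makedirs(os.path.dirname(combined_path))
    kept = combine_project_csvs(tmp, combined_path,
                                lambda row: row["return_type"] != "")

    with open(combined_path, newline="", encoding="utf-8") as f:
        combined_rows = list(csv.DictReader(f))

print(kept, [row["name"] for row in combined_rows])
```

Streaming rows one at a time keeps memory use flat even when the number of projects is large, which is one reason to prefer it over loading every file up front.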
The different RNN models we evaluated can be found in learning/learn.py.
Run the tests using:
$ pytest
The MIT License (MIT). Please see the license file for more information.