Hi there, welcome to this page!
This page contains the code and data used in the paper "Vulnerability Discovery with Function Representation Learning from Unlabeled Projects" by Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan and Yang Xiang.
- Tensorflow
- Keras
- Python >= 2.7
- CodeSensor
The dependencies can be installed using Anaconda. For example:
$ bash Anaconda3-5.0.1-Linux-x86_64.sh

The Vulnerabilities_info.xlsx file contains information about the collected function-level vulnerabilities. These vulnerabilities come from 3 open source projects: FFmpeg, LibTIFF and LibPNG. The vulnerability information was collected from the National Vulnerability Database (NVD) up to mid-July 2017.
The "Data" folder contains the source code of the vulnerable functions of the 3 projects, packed as Zip files. After unzipping them, you will find that each function's source file is named after its CVE ID.
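Because the files are named after their CVE IDs, the archives can be indexed without unpacking them. The following is a minimal sketch, not part of the original code: the function name index_functions_by_cve and the file-name pattern (for example "CVE-2016-1234.c") are assumptions, and it needs Python 3.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: a CVE ID such as "CVE-2016-1234" appears in each file name.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def index_functions_by_cve(zip_path):
    """Map each CVE ID found in a Zip archive to the member files named after it."""
    index = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):  # skip directory entries
                continue
            # Only look at the file name, not at the folders above it.
            match = CVE_PATTERN.search(PurePosixPath(member).name)
            if match:
                index[match.group(0).upper()].append(member)
    return dict(index)
```

Calling index_functions_by_cve on one of the Zip files in the "Data" folder returns a dictionary whose keys are CVE IDs and whose values are the matching paths inside the archive. Files without a CVE ID in their name are ignored.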
The "Code" folder contains a Python code sample for invoking CodeSensor to parse functions into ASTs (for detailed information on CodeSensor and its usage, please visit the author's blog: http://codeexploration.blogspot.com.au/). It also contains a Python code sample implementing an LSTM with Keras on a Tensorflow backend.
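As a rough illustration of what invoking CodeSensor from Python can look like, here is a minimal sketch. It is not the sample shipped in the "Code" folder. It assumes that CodeSensor is distributed as a jar that takes a source file as its only argument and prints the parse result to standard output; the jar path and both function names are hypothetical, and the sketch needs Python 3.7 or later. Please check the CodeSensor blog for the actual usage.

```python
import subprocess


def build_codesensor_command(source_file, jar_path="CodeSensor.jar"):
    """Build the command line. jar_path is a hypothetical location of the jar."""
    # Assumption: CodeSensor is run as "java -jar <jar> <source file>".
    return ["java", "-jar", jar_path, source_file]


def parse_function_to_ast(source_file, jar_path="CodeSensor.jar"):
    """Run CodeSensor on one source file and return its raw output lines."""
    result = subprocess.run(
        build_codesensor_command(source_file, jar_path),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()
```

The output lines are returned unparsed, since turning them into a sequence for the LSTM depends on the output format of the CodeSensor version in use.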
We used Understand, a commercial code enhancement tool, to extract function-level code metrics. The CodeMetrics.xlsx file includes 23 code metrics extracted from the vulnerable functions of the 3 projects.
You are welcome to improve our code as well as our method. Please cite our paper if you use the code or data in your work. To request more data or for other enquiries, please contact: junzhang@swin.edu.au.
Thanks and enjoy coding!