README.md: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
## Function Representation Learning for Vulnerability Discovery

-Hi there, welcome to this pape!
+Hi there, welcome to this page!

The page contains the code and data used in the paper [Vulnerability Discovery with Function Representation Learning from Unlabeled Projects](https://dl.acm.org/citation.cfm?id=3138840) by Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan and Yang Xiang.
The Vulnerabilities_info.xlsx file contains information of the collected function-level vulnerabilities. These vulnerabilities are from 3 open source projects: [FFmpeg](https://github.com/FFmpeg/FFmpeg), [LibTIFF](https://github.com/vadz/libtiff) and [LibPNG](https://github.com/glennrp/libpng). And vulnerability information was collected from [National Vulnerability Database(NVD)](https://nvd.nist.gov/) until the mid of July 2017.

-The "Data" folder contains the source code of vulnerable functions and vulnerable functions within the Zip file of the 3 projects. After unzipping the files, one will find that the source code of each vulnerable function was named with its CVE ID. For the non-vulnerable functions, they were named with "{filename}_{functionname}.c".
+The "Data" folder contains the source code of vulnerable functions and vulnerable functions within the Zip file of the 3 projects. After unzipping the files, one will find that the source code of each vulnerable function was named with its CVE ID. For the non-vulnerable functions, they were named with "{filename}_{functionname}.c" format.

The "Code" folder contains the Python code samples.
-1) ProcessCFilesWithCodeSensor.py file is for invoking the CodeSensor to parse functions to ASTs in serialized formart (for detail information and usage of CodeSensor, please visit the author's blog: http://codeexploration.blogspot.com.au/)
+1) ProcessCFilesWithCodeSensor.py file is for invoking the CodeSensor to parse functions to ASTs in serialized format (for detail information and usage of CodeSensor, please visit the author's blog: http://codeexploration.blogspot.com.au/ for more details).
2) ProcessRawASTs_DFT.py file is to process the output of ProcessCFilesWithCodeSensor.py and convert the serialized ASTs to textual vectors.
3) BlurProjectSpecific.py file is to blur the project specific content and convert the textual vectors (the output of ProcessRawASTs_DFT.py) to numeric vectors which can be used as the input of ML algorithms.
-4) LSTM.py file contains the Python code sample for implementing LSTM based on Keras with Tensorflow backend.
+4) LSTM.py file contains the Python code sample for implementing LSTM network based on Keras with Tensorflow backend.

We used [Understand](https://scitools.com/) which is a commercial code enhancement tool for extracting function-level code metrics. In CodeMetrics.xlsx file, we include 23 code metrics extracted from the vulnerable functions of 3 projects.
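
As a rough illustration of steps 2) and 3) in the "Code" folder list, the sketch below walks a toy AST depth-first to produce a textual vector, blurs project-specific names, and maps the tokens to padded integer ids. It is not the repository's code: the nested-tuple AST shape, the `<name>` placeholder, and the helper names `depth_first_tokens`, `blur`, and `to_numeric` are assumptions made for illustration, and CodeSensor's real serialized output is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical AST node: (label, children). This is NOT CodeSensor's format.
Node = Tuple[str, list]


def depth_first_tokens(node: Node) -> List[str]:
    """Return node labels in depth-first (pre-order) traversal order."""
    label, children = node
    tokens = [label]
    for child in children:
        tokens.extend(depth_first_tokens(child))
    return tokens


def blur(tokens: List[str], project_names: Set[str]) -> List[str]:
    """Replace project-specific names with a generic placeholder (assumed scheme)."""
    return ["<name>" if tok in project_names else tok for tok in tokens]


def to_numeric(sequences: List[List[str]], max_len: int) -> Tuple[List[List[int]], Dict[str, int]]:
    """Map tokens to integer ids (0 is reserved for padding), then pad or truncate."""
    vocab: Dict[str, int] = {}
    vectors = []
    for seq in sequences:
        ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in seq]
        ids = ids[:max_len]
        vectors.append(ids + [0] * (max_len - len(ids)))
    return vectors, vocab


# Toy function AST; "png_read_row" stands in for a project-specific identifier.
ast = ("func", [
    ("params", [("param", [])]),
    ("body", [("call", [("png_read_row", [])]), ("return", [])]),
])

tokens = depth_first_tokens(ast)
blurred = blur(tokens, {"png_read_row"})
vectors, vocab = to_numeric([blurred], max_len=10)
```

Fixed-length integer vectors of this kind are the usual input shape for an embedding layer followed by an LSTM, which is the role the README assigns to LSTM.py.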