Skip to content

bibosx0/clocom

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

145 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

INTRODUCTION:

This tool generates code comments automatically by 
analyzing existing software repositories. 
It applies code clone detection techniques to discover 
similar code segments and use the comments from some code segments 
to describe the other similar code segments. 

It is a hybrid technique that can generate code comments
by mining (1) StackOverflow and (2) open source projects.

INSTALLATION:

sudo apt-get install openjdk-8-jdk

Place the following tars under the folder, lib2,
which can be obtained from Stanford's website
(https://stanfordnlp.github.io/CoreNLP/): 

stanford-corenlp-3.8.0.jar  
stanford-corenlp-3.8.0-models.jar

HOW TO RUN:

1.
Configure "config.xml"'s <database> and <project> tag
with the appropriate folder path.
The folders should contain Java source code files.
The 'database' can be a StackOverflow database
and the 'project' can be any Java project's source code.

Code inside the <project> tag represents the target project
that you want to generate comments for.

Code inside the <database> tag represents the database.
CloCom will extract code comments from this folder for
the code inside the <project> folder.

See the "config" folder for xml examples and 
modify the parameters to suit your needs.

2. 
'make' to compile.

3. 
Run the shell script with the config file as the first input argument.

./cloneDigger.sh config.xml

4.

The output is a list of clones from the database path and 
the respective generated comment.

NOTES on the XML parameters:

databaseFormat - 0 for standard Java source code, 1 for autocomment format

minNumLines - number of statements a clone should have
matchAlgorithm - 1 for gapped, 0 for non-gapped
matchMode - 1 for between comparison (loads files in the project path into memory), 0 for full mesh
gapSize - set this for gapped comparision
meshBlockSize - number of source code files to load into memory for full mesh
numberThreads - 0 to disable multithreading, else specify the number of cores

debug - turn on debug statements
removeEmpty - remove the display of clones that doesn't have a comment
forceRetokenization - set true if we want to retokenize the database directory, else it will only tokenize as needed
loadDatabaseFilePaths - use the existing cached list of files

Please use commit
f89b780bac5ea7322633550c29fffd3c199e1f2e
if you would like to reproduce the results of
CloCom's SANER conference paper.

PUBLICATION:
- Edmund Wong, Taiyue Liu, Lin Tan, CloCom: Mining Existing Source Code for Automatic Comment Generation", International Conference on Software Analysis, Evolution, and Reengineering (SANER) 2015

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 97.4%
  • Python 2.5%
  • Other 0.1%