Crokage

Crowd Knowledge Answer Generator. Feel free to try our tool. For more details about the technique, please check our journal paper, which is an extension of our original paper. This project is an extension of the original project.

About

Goal

To provide comprehensive solutions for daily programming tasks, containing code examples and succinct explanations (limited to Java in this initial version).

Input / Output

  • Input: API-related query in natural language.
  • Output: Code examples with explanations.

How

CROKAGE receives as input a query written in natural language and uses state-of-the-art text retrieval models, combined with three state-of-the-art API recommender tools, to retrieve the Stack Overflow answers most related to that query, sorted by relevance. CROKAGE then uses natural language processing to extract the code and relevant sentences and compose a summary containing the solution for the query.

Features and comparison with other state-of-the-art works (AnswerBot and BIKER)

  • AnswerBot is limited because it does not provide code.
  • BIKER is limited because its documentation covers only Java SE and it does not provide code for every query.
  • CROKAGE addresses both limitations by providing relevant code and explanations in the form of summaries.

Prerequisites

Note: all the experiments were conducted on a server with 86 GB of RAM, a 12-core 3.1 GHz CPU, and the 64-bit Linux Mint Cinnamon operating system. We strongly recommend a similar or better hardware environment. The operating system, however, can be different.

Software:

  1. Java 1.8
  2. Postgres 9.4 - Configure your DB to accept local connections. An example of pg_hba.conf configuration:
...
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
...
  3. PgAdmin (we used PgAdmin 4), but feel free to use any DB tool for PostgreSQL.

Configuring the dataset

  1. Download the SO Dump of March 2019 here. This is a preprocessed dump, downloaded from the official web site containing the main tables we use. The Postsmin table (representing posts table) has extra columns with the preprocessed data used by Crokage.

  2. On your DB tool, create a new database named stackoverflow2019emse-min. This is a query example:

CREATE DATABASE "stackoverflow2019emse-min"
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'en_US.UTF-8'
       LC_CTYPE = 'en_US.UTF-8'
       CONNECTION LIMIT = -1;
  3. Restore the downloaded dump to the created database.

Obs: restoring this dump requires at least 10 GB of free space. If your operating system runs in a partition with insufficient free space, create a tablespace pointing to a larger partition and associate the database with it by replacing the "TABLESPACE" value with the new tablespace name: TABLESPACE = tablespacename.

  4. Verify that the database is sound. Execute the following SQL command:

select id, title, body, processedtitle, processedbody, code, processedcode from postsmin po limit 10

The query should return the main fields for 10 posts.

Running Crokage

Download auxiliary files

1- Make a home dir, for example /home/user/crokage

2- Git clone this project inside the home dir:

git clone https://github.com/muldon/crokage-emse-replication-package.git

In the end, you will have the structure: /home/user/crokage/crokage-emse-replication-package

3- Download the jar

Download our fat jar here. Place it along with the downloaded files (/home/user/crokage/crokage-emse-replication-package).

4- Check your installation

Make sure your crokage folder (/home/user/crokage/crokage-emse-replication-package) contains this structure:

..
./data 
crokage.jar
main.properties   (not "main.properties.txt") 
...

Obs: if for some reason you opt to zip and download, make sure the extracted file main.properties does not change to main.properties.txt.

Obs 2: for now we only provide the replication package containing the files for the reproduction of CROKAGE, along with the results of the User Study, described in our paper. The complete source code will be released soon.
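The layout check described above can also be scripted. The following is a minimal sketch, not part of the replication package: the class name InstallCheck and the method findProblems are hypothetical, and the only things it verifies are the entries listed above (the data folder, crokage.jar, and main.properties, plus a stray main.properties.txt).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not shipped with CROKAGE): checks that the folder
// contains the entries listed in the README.
public class InstallCheck {

    /** Returns a list of layout problems; an empty list means nothing is missing. */
    static List<String> findProblems(Path home) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(home.resolve("data"))) {
            problems.add("missing directory: data");
        }
        if (!Files.isRegularFile(home.resolve("crokage.jar"))) {
            problems.add("missing file: crokage.jar");
        }
        if (!Files.isRegularFile(home.resolve("main.properties"))) {
            problems.add("missing file: main.properties");
        }
        // Some unzip tools or browsers append .txt to the properties file.
        if (Files.exists(home.resolve("main.properties.txt"))) {
            problems.add("found main.properties.txt; rename it to main.properties");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Folder to check: first argument, or the current directory.
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(home);
        if (problems.isEmpty()) {
            System.out.println("Layout looks complete: " + home.toAbsolutePath());
        } else {
            for (String problem : problems) {
                System.out.println(problem);
            }
        }
    }
}
```

Compile it with javac InstallCheck.java and run it with java InstallCheck /home/user/crokage/crokage-emse-replication-package. It only inspects file names, so it says nothing about whether the files themselves are intact.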

Setting Parameters

Edit main.properties and set the following parameters, found under the "######### Must be set" marker:

CROKAGE_HOME = the root folder of the project (ex /home/user/crokage/crokage-emse-replication-package).

spring.datasource.username = your db user

spring.datasource.password = your db password

spring.datasource.url = your database URL, for example: jdbc:postgresql://localhost:5432/stackoverflow2019emse-min.
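Before launching the jar, it can help to confirm that none of these parameters was left empty. The sketch below is an illustration only: the class PropertiesCheck and its method missingKeys are hypothetical names, and treating a blank value as "not set" is an assumption (a database account with an empty password would be flagged). It relies solely on java.util.Properties from the standard library.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not shipped with CROKAGE): reports required
// parameters that are absent or blank in main.properties.
public class PropertiesCheck {

    // The four parameters this README says must be set.
    static final List<String> REQUIRED = Arrays.asList(
            "CROKAGE_HOME",
            "spring.datasource.username",
            "spring.datasource.password",
            "spring.datasource.url");

    /** Returns the required keys that are absent or have a blank value. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // File to check: first argument, or main.properties in the current folder.
        String file = args.length > 0 ? args[0] : "main.properties";
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8)) {
            props.load(in);
        }
        List<String> missing = missingKeys(props);
        if (missing.isEmpty()) {
            System.out.println("All required parameters are set.");
        } else {
            System.out.println("Not set: " + missing);
        }
    }
}
```

Run it from the folder that holds main.properties, or pass the path to the file as the first argument. The check is purely syntactic: it does not try to connect to the database or validate CROKAGE_HOME.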

Running the jar

Open a terminal, go to the folder where the jar file and main.properties are located, and run the following command:

java -Xms1024M -Xmx50g -jar crokage.jar --spring.config.location=./main.properties

This command uses the file main.properties to override the default parameters, which must be set as described above.

Results

The results are displayed in the terminal/console and are also stored in the database, in the table metricsresults. The following query should return the results:

select * from metricsresults

Tool

We implemented our approach in the form of a tool to assist developers with their daily programming issues. The figure below shows the tool architecture. We follow a REST (Representational State Transfer) architecture. The tool is in beta and only provides solutions for the Java language, but we expect to release the full version soon. If you wish to use our tool to provide solutions to your natural language queries, please follow the instructions here.

CROKAGE's architecture

Citation

If you intend to use this work, please cite us:

@article{SilvaEMSE2020,
author = {Silva, Rodrigo F. G. and Roy, Chanchal K. and Rahman, Mohammad Masudur and Schneider, Kevin A. and Paixao, Klerisson and Dantas, Carlos and de Almeida Maia, Marcelo},
title={{CROKAGE}: Effective Solution Recommendations for Programming Tasks by Leveraging Crowd Knowledge},
journal={Empirical Software Engineering (accepted)},
year=2020}

License

This project is licensed under the MIT License. See the LICENSE file for details.