update readme
sbmaruf committed Apr 17, 2023
1 parent 3c33d7a commit 106cddd
Showing 3 changed files with 106 additions and 0 deletions.
106 changes: 106 additions & 0 deletions README.md
Expand Up @@ -7,12 +7,29 @@ This repository contains the sample code and data link for xCodeEval [paper](htt

# Data Download

The data is hosted as a Git LFS repository on Hugging Face.

![xCodeEval_hf](xcodeeval.png)

To download the full dataset,

```
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval
cd xCodeEval
git lfs pull
```

To download a specific part of the dataset,

```
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval
cd xCodeEval
git lfs pull --include "apr/test/*"
```

**NOTE**: The Hugging Face `load_dataset()` function is not supported yet. For now, use `git lfs` to download the data.


We propose 7 Tasks.

1. [Tag Classification](./tag_classification.md)
Expand All @@ -23,6 +40,95 @@ We propose 7 Tasks.
6. [Code-Code Retrieval](./retrieval.md)
7. [NL-Code Retrieval](./retrieval.md)

# Common Data for different tasks

![xCodeEval_fig_1](xcodeeval_fig_1.png)

Two data files are shared across multiple tasks.

1. `problem_descriptions.jsonl`
2. `unittest_db.json`

Both files are in the root directory of the [main](https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval/tree/main) branch of the Hugging Face dataset repository. To avoid data redundancy, we did not copy this data into each task's files; instead, every task sample carries a unique id, `src_uid`, that is used to retrieve it.
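
As a minimal sketch of that lookup (not part of the official tooling), the two files could be loaded and joined on `src_uid` as below. The function names `load_problem_descriptions`, `load_unittest_db`, and `lookup` are hypothetical helpers introduced for illustration, and the code assumes the file layouts described in the sections that follow.

```python
import json


def load_problem_descriptions(path):
    """Read the JSON Lines file and index each problem record by its src_uid."""
    problems = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                record = json.loads(line)
                problems[record["src_uid"]] = record
    return problems


def load_unittest_db(path):
    """Read the unit test database: a dict mapping src_uid to a list of tests."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lookup(src_uid, problems, unittest_db):
    """Return (problem record, list of unit tests) for one src_uid.

    An empty test list is returned when the id has no entry in the database
    (hypothetical fallback chosen for this sketch).
    """
    return problems[src_uid], unittest_db.get(src_uid, [])
```

After cloning, one would call `load_problem_descriptions` and `load_unittest_db` once on the two files, then call `lookup` with the `src_uid` of each task sample. Indexing the descriptions in a dict keeps each lookup constant time.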

## Structure of `problem_descriptions.jsonl`

A sample,

```json
{
"description": "There are $$$n$$$ positive integers $$$a_1, a_2, \\dots, a_n$$$. For the one move you can choose any even value $$$c$$$ and divide by two all elements that equal $$$c$$$.For example, if $$$a=[6,8,12,6,3,12]$$$ and you choose $$$c=6$$$, and $$$a$$$ is transformed into $$$a=[3,8,12,3,3,12]$$$ after the move.You need to find the minimal number of moves for transforming $$$a$$$ to an array of only odd integers (each element shouldn't be divisible by $$$2$$$).",
"input_from": "standard input",
"output_to": "standard output",
"time_limit": "3 seconds",
"memory_limit": "256 megabytes",
"input_spec": "The first line of the input contains one integer $$$t$$$ ($$$1 \\le t \\le 10^4$$$) \u2014 the number of test cases in the input. Then $$$t$$$ test cases follow. The first line of a test case contains $$$n$$$ ($$$1 \\le n \\le 2\\cdot10^5$$$) \u2014 the number of integers in the sequence $$$a$$$. The second line contains positive integers $$$a_1, a_2, \\dots, a_n$$$ ($$$1 \\le a_i \\le 10^9$$$). The sum of $$$n$$$ for all test cases in the input doesn't exceed $$$2\\cdot10^5$$$.",
"output_spec": "For $$$t$$$ test cases print the answers in the order of test cases in the input. The answer for the test case is the minimal number of moves needed to make all numbers in the test case odd (i.e. not divisible by $$$2$$$).",
"notes": "NoteIn the first test case of the example, the optimal sequence of moves can be as follows: before making moves $$$a=[40, 6, 40, 3, 20, 1]$$$; choose $$$c=6$$$; now $$$a=[40, 3, 40, 3, 20, 1]$$$; choose $$$c=40$$$; now $$$a=[20, 3, 20, 3, 20, 1]$$$; choose $$$c=20$$$; now $$$a=[10, 3, 10, 3, 10, 1]$$$; choose $$$c=10$$$; now $$$a=[5, 3, 5, 3, 5, 1]$$$ \u2014 all numbers are odd. Thus, all numbers became odd after $$$4$$$ moves. In $$$3$$$ or fewer moves, you cannot make them all odd.",
"sample_inputs": [
"4\n6\n40 6 40 3 20 1\n1\n1024\n4\n2 4 8 16\n3\n3 1 7"
],
"sample_outputs": [
"4\n10\n4\n0"
],
"tags": [
"number theory",
"greedy"
],
"src_uid": "afcd41492158e68095b01ff1e88c3dd4",
"difficulty": 1200,
"created_at": 1576321500
}
```

### Key Definitions

1. `description`: The problem description in text format. Math expressions are written in LaTeX.
2. `input_from`: Where the program reads the unit test input from.
3. `output_to`: Where the program should write the result of the unit test.
4. `time_limit`: Time limit to solve the problem.
5. `memory_limit`: Memory limit to solve the problem.
6. `input_spec`: How and in what order the input is given to the program. It also includes the data ranges, types, and sizes.
7. `output_spec`: How the outputs should be printed. Most of the time the unit test results are checked with *exact string match* or *floating point comparison* with a precision boundary.
8. `sample_inputs`: Sample inputs for code that is expected to solve the problem described in `description`.
9. `sample_outputs`: The expected outputs for the `sample_inputs`, as produced by code that solves the problem described in `description`.
10. `notes`: Explanation of `sample_inputs` & `sample_outputs`.
11. `tags`: The problem categories.
12. `src_uid`: The unique id of the problem. Task data samples refer to this id instead of repeating all of this information.
13. `difficulty`: How difficult the problem is for a human to solve (annotated by a human expert).
14. `created_at`: The Unix timestamp of when the problem was released. Use the `datetime` library in Python to parse it into a human-readable format.
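
For `created_at`, a short sketch of the `datetime` conversion mentioned above. The helper name `created_at_to_utc` is hypothetical, and interpreting the timestamp in UTC is an assumption made here for reproducibility.

```python
from datetime import datetime, timezone


def created_at_to_utc(created_at):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc)


# Values taken from the sample record above.
sample = {"src_uid": "afcd41492158e68095b01ff1e88c3dd4", "created_at": 1576321500}
released = created_at_to_utc(sample["created_at"])
print(released.isoformat())  # 2019-12-14T11:05:00+00:00
```

Passing `tz=timezone.utc` avoids results that depend on the local time zone of the machine running the script.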

## Structure of `unittest_db.json`

The structure of the `json` file,

```python
unittest_db = {
"db884d679d9cfb1dc4bc511f83beedda" : [
{
"input": "4\r\n3 2 3 2\r\n",
"output": [
"1"
],
},
{
...
},
...
],
"3bc096d8cd3418948d5be6bf297aa9b5":[
...
],
...
}
```

### Key Definitions

1. The dict keys of `unittest_db.json`, e.g., `db884d679d9cfb1dc4bc511f83beedda`, are the `src_uid` values from `problem_descriptions.jsonl`.
2. `input`: The input of the unit test.
3. `output`: A list of expected outputs for the unit test.
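
Because `output` is a list, a program's output can be accepted if it matches any entry. The following is only a sketch of such a check, not the benchmark's official checker: the line-ending normalization, the token-wise float comparison, the default tolerance `tol=1e-6`, and the function names are all assumptions made for illustration.

```python
import math


def normalize(text):
    """Convert CRLF to LF and strip trailing whitespace per line and overall."""
    lines = text.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip()


def tokens_match(actual, expected, tol):
    """Compare whitespace-separated tokens, allowing numeric tokens to differ by tol."""
    a_tokens, e_tokens = actual.split(), expected.split()
    if len(a_tokens) != len(e_tokens):
        return False
    for a, e in zip(a_tokens, e_tokens):
        if a == e:
            continue
        try:
            if not math.isclose(float(a), float(e), rel_tol=tol, abs_tol=tol):
                return False
        except ValueError:  # at least one token is not a number
            return False
    return True


def output_matches(actual, expected_outputs, tol=1e-6):
    """Return True if actual matches any expected output exactly or within tol."""
    actual = normalize(actual)
    for expected in expected_outputs:
        expected = normalize(expected)
        if actual == expected or tokens_match(actual, expected, tol):
            return True
    return False
```

For the sample entry above, `output_matches("1\r\n", ["1"])` is accepted by the exact-match branch after normalization. A task with a stated precision boundary in its `output_spec` would need `tol` set accordingly.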

# Citation

```
Expand Down
Binary file added xcodeeval.png
Binary file added xcodeeval_fig_1.png