Commit a60032b

refactor: optimize dataset name
1 parent 6d92053 commit a60032b

1 file changed, +7 -7 lines changed

docs/curate_dataset.md

Lines changed: 7 additions & 7 deletions
@@ -20,7 +20,7 @@ python scripts/curate/dataset_ensemble_clone.py
 
 > [!Tip]
 >
-> **Output**: `repoqa-{datetime}.json` by adding a `"content"` field (path to content) for each repo.
+> **Output**: `repoqa-snf-{datetime}.json` by adding a `"content"` field (path to content) for each repo.
 
 
 ### Step 3: Dependency analysis
@@ -45,23 +45,23 @@ python scripts/curate/dep_analysis/{language}.py # python
 ### Step 4: Merge step 2 and step 3
 
 ```shell
-python scripts/curate/merge_dep.py --dataset-path repoqa-{datetime}.json
+python scripts/curate/merge_dep.py --dataset-path repoqa-snf-{datetime}.json
 ```
 
 > [!Tip]
 >
 > **Input**: Download dependency files in to `scripts/curate/dep_analysis/data`.
 >
-> **Output**: Update `repoqa-{datetime}.json` by adding a `"dependency"` field for each repository.
+> **Output**: Update `repoqa-snf-{datetime}.json` by adding a `"dependency"` field for each repository.
 
 
 ### Step 5: Function collection with TreeSitter
 
 ```shell
 # collect functions (in-place)
-python scripts/curate/function_analysis.py --dataset-path repoqa-{datetime}.json
+python scripts/curate/function_analysis.py --dataset-path repoqa-snf-{datetime}.json
 # select needles (in-place)
-python scripts/curate/needle_selection.py --dataset-path repoqa-{datetime}.json
+python scripts/curate/needle_selection.py --dataset-path repoqa-snf-{datetime}.json
 ```
 
 > [!Tip]
@@ -72,7 +72,7 @@ python scripts/curate/needle_selection.py --dataset-path repoqa-{datetime}.json
 ### Step 6: Annotate each function with description to make a final dataset
 
 ```shell
-python scripts/curate/needle_annotation.py --dataset-path repoqa-{datetime}.json
+python scripts/curate/needle_annotation.py --dataset-path repoqa-snf-{datetime}.json
 ```
 
 > [!Tip]
@@ -85,7 +85,7 @@ python scripts/curate/needle_annotation.py --dataset-path repoqa-{datetime}.json
 ### Step 7: Merge needle description to the final dataset
 
 ```shell
-python scripts/curate/merge_annotation.py --dataset-path repoqa-{datetime}.json --annotation-path {output-desc-path}.jsonl
+python scripts/curate/merge_annotation.py --dataset-path repoqa-snf-{datetime}.json --annotation-path {output-desc-path}.jsonl
 ```
 
 > [!Tip]
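
This rename had to touch every command in steps 4 to 7 because each one repeats the dataset file name. If you script these steps locally, you can keep the name in one place so that a future rename changes a single line. The following is a minimal bash sketch. `dataset_path` and `print_steps_4_to_7` are hypothetical helpers and are not part of the repository. The two arguments stand in for the `{datetime}` and `{output-desc-path}` placeholders. The sketch only prints the commands and does not run them, because the Tip blocks after steps 5 to 7 are cut off in this diff and may describe checks between steps.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the repository): keep the dataset file name
# used by steps 4-7 of docs/curate_dataset.md in a single place.
set -euo pipefail

# Build the dataset file name in the repoqa-snf-{datetime}.json scheme.
# $1: the {datetime} part of the name (placeholder, format not specified here)
dataset_path() {
  printf 'repoqa-snf-%s.json\n' "$1"
}

# Print (do not run) the step 4-7 commands for one dataset, one per line.
# $1: the {datetime} part; $2: the {output-desc-path} prefix (".jsonl" is appended)
print_steps_4_to_7() {
  local dataset annotation
  dataset="$(dataset_path "$1")"
  annotation="$2.jsonl"
  printf '%s\n' \
    "python scripts/curate/merge_dep.py --dataset-path $dataset" \
    "python scripts/curate/function_analysis.py --dataset-path $dataset" \
    "python scripts/curate/needle_selection.py --dataset-path $dataset" \
    "python scripts/curate/needle_annotation.py --dataset-path $dataset" \
    "python scripts/curate/merge_annotation.py --dataset-path $dataset --annotation-path $annotation"
}
```

For example, `print_steps_4_to_7 "<datetime>" "<output-desc-path>"` lists the five commands with the `repoqa-snf-` name filled in. You can then review them against the Tip blocks before running each one.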
