- Write code that can reproduce the results of 93suhwan/logicalErrorFix
- 93suhwan/logicalErrorFix target commit: 0905e284f417eef450359afb18a775d802fab3b2
- After cloning and setting up 93suhwan/logicalErrorFix, run the following in the logicalErrorFix folder:
git clone https://github.com/SemteulGaram/logicalErrorFix_LYK.git
- Rename the folder:
mv logicalErrorFix_LYK lyk
- Find the script you want in the Available Scripts list below, open it, and follow the instructions under # How to use: at the very top.
- All commands in Scripts must be run from logicalErrorFix, not from the logicalErrorFix/lyk folder.
mdcpp1_1_convert_riegeli2cpp.py
- Uses Riegeli to extract only the C++ files and data from the dump files
- Input File: /tmp/dm-code_contests/* (dump files; see 93suhwan/logicalErrorFix)
- Output File: data/[cppfiles_batch_[correct,incorrect]_[test,train,valid],descriptions,samples,private_samples,generated_samples]
- Diff: to relieve the file-system bottleneck, the output of 93suhwan/logicalErrorFix/convert-riegeli2cpp.py is serialized into data/cppfiles_batch_[correct,incorrect]_[test,train,valid]/[problem_id].txt files in the following format: \n¶¶¶\n[id]\n¶¶\n[code]\n¶¶¶\n[id]\n¶¶\n[code] ...
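The serialization format above can be sketched as a small round-trip helper. This is only an illustration: the function names `serialize_batch` and `parse_batch` are hypothetical, and it assumes that neither separator ever appears inside an id or a piece of code.

```python
from typing import List, Tuple

# Separators taken from the format described above.
RECORD_SEP = "\n¶¶¶\n"  # precedes every [id]
FIELD_SEP = "\n¶¶\n"    # separates [id] from [code]


def serialize_batch(records: List[Tuple[str, str]]) -> str:
    """Join (id, code) pairs into one string for a [problem_id].txt file."""
    return "".join(f"{RECORD_SEP}{sid}{FIELD_SEP}{code}" for sid, code in records)


def parse_batch(text: str) -> List[Tuple[str, str]]:
    """Split a serialized batch back into (id, code) pairs."""
    records = []
    for chunk in text.split(RECORD_SEP):
        if not chunk:
            continue  # the text starts with RECORD_SEP, so the first chunk is empty
        sid, sep, code = chunk.partition(FIELD_SEP)
        if not sep:
            raise ValueError(f"record without a field separator: {chunk[:40]!r}")
        records.append((sid, code))
    return records
```

Keeping all solutions of a problem in one file means one read per problem instead of one per solution, which is presumably where the file-system bottleneck came from.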
BROKEN PIPE: data/edit_distance/pair_solution_[test,train,valid].txt is unavailable (it has to be requested from JSH)
mdcpp1_3-make_code_edit_dist.py
- Filters by the target edit_distance
- Input: output of mdcpp1_1_convert_riegeli2cpp.py and data/edit_distance/pair_solution_[test,train,valid].txt
- Output: data/edit_distance/pair_code_edit_dist_[test,train,valid].txt
- edit_distance: number of tokens that need to be fixed in the original code
- Diff: it can decode cppfiles_batch_*
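To make the edit_distance definition concrete, here is a minimal sketch of a token-level Levenshtein distance. The regex tokenizer is an assumption made for illustration; the tokenizer used by the actual script is not described here and may split code differently.

```python
import re
from typing import List

# Hypothetical tokenizer: identifiers, integer literals, then any single
# non-space character. The real script may tokenize differently.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    return TOKEN_RE.findall(code)


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        cur = [i]
        for j, tok_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                      # delete tok_a
                cur[j - 1] + 1,                   # insert tok_b
                prev[j - 1] + (tok_a != tok_b),   # substitute or keep
            ))
        prev = cur
    return prev[-1]
```

For example, `edit_distance(tokenize("x = 1;"), tokenize("x = 2;"))` counts a single substituted token.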
mdcpp1_4_make_unique_correct_list.py
- Removes duplicated DataFrame columns from data/edit_distance/pair_code_edit_dist_[test,train,valid].txt
mdcpp1_5_convert_cpp2gpp.py
- Keeps only the code that gpp can compile (Compile)
mdcpp1_6_changsup_make_code_edit_dist.py
- Keeps only the code that gpp can compile (Validate)
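A compile filter of this kind could look roughly like the sketch below. It assumes that gpp refers to the `g++` compiler on the PATH. The helper name `can_compile` and the default timeout are illustrative and not taken from the scripts.

```python
import subprocess
import tempfile
from pathlib import Path


def can_compile(source: str, compiler: str = "g++", timeout: float = 30.0) -> bool:
    """Return True if `compiler` builds `source` without errors."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "solution.cpp"
        src.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [compiler, str(src), "-o", str(Path(tmp) / "solution")],
                capture_output=True,
                timeout=timeout,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # A missing compiler or a hung build is treated as "cannot compile".
            return False
        return result.returncode == 0
```

A dataset could then be filtered with something like `[code for code in codes if can_compile(code)]`.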
fl1_4_check_eval.py
- Compares the .gold file with the predicted .output file and prints the comparison result
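As an illustration of such a comparison, the sketch below assumes that both files hold one entry per line and that line i of the .output file is the prediction for line i of the .gold file. The real file layout and the metrics printed by the script may differ, and `compare_predictions` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, Union


def compare_predictions(gold_path: str, output_path: str) -> Dict[str, Union[int, float]]:
    """Count exact line-by-line matches between a .gold and a .output file."""
    gold = Path(gold_path).read_text(encoding="utf-8").splitlines()
    pred = Path(output_path).read_text(encoding="utf-8").splitlines()
    if len(gold) != len(pred):
        raise ValueError(f"line count mismatch: {len(gold)} gold vs {len(pred)} predicted")
    # Surrounding whitespace is ignored; everything else must match exactly.
    matches = sum(g.strip() == p.strip() for g, p in zip(gold, pred))
    accuracy = matches / len(gold) if gold else 0.0
    return {"total": len(gold), "matches": matches, "accuracy": accuracy}
```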
fl2_1_convert_dataset.py
- Converts the existing refined_pair_code_edit_dist_[test,train,valid].txt files for fl use by keeping only the line_no of each code stmt that has to be fixed
- Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
- Output: data/edit_distance/refined_pair_code_edit_dist_fl_[test,train,valid].txt
fl2_2_run_train.py
- Runs the training that the existing train.sh performed through run.py, adapted for the fl2 project
- Input: data/edit_distance/refined_pair_code_edit_dist_fl_[test,train,valid].txt
- Output: model/cpp_refined_fl
gpt1_1_request_openai_gpt3.5.py
- Generates code through GPT 3.5 inference
- Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
- Output: lyk/output/gpt1-[NAME]
gpt1_1_1_calc_tiktoken.py
- Uses OpenAI's tiktoken library to calculate the number of tokens a request will use
- Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
- Output: lyk/output/gpt1_1_1_[NAME].db
gpt1_2_parse_and_make_code.py
- Parses the GPT responses, converts them into structured code, and stores it in a database
- Input: lyk/output/gpt1_[NAME]/*
- Output: lyk/output/gpt1_2_[NAME].db
gpt1_3_execute_test.py
- Merges the line fixes suggested by GPT into the existing Incorrect_code, compiles the fixed code, and stores whether the result is AC, TLE, WA, RE, or CE in a database
- Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
- Input: lyk/output/gpt1_2_[NAME].db
- Output: lyk/output/gpt1_3_[NAME].db
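The run-time verdicts listed above could be assigned along the lines of the following sketch. It is an assumption-laden illustration, not the script's logic: the helper name `judge`, the timeout value, and the whitespace-insensitive output comparison are all hypothetical. CE is not handled here because it would be decided earlier, when compilation fails.

```python
import subprocess
from typing import Sequence


def judge(cmd: Sequence[str], stdin_text: str, expected: str, timeout: float = 2.0) -> str:
    """Run `cmd` on one test case and return AC, WA, TLE, or RE."""
    try:
        result = subprocess.run(
            list(cmd),
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TLE"  # time limit exceeded
    if result.returncode != 0:
        return "RE"   # runtime error: non-zero exit status
    # Compare token by token so trailing spaces and newlines do not matter.
    return "AC" if result.stdout.split() == expected.split() else "WA"
```

For a compiled solution, `cmd` would be the path to the binary, for example `judge(["./solution"], "1 2\n", "3\n")`.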
gpt1_4_make_report.py
- Outputs the accuracy and a report based on the information stored in the database
- Input: lyk/output/gpt1_3_[NAME].db
- Output: lyk/output/gpt1_4_[NAME]/*.{png,txt}
ccpy1_1_convert_riegeli2py.py
- Uses Riegeli to extract only the Python files and data from the dump files
- Input File: /tmp/dm-code_contests/* (dump files; see 93suhwan/logicalErrorFix)
- Output File: lyk/archive/data/ccpy1_raw.db
  - table: problem
    - primary key: problem_id
    - columns: [INT]problem_id, [TEXT]description
  - table: problem_correct, problem_incorrect
    - primary key: problem_id, correct_id (for ease of processing, an incorrect_id is also stored as correct_id)
    - columns: [INT]problem_id, [INT]correct_id, [TEXT]code
  - table: problem_public_test, problem_private_test, problem_generated_test
    - primary key: problem_id, input, output
    - columns: [INT]problem_id, [TEXT]input, [TEXT]output
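The table layout of ccpy1_raw.db can be sketched with Python's built-in sqlite3 module. This assumes the .db file is an SQLite database, which the source does not state explicitly. The helper name `create_raw_db` is hypothetical; the table and column names follow the listing above.

```python
import sqlite3


def create_raw_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables described for ccpy1_raw.db and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS problem ("
        "problem_id INT PRIMARY KEY, description TEXT)"
    )
    # Correct and incorrect solutions share one layout; incorrect ids are
    # stored in the correct_id column as well.
    for name in ("problem_correct", "problem_incorrect"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "problem_id INT, correct_id INT, code TEXT, "
            "PRIMARY KEY (problem_id, correct_id))"
        )
    # The three test tables use the whole row as a composite primary key.
    for name in ("problem_public_test", "problem_private_test", "problem_generated_test"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} ("
            "problem_id INT, input TEXT, output TEXT, "
            "PRIMARY KEY (problem_id, input, output))"
        )
    conn.commit()
    return conn
```

With the whole row as the key in the test tables, inserting the same test case twice for one problem is rejected rather than duplicated.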
ccpy1_2_make_dataset.py
- Builds the dataset from the output of ccpy1_1_convert_riegeli2py.py
- Input File: lyk/archive/data/ccpy1_raw.db
- Output File: lyk/archive/data/ccpy1_dataset.db
  - table: train, valid, test
    - primary key: problem_id, correct_id
    - columns: [INT]problem_id, [INT]correct_id, [TEXT]description, [TEXT]code, [TEXT]input, [TEXT]output