
🥼 93suhwan/logicalErrorFix reproducible code across tasks



KNU-PLML-Lab/logicalErrorFix_LYK


🥼 logicalErrorFix - LYK code repository


โ“ How to use

  1. Clone and set up 93suhwan/logicalErrorFix.
  2. From inside the logicalErrorFix folder, clone this repository:
git clone https://github.com/SemteulGaram/logicalErrorFix_LYK.git
  3. Rename the folder:
mv logicalErrorFix_LYK lyk
  4. Find the script you want in the Available Scripts list below, open it, and follow the # How to use: instructions at the top of the file.
  5. Run every command in Scripts from logicalErrorFix, not from logicalErrorFix/lyk.

📜 Available Scripts

1. mdcpp1 (Make Data CPP)

  1. mdcpp1_1_convert_riegeli2cpp.py - Uses Riegeli to extract only the C++ solutions and data from the contest files
  • Input File: /tmp/dm-code_contests/* (contest files; see 93suhwan/logicalErrorFix)
  • Output File: data/[cppfiles_batch_[correct,incorrect]_[test,train,valid],descriptions,samples,private_samples,generated_samples]
  • Diff: To relieve the filesystem bottleneck, the output of 93suhwan/logicalErrorFix/convert-riegeli2cpp.py is serialized into data/cppfiles_batch_[correct,incorrect]_[test,train,valid]/[problem_id].txt files in the following format: \n¶¶¶\n[id]\n¶¶\n[code]\n¶¶¶\n[id]\n¶¶\n[code] ...
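Because each batch file packs many solutions into one text file, reading it back only takes two string splits. The following is a hypothetical helper, not code from the repository's scripts; it assumes every record is exactly "\n¶¶¶\n" + id + "\n¶¶\n" + code and that no solution contains either separator.

```python
# Hypothetical reader/writer for the cppfiles_batch_* format described above.
# Assumption: each record is RECORD_SEP + id + FIELD_SEP + code, and the
# separators never occur inside a solution's source code.
RECORD_SEP = "\n\u00b6\u00b6\u00b6\n"  # "\n¶¶¶\n"
FIELD_SEP = "\n\u00b6\u00b6\n"         # "\n¶¶\n"


def serialize(solutions):
    """Join (solution_id, code) pairs into one batch string."""
    return "".join(f"{RECORD_SEP}{sid}{FIELD_SEP}{code}" for sid, code in solutions)


def deserialize(text):
    """Split a batch string back into (solution_id, code) pairs."""
    pairs = []
    # The text starts with RECORD_SEP, so the first chunk is empty and skipped.
    for record in text.split(RECORD_SEP)[1:]:
        sid, code = record.split(FIELD_SEP, 1)
        pairs.append((sid, code))
    return pairs
```

A file such as data/cppfiles_batch_correct_train/[problem_id].txt would then be loaded with deserialize(open(path, encoding="utf-8").read()). Splitting on the three-pilcrow separator first keeps the two-pilcrow field separator unambiguous.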

BROKEN PIPE: data/edit_distance/pair_solution_[test,train,valid].txt is not available (it must be requested from JSH)

  1. mdcpp1_3-make_code_edit_dist.py
  • Filters pairs by the target edit_distance
  • Input: Output of mdcpp1_1_convert_riegeli2cpp.py and data/edit_distance/pair_solution_[test,train,valid].txt
  • Output: data/edit_distance/pair_code_edit_dist_[test,train,valid].txt
  • edit_distance: the number of tokens that need to be fixed in the original code
  • Diff: it can decode the serialized cppfiles_batch_* files
  2. mdcpp1_4_make_unique_correct_list.py - Removes duplicate DataFrame columns from data/edit_distance/pair_code_edit_dist_[test,train,valid].txt
  3. mdcpp1_5_convert_cpp2gpp.py - Keeps only the code that g++ can compile (compile step)
  4. mdcpp1_6_changsup_make_code_edit_dist.py - Keeps only the code that g++ can compile (validation step)
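edit_distance counts the tokens that must change to turn the incorrect code into the correct code. As a rough illustration only, the following computes a token-level Levenshtein distance. The regex tokenizer is an assumption made for this example; the repository's scripts may tokenize C++ differently and may weight edits differently.

```python
import re

# Assumed tokenizer: identifiers/numbers form one token, every other
# non-space character is its own token (so "<=" becomes "<", "=").
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(code):
    return TOKEN_RE.findall(code)


def token_edit_distance(a, b):
    """Levenshtein distance over token sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ta in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, tb in enumerate(b, 1):
            cur[j] = min(
                prev[j] + 1,               # delete ta
                cur[j - 1] + 1,            # insert tb
                prev[j - 1] + (ta != tb),  # substitute (free when equal)
            )
        prev = cur
    return prev[-1]


incorrect = "for (int i = 0; i < n; i++) s += a[i];"
correct = "for (int i = 1; i < n; i++) s += a[i];"
print(token_edit_distance(tokenize(incorrect), tokenize(correct)))  # 1
```

Filtering by a target edit_distance would then amount to keeping the pairs whose distance matches the chosen threshold; how the scripts pick that threshold is not stated here.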

3. fl1 (Fault Localization based on LCS's cpp_refined)

  1. fl1_4_check_eval.py - Compares the .gold files with the predicted .output files and prints the comparison results
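A comparison like the one fl1_4_check_eval.py performs can be sketched as line-by-line exact matching. This hypothetically assumes that the .gold and .output files hold one prediction per line in the same order; the real file layout and the metrics the script prints may differ.

```python
from pathlib import Path


def score(gold_lines, pred_lines):
    """Return (matches, total, accuracy) for two line-aligned lists."""
    if len(gold_lines) != len(pred_lines):
        raise ValueError(
            f"line count mismatch: {len(gold_lines)} gold vs {len(pred_lines)} predicted"
        )
    # Ignore surrounding whitespace so a trailing space is not counted as an error.
    matches = sum(g.strip() == p.strip() for g, p in zip(gold_lines, pred_lines))
    total = len(gold_lines)
    return matches, total, (matches / total if total else 0.0)


def compare_files(gold_path, output_path):
    """Read a .gold file and a predicted .output file and score them line by line."""
    gold = Path(gold_path).read_text(encoding="utf-8").splitlines()
    pred = Path(output_path).read_text(encoding="utf-8").splitlines()
    return score(gold, pred)
```

Raising on a line-count mismatch keeps a truncated prediction file from silently inflating or deflating the accuracy.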

4. fl2 (Fault Localization based on cpp_refined_fl)

  1. fl2_1_convert_dataset.py
  • Converts the existing refined_pair_code_edit_dist_[test,train,valid].txt files for fault localization by keeping only the line_no of each code stmt that needs to be fixed
  • Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
  • Output: data/edit_distance/refined_pair_code_edit_dist_fl_[test,train,valid].txt
  2. fl2_2_run_train.py
  • Adapts the training that train.sh previously ran through run.py to the fl2 project and runs it
  • Input: data/edit_distance/refined_pair_code_edit_dist_fl_[test,train,valid].txt
  • Output: model/cpp_refined_fl

5. gpt1 (Code Repair using OpenAI GPT)

  1. gpt1_1_request_openai_gpt3.5.py
  • Generates code through GPT-3.5 inference
  • Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
  • Output: lyk/output/gpt1-[NAME]
  2. gpt1_1_1_calc_tiktoken.py
  • Uses OpenAI's tiktoken library to calculate how many tokens the requests will use
  • Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
  • Output: lyk/output/gpt1_1_1_[NAME].db
  3. gpt1_2_parse_and_make_code.py
  • Parses the GPT responses, converts them into structured code, and saves the result to a database
  • Input: lyk/output/gpt1_[NAME]/*
  • Output: lyk/output/gpt1_2_[NAME].db
  4. gpt1_3_execute_test.py
  • Merges GPT's suggested line fixes into the original Incorrect_code, compiles the patched code, and saves whether it is AC, TLE, WA, RE or CE to a database
  • Input: data/edit_distance/refined_pair_code_edit_dist_[test,train,valid].txt
  • Input: lyk/output/gpt1_2_[NAME].db
  • Output: lyk/output/gpt1_3_[NAME].db
  5. gpt1_4_make_report.py
  • Computes accuracy and writes a report from the information stored in the database
  • Input: lyk/output/gpt1_3_[NAME].db
  • Output: lyk/output/gpt1_4_[NAME]/*.{png,txt}
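gpt1_3_execute_test.py records one of AC, TLE, WA, RE or CE per patched program. A minimal judge along those lines is sketched below, assuming g++ is on PATH and that outputs are compared token by token (ignoring whitespace). The compiler flags, the 2-second time limit and the comparison rule are assumptions for illustration, not necessarily what the script uses.

```python
import os
import subprocess


def compile_cpp(src_path, exe_path):
    """Compile with g++; returning False corresponds to a compile error (CE)."""
    result = subprocess.run(
        ["g++", "-O2", "-o", exe_path, src_path], capture_output=True, text=True
    )
    return result.returncode == 0


def run_case(cmd, test_input, expected_output, time_limit=2.0):
    """Run one test case and classify it as AC, WA, TLE or RE."""
    try:
        proc = subprocess.run(
            cmd, input=test_input, capture_output=True, text=True, timeout=time_limit
        )
    except subprocess.TimeoutExpired:  # the child is killed before this is raised
        return "TLE"
    if proc.returncode != 0:
        return "RE"
    # Token-wise comparison ignores differences in spaces and newlines.
    return "AC" if proc.stdout.split() == expected_output.split() else "WA"


def judge(src_path, exe_path, tests, time_limit=2.0):
    """Overall verdict: CE if compilation fails, else the first non-AC case verdict."""
    if not compile_cpp(src_path, exe_path):
        return "CE"
    exe = os.path.abspath(exe_path)  # absolute path so the binary is not looked up on PATH
    for test_input, expected_output in tests:
        verdict = run_case([exe], test_input, expected_output, time_limit)
        if verdict != "AC":
            return verdict
    return "AC"
```

The tests argument would be (input, output) pairs taken from the problem's public, private or generated samples. Stopping at the first failing case mirrors how online judges usually report a single verdict.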

6. ccpy1 (Make CodeContest python dataset)

  1. ๐Ÿ—๏ธ ccpy1_1_convert_riegeli2py.py - Riegeli ๋ฅผ ์ด์šฉํ•ด ๋Œ€ํšŒํŒŒ์ผ์—์„œ python ํŒŒ์ผ๊ณผ ๋ฐ์ดํ„ฐ๋งŒ ์ถœ๋ ฅํ•˜๊ธฐ
  • Input File: /tmp/dm-code_contests/* (๋Œ€ํšŒ ํŒŒ์ผ, 93suhwan/logicalErrorFix ํ™•์ธ ํ•  ๊ฒƒ)
  • Output File: lyk/archive/data/ccpy1_raw.db
    • table: problem
      • primary key: problem_id
      • colums: [INT]problem_id, [TEXT]description
    • table: problem_correct, problem_incorrect
      • primary key: problem_id, correct_id (์ฒ˜๋ฆฌ ์šฉ์ด์„ฑ์„ ์œ„ํ•ด incorrect_id์ธ ๊ฒฝ์šฐ๋„ correct_id ๋กœ ํ‘œ๊ธฐ)
      • columns: [INT]problem_id, [INT]correct_id, [TEXT]code
    • table: problem_public_test, problem_private_test, problem_generated_test
      • primary key: problem_id, input, output
      • columns: [INT]problem_id, [TEXT]input, [TEXT]output
  1. ๐Ÿ—๏ธ ccpy1_2_make_dataset.py - ccpy1_1_convert_riegeli2py.py ๊ฒฐ๊ณผ๋ฌผ์„ ์ด์šฉํ•ด ๋ฐ์ดํ„ฐ์…‹ ์ •์ œ
  • Input File: lyk/archive/data/ccpy1_raw.db
  • Output File: lyk/archive/data/ccpy1_dataset.db
    • table: train, valid, test
      • primary key: problem_id, correct_id
      • columns: [INT]problem_id, [INT]correct_id, [TEXT]description, [TEXT]code, [TEXT]input, [TEXT]output
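For reference, the ccpy1_raw.db layout above maps roughly onto the SQLite tables created below. This is reconstructed only from the listed tables, primary keys and column types; the script's actual CREATE statements (extra constraints, indexes) may differ, and the sample rows are made up for illustration.

```python
import sqlite3


def create_raw_schema(conn):
    """Create the ccpy1_raw.db tables as listed above (reconstruction, not the original DDL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS problem ("
        " problem_id INTEGER PRIMARY KEY, description TEXT)"
    )
    # Incorrect solutions reuse the correct_id column name, as noted above.
    for table in ("problem_correct", "problem_incorrect"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            " problem_id INTEGER, correct_id INTEGER, code TEXT,"
            " PRIMARY KEY (problem_id, correct_id))"
        )
    for table in ("problem_public_test", "problem_private_test", "problem_generated_test"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            " problem_id INTEGER, input TEXT, output TEXT,"
            " PRIMARY KEY (problem_id, input, output))"
        )
    conn.commit()


# Usage with made-up sample rows in an in-memory database.
conn = sqlite3.connect(":memory:")
create_raw_schema(conn)
conn.execute("INSERT INTO problem VALUES (?, ?)", (1, "Add two numbers"))
conn.execute(
    "INSERT INTO problem_correct VALUES (?, ?, ?)",
    (1, 0, "print(sum(map(int, input().split())))"),
)
conn.execute("INSERT INTO problem_public_test VALUES (?, ?, ?)", (1, "1 2\n", "3\n"))
rows = conn.execute(
    "SELECT p.description, c.code FROM problem p JOIN problem_correct c USING (problem_id)"
).fetchall()
```

Keying the test tables on (problem_id, input, output) means an identical test case is stored only once per problem, even if it appears several times in the contest file.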
