IDC-3: Support HumanEvalPlus-Mini-v0.1.10, Support the DeepSeek Coder Model Family in examples/run_identity_chain_huggingface.py (#4)

* update README
* support deepseekcoder
* update prompt for fim
* update run_identity_chain.sh
* update HumanEvalPlus-Mini to v0.1.10
* Bump version: 0.0.1 → 0.1.0
marcusm117 · May 4, 2024 · commit 56e1ec4 · 1 parent 867b706
Showing 12 changed files with 553 additions and 206 deletions.
diff --git a/.bumpversion.cfg b/.bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.0.1
+current_version = 0.1.0
 commit = True
 tag = False
 

diff --git a/.gitignore b/.gitignore
@@ -7,7 +7,9 @@ temp_files/
 
 # unzipped data
 data/EvalPlus-Mini-v0.1.6_reformatted.jsonl
+data/EvalPlus-Mini-v0.1.9_reformatted.jsonl
 data/MBPP-S_test_reformatted.jsonl
+data/MbppPlus-v0.1.0_reformatted.jsonl
 
 # Coverage Report
 python_junit.xml

diff --git a/README.md b/README.md
@@ -41,8 +41,9 @@ Before the self-consistency evaluation, you need to make sure that one of the fo
 
 To evaluate your model using IdentityChain, you need to prepare the followings:
 
-1. An evaluation dataset in the format of one of the followings (you can also use these two directly):
+1. An evaluation dataset from one of the followings (or one of your own in the same format):
    - [EvalPlus-Mini-v0.1.6_reformatted.jsonl](./data/EvalPlus-Mini-v0.1.6_reformatted.jsonl.gz)
+   - [EvalPlus-Mini-v0.1.10_reformatted.jsonl](./data/EvalPlus-Mini-v0.1.10_reformatted.jsonl.gz)
    - [MBPP-S_test_reformatted.jsonl](./data/MBPP-S_test_reformatted.jsonl.gz)
 2. An NL-to-PL prompt for your model
 3. A PL-to-NL prompt for your model
@@ -51,14 +52,18 @@ To evaluate your model using IdentityChain, you need to prepare the followings:
 
 See [run_identity_chain_openai.py](./examples/run_identity_chain_openai.py) for an example of how to use IdentityChain to evaluate OpenAI models.
 
+See [run_identity_chain_google.py](./examples/run_identity_chain_google.py) for an example of how to use IdentityChain to evaluate Google models.
+
 See [run_identity_chain_huggingface.py](./examples/run_identity_chain_huggingface.py) for an example of how to use IdentityChain to evaluate HuggingFace open-source models. This example script already includes the following models:
 
-1. CodeLlama-Instruct-hf (7B, 13B, 34B)
-2. CodeLlama-hf (7B, 13B, 34B)
-3. starchat-beta
-4. starcoder
-5. starcoderplus
-6. starcoderbase (1B, 3B, 7B, 15B)
+1. CodeLlama-Instruct-hf (7B, 13B, 34B, 70B)
+2. CodeLlama-hf (7B, 13B, 34B, 70B)
+3. StarChat-Beta
+4. StarCoder
+5. StarCoderPlus
+6. StarCoderBase (1B, 3B, 7B, 15B)
+7. DeepSeekCoder-Instruct (1.3B, 6.7B, 33B, 7B-v1.5)
+8. DeepSeekCoder (1.3B, 6.7B, 33B, 7B-v1.5)
 
 ## Example
 

diff --git a/data/EvalPlus-Mini-v0.1.10_reformatted.jsonl b/data/EvalPlus-Mini-v0.1.10_reformatted.jsonl
diff --git a/data/EvalPlus-Mini-v0.1.10_reformatted.jsonl.gz b/data/EvalPlus-Mini-v0.1.10_reformatted.jsonl.gz
diff --git a/data/HumanEval/HumanEvalPlus-Mini-v0.1.10.jsonl.gz b/data/HumanEval/HumanEvalPlus-Mini-v0.1.10.jsonl.gz
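Note on the data files above: the README links to the gzipped dataset (`.jsonl.gz`), while `.gitignore` lists the plain `.jsonl` under "unzipped data", so the dataset has to be decompressed locally or read straight from the archive. The following is a minimal sketch of both options using only the Python standard library. The helper names `load_jsonl_gz` and `unzip_jsonl_gz` are hypothetical and not part of IdentityChain, and no assumption is made about the fields inside each record.

```python
import gzip
import json
import shutil
from pathlib import Path


def load_jsonl_gz(path):
    """Read a gzip-compressed JSON Lines file into a list of dicts.

    Hypothetical helper for illustration; not part of the IdentityChain API.
    """
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def unzip_jsonl_gz(gz_path):
    """Decompress foo.jsonl.gz to foo.jsonl in the same directory.

    Returns the path of the decompressed file.
    Hypothetical helper for illustration; not part of the IdentityChain API.
    """
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, calling `unzip_jsonl_gz("data/EvalPlus-Mini-v0.1.10_reformatted.jsonl.gz")` from the repository root would write `data/EvalPlus-Mini-v0.1.10_reformatted.jsonl` next to the archive, and `load_jsonl_gz` on the same archive would return one dict per task without writing anything to disk.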
diff --git a/examples/run_identity_chain.sh b/examples/run_identity_chain.sh
@@ -14,7 +14,7 @@ export IDENTITY_CHAIN_HOME=YOUR_OWN_PATH/IdentityChain  # no / at the end
 
 # for open-source models from HuggingFace, when using greedy, add the flag --greedy_early_stop to accelerate
 # for OpenAI models, don't use --greedy_early_stop!!! temperature = 0 is NOT greedy!!!
-# for EvalPlus-Mini-v0.1.6_reformatted.jsonl, use --resume_task_bs 1, since HumanEval/0 is used for prompt
+# for EvalPlus-Mini-v0.1.6_reformatted.jsonl (or other versions), use --resume_task_bs 1, since HumanEval/0 is used for prompt
 # for MBPP-S_test_reformatted.jsonl, use --resume_task_bs 0, since there's a separate prompt split
 
 for MODEL in "bigcode/starcoderbase-1b"  # feel free to add other supported models