Issue
Hey everyone,
I was trying to evaluate some models on BigCodeBench, but I get a very low pass@1 (far lower than what has been reported for this model) along with this warning:
BigCodeBench-Complete-calibrated
Groundtruth pass rate: 0.000
Please be cautious!
pass@1: 0.033
For reproduction
I tried granite-3b-code-base in this setup, but the result was the same for the other models I tried (stablelm-1.6b, granite-8b-code-base).
For both Apptainer images, I used the latest versions of the Docker images mentioned in this repo.
My evaluation command:
IMAGE="/p/scratch/ccstdl/marianna/bigcodebench-evaluate_latest.sif"
SUBSET="complete"
SAVE_PATH="/p/scratch/ccstdl/marianna/bigcodebench_results/ibm-granite/granite-3b-code-base_bigcodebench_complete_0.0_1_vllm-sanitized-calibrated.jsonl"
CMD="apptainer -v run --bind $CONTAINER_HOME:/app,/tmp $IMAGE \
--subset $SUBSET \
--max-data-limit 16000 \
--samples $SAVE_PATH "
srun --cpus-per-task=$SLURM_CPUS_PER_TASK $CMD
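Since the evaluation reads the file at `$SAVE_PATH`, one cheap thing to rule out first is a missing or empty samples file. The following is only a minimal sketch of such a check: `check_samples` is a hypothetical helper, not part of BigCodeBench, and it assumes the file holds one sample per line. It would not explain the groundtruth pass rate warning by itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BigCodeBench): verify that a samples file
# exists and is non-empty, then print how many lines it contains.
# Assumption: the .jsonl file stores one sample per line.
check_samples() {
  local path="$1"
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "$path" ]; then
    echo "missing or empty: $path" >&2
    return 1
  fi
  local n
  # Reading from stdin makes wc print only the count; tr strips any padding
  n=$(wc -l < "$path" | tr -d ' ')
  echo "$n"
}
```

Running `check_samples "$SAVE_PATH"` before the `srun` line would print the line count, or fail with a message on stderr if the path is wrong.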
My generation command:
IMAGE="/p/scratch/ccstdl/marianna/bigcodebench-generate_latest.sif"
MODEL="ibm-granite/granite-3b-code-base"
MODELS_DIR="/marianna/models/"
SUBSET="complete"
BS=1
TEMPERATURE=0.0
N_SAMPLES=1
NUM_GPUS=4
SAVE_DIR="/p/scratch/ccstdl/marianna/bigcodebench_results"
BACKEND="vllm"
SAVE_PATH="${SAVE_DIR}/${MODEL}_bigcodebench_${SUBSET}_${TEMPERATURE}_${N_SAMPLES}_${BACKEND}.jsonl"
CMD="apptainer -v run --nv --bind $(pwd):/app $IMAGE \
--subset $SUBSET \
--model $MODELS_DIR/$MODEL \
--greedy \
--temperature $TEMPERATURE \
--n_samples $N_SAMPLES \
--backend $BACKEND \
--tp $NUM_GPUS \
--trust_remote_code \
--resume \
--save_path $SAVE_PATH"
srun --cpus-per-task=$SLURM_CPUS_PER_TASK $CMD
Please let me know if it's an issue on my side or what I can do to solve it! Thanks in advance!