Closed
Hi! Thank you for the excellent paper and the impressive results.
As researchers working on low-resource languages, we would like to reproduce the experimental results and then apply/extend the LangBridge approach to our target languages.
We ran the following command (note: in our original invocation two of the line continuations were missing a space before the backslash; that is fixed here):

```
python eval_langbridge.py \
  --checkpoint_path kaist-ai/metamath-langbridge-9b \
  --enc_tokenizer kaist-ai/langbridge_encoder_tokenizer \
  --tasks mgsm_en,mgsm_es,mgsm_fr,mgsm_de,mgsm_ru,mgsm_zh,mgsm_ja,mgsm_th,mgsm_sw,mgsm_bn,mgsm_te \
  --instruction_template metamath \
  --batch_size 1 \
  --output_path eval_outputs/mgsm/metamath-langbridge_9b \
  --device cuda:2 \
  --no_cache
```
And we got this output:
kaist-ai/metamath-langbridge-9b (), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1
| Task    | Version | Metric | Value |   | Stderr |
|---------|---------|--------|-------|---|--------|
| mgsm_bn | 0       | acc    | 0.040 | ± | 0.0124 |
| mgsm_de | 0       | acc    | 0.108 | ± | 0.0197 |
| mgsm_en | 0       | acc    | 0.152 | ± | 0.0228 |
| mgsm_es | 0       | acc    | 0.084 | ± | 0.0176 |
| mgsm_fr | 0       | acc    | 0.096 | ± | 0.0187 |
| mgsm_ja | 0       | acc    | 0.060 | ± | 0.0151 |
| mgsm_ru | 0       | acc    | 0.068 | ± | 0.0160 |
| mgsm_sw | 0       | acc    | 0.024 | ± | 0.0097 |
| mgsm_te | 0       | acc    | 0.036 | ± | 0.0118 |
| mgsm_th | 0       | acc    | 0.076 | ± | 0.0168 |
| mgsm_zh | 0       | acc    | 0.048 | ± | 0.0135 |
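To summarize the gap, here is a short Python sketch that computes the macro-average accuracy over the 11 MGSM languages; the per-language values are copied from the table above (this is just our own bookkeeping, not part of the official eval script):

```python
# Per-language MGSM accuracies, copied from the eval output above.
accs = {
    "bn": 0.040, "de": 0.108, "en": 0.152, "es": 0.084,
    "fr": 0.096, "ja": 0.060, "ru": 0.068, "sw": 0.024,
    "te": 0.036, "th": 0.076, "zh": 0.048,
}

# Macro-average (unweighted mean) over the 11 languages.
avg = sum(accs.values()) / len(accs)
print(f"average accuracy: {avg:.3f}")  # about 0.072, i.e. ~7.2%
```

So we are averaging roughly 7% accuracy across languages, which is far below the MGSM averages reported in the paper.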
These values are considerably lower than those reported in the paper.
Have you encountered this issue before? Could you share the exact script and settings used to produce the paper's results?
Thank you