Cannot Reproduce the Experimental Results #11

Closed
@Kosei1227

Hi! Thank you for the excellent paper and the impressive results.

As researchers working on low-resource languages, we want to reproduce the experimental results and then apply and improve the LangBridge approach for our target languages.

We ran the following command:

python eval_langbridge.py \
  --checkpoint_path kaist-ai/metamath-langbridge-9b \
  --enc_tokenizer kaist-ai/langbridge_encoder_tokenizer \
  --tasks mgsm_en,mgsm_es,mgsm_fr,mgsm_de,mgsm_ru,mgsm_zh,mgsm_ja,mgsm_th,mgsm_sw,mgsm_bn,mgsm_te \
  --instruction_template metamath \
  --batch_size 1 \
  --output_path eval_outputs/mgsm/metamath-langbridge_9b \
  --device cuda:2 \
  --no_cache

We got the following output:

kaist-ai/metamath-langbridge-9b (), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1

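| Task    | Version | Metric | Value | Stderr   |
|---------|---------|--------|-------|----------|
| mgsm_bn | 0       | acc    | 0.040 | ± 0.0124 |
| mgsm_de | 0       | acc    | 0.108 | ± 0.0197 |
| mgsm_en | 0       | acc    | 0.152 | ± 0.0228 |
| mgsm_es | 0       | acc    | 0.084 | ± 0.0176 |
| mgsm_fr | 0       | acc    | 0.096 | ± 0.0187 |
| mgsm_ja | 0       | acc    | 0.060 | ± 0.0151 |
| mgsm_ru | 0       | acc    | 0.068 | ± 0.0160 |
| mgsm_sw | 0       | acc    | 0.024 | ± 0.0097 |
| mgsm_te | 0       | acc    | 0.036 | ± 0.0118 |
| mgsm_th | 0       | acc    | 0.076 | ± 0.0168 |
| mgsm_zh | 0       | acc    | 0.048 | ± 0.0135 |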
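
As a quick sanity check on the numbers above, here is a minimal sketch that computes the macro-average accuracy and recomputes the standard errors. It rests on two assumptions that are not confirmed by the repository: that each MGSM language has 250 test problems (`N_PROBLEMS`), and that the harness reports the sample standard error of a mean of binary outcomes. The helper name `sample_stderr` is ours, for illustration only.

```python
import math

# Accuracy and stderr values copied from the evaluation output above.
acc = {
    "mgsm_bn": 0.040, "mgsm_de": 0.108, "mgsm_en": 0.152, "mgsm_es": 0.084,
    "mgsm_fr": 0.096, "mgsm_ja": 0.060, "mgsm_ru": 0.068, "mgsm_sw": 0.024,
    "mgsm_te": 0.036, "mgsm_th": 0.076, "mgsm_zh": 0.048,
}
reported_stderr = {
    "mgsm_bn": 0.0124, "mgsm_de": 0.0197, "mgsm_en": 0.0228, "mgsm_es": 0.0176,
    "mgsm_fr": 0.0187, "mgsm_ja": 0.0151, "mgsm_ru": 0.0160, "mgsm_sw": 0.0097,
    "mgsm_te": 0.0118, "mgsm_th": 0.0168, "mgsm_zh": 0.0135,
}

N_PROBLEMS = 250  # ASSUMPTION: number of MGSM test problems per language


def sample_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary outcomes, using the n - 1 sample variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


# Unweighted mean over the 11 languages.
macro_avg = sum(acc.values()) / len(acc)
print(f"macro-average accuracy: {macro_avg:.3f}")

# Compare the recomputed stderr with the reported one for each task.
for task in sorted(acc):
    recomputed = sample_stderr(acc[task], N_PROBLEMS)
    print(f"{task}: recomputed {recomputed:.4f}, reported {reported_stderr[task]:.4f}")
```

Under these assumptions the recomputed standard errors agree with the reported ones to within rounding, which suggests the run scored the full test set for every language (consistent with `limit: None` in the header line), so the low accuracy is unlikely to come from evaluating on a subset.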

These values are much lower than the results reported in the paper.

Have you encountered this issue before? Could you share the exact script or command used to produce the experimental results in the paper?

Thank you.
