Hello @OmkarThawakar, I used the LLM360 Analysis360 repo to run the evaluation for the SIQA task:

```
python Analysis360/eval/harness/main.py --device cuda:0 --model=hf-causal-experimental --batch_size=auto:1 --model_args="pretrained=MBZUAI/MobiLlama-05B,trust_remote_code=True,dtype=bfloat16" --tasks=social_iqa --num_fewshot=0 --output_path=Analysis360-MobiLlama-05B.json
```
It only gives 0.3327 accuracy, which is essentially random chance, since SIQA has only three answer choices:
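For reference, here is a quick sanity check (using the accuracy and standard error reported in the table below) showing that the score is statistically indistinguishable from the 1/3 random-guess baseline:

```python
# Sanity check: is the reported SIQA accuracy distinguishable from chance?
# The observed accuracy and stderr come from the harness output in this issue;
# the number of answer choices (3) is a property of Social IQa.
n_choices = 3
chance = 1 / n_choices   # ~0.3333 random-guess baseline
observed = 0.3327        # acc reported by the harness
stderr = 0.0107          # +/- from the harness output

# Distance from chance, measured in standard errors
z = abs(observed - chance) / stderr
print(f"chance={chance:.4f}, observed={observed}, z={z:.2f}")
# z is far below 2, so the result is consistent with random guessing
```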
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| social_iqa | 0 | none | 0 | acc | 0.3327 | ± | 0.0107 |
Could you share how you ran the SIQA evaluation? Thanks!