@@ -21,13 +21,13 @@ Once you've chosen a benchmark, run it with `lighteval eval`. Below are examples
1. Evaluate a model via Hugging Face Inference Providers.

```bash
-lighteval eval "hf-inference-providers/openai/gpt-oss-20b" "lighteval|gpqa:diamond|0"
+lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond
```

2. Run multiple evals at the same time.

```bash
-lighteval eval "hf-inference-providers/openai/gpt-oss-20b" "lighteval|gpqa:diamond|0,lighteval|aime25|0"
+lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond,aime25
```

3. Compare providers for the same model.
@@ -37,7 +37,7 @@ lighteval eval \
  hf-inference-providers/openai/gpt-oss-20b:fireworks-ai \
  hf-inference-providers/openai/gpt-oss-20b:together \
  hf-inference-providers/openai/gpt-oss-20b:nebius \
-  "lighteval|gpqa:diamond|0"
+  gpqa:diamond
```

You can also compare every provider serving one model in one line:
@@ -50,19 +50,19 @@ You can also compare every provider serving one model in one line:
4. Evaluate a vLLM or SGLang model.

```bash
-lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct "lighteval|gpqa:diamond|0"
+lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct gpqa:diamond
```

5. See the impact of few-shot on your model.

```bash
-lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|gsm8k|0,lighteval|gsm8k|5"
+lighteval eval hf-inference-providers/openai/gpt-oss-20b "gsm8k|0,gsm8k|5"
```
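The trailing number in each task spec is the few-shot count, so this command scores gsm8k with 0 and 5 in-context examples in one run. The removed lines use the fully qualified form `suite|task|num_fewshot`; assuming the bare task name defaults to the `lighteval` suite with zero few-shot examples, the invocations below are presumably equivalent:

```bash
# Assumed-equivalent ways to request 0-shot gsm8k (sketch, not verified against the CLI)
lighteval eval hf-inference-providers/openai/gpt-oss-20b gsm8k
lighteval eval hf-inference-providers/openai/gpt-oss-20b "gsm8k|0"
lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|gsm8k|0"
```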

6. Optimize custom server connections.

```bash
-lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|gsm8k|0" \
+lighteval eval hf-inference-providers/openai/gpt-oss-20b gsm8k \
  --max-connections 50 \
  --timeout 30 \
  --retry-on-error 1 \
@@ -73,13 +73,13 @@ lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|gsm8k|0" \
7. Use multiple epochs for more reliable results.

```bash
-lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|aime25|0" --epochs 16 --epochs-reducer "pass_at_4"
+lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --epochs 16 --epochs-reducer "pass_at_4"
```
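Here `--epochs 16` runs every sample 16 times and `--epochs-reducer` chooses how the 16 scores are collapsed; `pass_at_4` reports a pass@4 estimate. Other reducers can be substituted; the sketch below assumes a plain mean reducer is also accepted:

```bash
# Average the score over 16 runs instead of reporting pass@4
# (assumes a "mean" reducer is available, as in inspect-ai's built-in reducers)
lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --epochs 16 --epochs-reducer "mean"
```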

8. Push to the Hub to share results.

```bash
-lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|hle|0" \
+lighteval eval hf-inference-providers/openai/gpt-oss-20b hle \
  --bundle-dir gpt-oss-bundle \
  --repo-id OpenEvals/evals \
  --max-samples 100
@@ -99,17 +99,17 @@ Resulting Space:
You can use any argument defined in inspect-ai's API.

```bash
-lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|aime25|0" --temperature 0.1
+lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --temperature 0.1
```
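Other inspect-ai generation settings should pass through the same way; the sketch below assumes top_p and max_tokens from inspect-ai's GenerateConfig are exposed as CLI flags the same way temperature is:

```bash
# Hypothetical flags, assuming inspect-ai GenerateConfig fields map directly to CLI options
lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --top-p 0.95 --max-tokens 4096
```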

10. Use model-args to pass any inference-provider-specific argument.

```bash
-lighteval eval google/gemini-2.5-pro "lighteval|aime25|0" --model-args location=us-east5
+lighteval eval google/gemini-2.5-pro aime25 --model-args location=us-east5
```

```bash
-lighteval eval openai/gpt-4o "lighteval|gpqa:diamond|0" --model-args service_tier=flex,client_timeout=1200
+lighteval eval openai/gpt-4o gpqa:diamond --model-args service_tier=flex,client_timeout=1200
```