Include multiple annotators for WildBench #3283

liamjxu · 2025-01-22T06:39:40Z

Use GPT, LLaMA, and Claude to annotate the outputs in WildBench, and take the average of their scores as the final score.
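The averaging step described above can be illustrated with a minimal sketch. This is not the code merged in this PR. The function name `average_annotator_scores`, the judge keys (`"gpt"`, `"llama"`, `"claude"`), and the choice to skip a judge whose output could not be parsed are all hypothetical assumptions.

```python
from statistics import mean
from typing import Dict, Optional


def average_annotator_scores(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the scores assigned by several judge models.

    Hypothetical helper: `scores` maps a judge name (e.g. "gpt", "llama",
    "claude") to the score that judge gave, or None if its output could
    not be parsed. Skipping None values is an assumption, not confirmed
    behavior of the WildBench annotator.
    """
    # Keep only judges that produced a usable score.
    valid = [s for s in scores.values() if s is not None]
    if not valid:
        # No judge produced a usable score, so there is nothing to average.
        return None
    return mean(valid)


print(average_annotator_scores({"gpt": 8.0, "llama": 7.0, "claude": 9.0}))
```

With three valid judgments of 8.0, 7.0, and 9.0 this prints `8.0`. Dropping unparseable judgments rather than counting them as zero keeps a single failed judge from dragging the average down, but other policies are equally plausible.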

liamjxu · 2025-01-22T06:43:33Z

src/helm/benchmark/annotation/wildbench_annotator.py


 from helm.benchmark.adaptation.request_state import RequestState
 from helm.benchmark.annotation.annotator import Annotator
+from helm.benchmark.annotation.model_as_judge import _AnnotatorModelInfo


@yifanmai Should we rename the _AnnotatorModelInfo class and remove the leading underscore?

yifanmai · 2025-01-23T00:18:14Z

Yes, let's do that.

liamjxu added 3 commits January 21, 2025 22:10

f17cbb8 update wildbench score calculation to use multiple annotators
6113ebc formatting
679778c minor fix

liamjxu requested a review from yifanmai January 22, 2025 06:39

liamjxu commented Jan 22, 2025

yifanmai approved these changes Jan 23, 2025

yifanmai merged commit 80432dc into main Jan 23, 2025
8 checks passed

yifanmai deleted the jialiang/multiple_annotator branch January 23, 2025 00:18
