Pull requests: stanford-crfm/helm
Pull requests list
Fix LocalWindowService to respect combined sequence budget
#4231 opened Apr 24, 2026 by Erotemic (Contributor)
Security: Client-side decryption model exposes encryption keys and bypasses intended access restriction
#4225 opened Apr 23, 2026 by tomaioo
Fix GenderPerturbation missing words at text boundaries and consecutive matches
#4222 opened Apr 21, 2026 by Chessing234 (Contributor)
Fix missing f-string on CodeInsights past-mistake example headers
#4219 opened Apr 20, 2026 by Chessing234 (Contributor)
fix(presentation): dedupe duplicate main metric columns
#4217 opened Apr 20, 2026 by MukundaKatta
feat(frontend): add predictions page search filter
#4216 opened Apr 20, 2026 by MukundaKatta
Fix AnnotatorResponseParseFailure forwarding kwargs dict to Exception instead of message
#4215 opened Apr 20, 2026 by Chessing234 (Contributor)
feat(frontend): add metric filters to predictions page
#4213 opened Apr 20, 2026 by MukundaKatta
fix(frontend): keep leaderboard run links scoped to scenario
#4211 opened Apr 19, 2026 by MukundaKatta
Fix DisinformationScenario reiteration dropping every narrative when topic is None
#4209 opened Apr 19, 2026 by Chessing234 (Contributor)
Fix AutoBencherCapabilitiesScenario excluding the requested subject instead of including it
#4208 opened Apr 19, 2026 by Chessing234 (Contributor)
Fix CodeInsightsCorrectCodeScenario prompt conditional precedence
#4207 opened Apr 17, 2026 by Chessing234 (Contributor)
Fix AutobencherSafetyScenario parsing the file path string as JSON
#4206 opened Apr 17, 2026 by Chessing234 (Contributor)
Fix operator precedence skipping robustness metric group check
#4193 opened Apr 13, 2026 by Chessing234 (Contributor)
Fix unescaped '.' in final_number_exact_match regex
#4191 opened Apr 11, 2026 by Chessing234 (Contributor)
Fix duplicate entries in med_dialog dataset (#3746)
#4178 opened Apr 7, 2026 by Chessing234 (Contributor)
3 tasks
Benchmark Data Contamination Scenario
#4149 opened Apr 1, 2026 by IriedsonSouto (Contributor)
5 tasks done
New inference-time approach for Private MedHelm Tasks
#3913 opened Oct 22, 2025 by sronaghi
LMKT: Add cultural knowledge remembering and cultural safety application
#3736 opened Jul 12, 2025 by martinakaduc (Contributor)
Integrate Psychometric-Based Question Validity Tools into HELM (Issue #3645)
#3669 opened Jun 14, 2025 by yuhengtu (Contributor)