Integrate Psychometric-Based Question Validity Tools into HELM (Issue #3645)#3669
yuhengtu wants to merge 4 commits into stanford-crfm:main from
Conversation
help="EXPERIMENTAL: Full class name of the Summarizer class to use. If unset, uses the default Summarizer.",
)
parser.add_argument(
    "--validity-check",
I would prefer this to be --psychometric-validity-check because "validity" is a vague concept (it could be data completeness validation, or data schema validation, or other kinds of validation).
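A minimal sketch of the suggested rename, assuming a plain argparse flag (the help text here is illustrative, not the PR's final wording): naming the flag for the specific kind of validation avoids confusion with schema or completeness checks.

```python
import argparse

# Hypothetical sketch of the renamed flag; not the PR's actual code.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--psychometric-validity-check",
    action="store_true",
    default=False,
    help="If set, write pre-calculated psychometric validity metrics "
    "into the display JSON.",
)

# argparse converts the dashes in the flag name to underscores:
args = parser.parse_args(["--psychometric-validity-check"])
print(args.psychometric_validity_check)
```

Because `action="store_true"` defaults to `False`, existing invocations of `helm-summarize` that omit the flag would behave unchanged.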
def write_run_display_json(self, skip_completed: bool) -> None:
def process(run: Run) -> None:
write_run_display_json(run.run_path, run.run_spec, self.schema, skip_completed)
write_run_display_json(run.run_path, run.run_spec, self.schema, self.validity_check, skip_completed)
self.validity_check should be the last argument.
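A sketch of the ordering point, with a stand-in function whose parameter order mirrors the diff (the body and argument types are hypothetical): appending the new flag after the existing parameters, with a default, means it is passed last and older positional calls keep their meaning.

```python
# Stand-in for the function in the diff; only the parameter order matters here.
def write_run_display_json(run_path: str, run_spec: str, schema: str,
                           skip_completed: bool,
                           validity_check: bool = False) -> tuple:
    return (run_path, run_spec, schema, skip_completed, validity_check)

# Call with validity_check last, as the review suggests:
result = write_run_display_json("runs/v1", "my-spec", "my-schema", True, True)
print(result)

# A pre-existing call site that omits the new argument is unaffected:
legacy = write_run_display_json("runs/v0", "old-spec", "old-schema", False)
print(legacy)
```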
verbose: bool,
num_threads: int,
allow_unknown_models: bool,
validity_check: bool,
Change this to psychometrics_validity_check or something that identifies the paper.
Also, set the default value to False to fix these errors:
src/helm/benchmark/presentation/torr_robustness_summarizer.py:36: error: Missing positional argument "validity_check" in call to "__init__" of "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:13: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:31: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
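A sketch of why a default of `False` fixes the `call-arg` errors above, using a stand-in `Summarizer` with hypothetical parameters (not HELM's actual constructor): the pre-existing call sites in `torr_robustness_summarizer.py` and `test_summarize.py` do not pass the new argument, so it must be optional.

```python
# Hypothetical stand-in for the Summarizer constructor in the diff.
class Summarizer:
    def __init__(self, suite: str, psychometric_validity_check: bool = False) -> None:
        self.suite = suite
        self.psychometric_validity_check = psychometric_validity_check

# An old call site that omits the new argument still type-checks and runs,
# because the parameter has a default:
summarizer = Summarizer("my-suite")
print(summarizer.psychometric_validity_check)
```

Without the default, mypy correctly flags every unchanged call site as missing a positional argument.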
@htrack(None)
def write_run_display_json(run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool) -> None:
def write_run_display_json(
    run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool, validity_check: bool = False
Change validity_check to psychometrics_validity_check or something that identifies the paper.
This fixes #3645.

This pull request is still causing the type checker to fail. If you'd like to merge, please resolve the type checking issues and update this pull request.

Hi, it's been a month since the last update; are you still working on this?
We add a new boolean argument --validity-check to helm-summarize. When it is enabled, we load the four pre-calculated validity metric values from Hugging Face and write them into display_prediction.json, which lets the HELM website display those validity metric values. The script that calculates the four validity metrics is in scripts/validity_check.py.
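The merge step described above can be sketched with stand-in data (the metric names, record layout, and the in-memory loading here are all hypothetical; in the PR the values come from Hugging Face and the records follow HELM's display schema):

```python
import json

# Stand-in values for the four pre-calculated validity metrics; the real
# names and values are defined by the PR, not here.
validity_metrics = {
    "difficulty": 0.42,
    "discrimination": 0.81,
    "guessing": 0.10,
    "loading": 0.65,
}

# Stand-in records in the spirit of display_prediction.json.
display_predictions = [
    {"instance_id": "id0", "predicted_text": "A"},
    {"instance_id": "id1", "predicted_text": "B"},
]

# Attach the metric values to each record so the website can render them:
for prediction in display_predictions:
    prediction["validity_metrics"] = dict(validity_metrics)

print(json.dumps(display_predictions[0]["validity_metrics"], sort_keys=True))
```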