Changed MMLU Pro for Non-COT Version #3108
Looks good besides a few minor things. Thanks!
}
for hf_split, split in splits.items():
    data = dataset[hf_split].filter(lambda x: x["category"] == self.subject)
    print(f"Filtered instances in {hf_split}: {len(data)}")
Remove the print.
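For illustration, the filtering step can report its count through a logger rather than a bare print. This is only a minimal sketch: it uses plain dictionaries and Python's standard `logging` module as stand-ins, and `filter_by_category` and the sample rows are hypothetical names, not code from this pull request.

```python
import logging

# Hypothetical logger name; the real scenario would use the project's own logging helper.
logger = logging.getLogger("mmlu_pro_sketch")


def filter_by_category(rows, subject):
    """Return the rows whose "category" field equals the requested subject."""
    selected = [row for row in rows if row["category"] == subject]
    # Log at debug level instead of printing, so normal runs stay quiet.
    logger.debug("Filtered instances for %s: %d", subject, len(selected))
    return selected


# Made-up sample rows, only to exercise the function.
rows = [
    {"category": "math", "question": "q1"},
    {"category": "health", "question": "q2"},
    {"category": "math", "question": "q3"},
]
math_rows = filter_by_category(rows, "math")
```

The count is still available when debug logging is enabled, but it no longer appears in ordinary output.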
# Test for the "abstract_algebra" subject
scenario = MMLUProScenario(subject="math")
instances = scenario.get_instances(tmpdir)
# assert len(instances) == 116
Uncomment or delete this line.
assert instances[1].input == Input(text="Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.")
assert instances[1].references == [
Revert changes in this file.
# Test for the "anatomy" subject
scenario = MMLUProScenario(subject="health")
instances = scenario.get_instances(tmpdir)
# assert len(instances) == 154
Uncomment or delete this line.
hlog(f"Processing data for {split} split")
for row in data:
    question = row["question"]
    answers = row["options"][:10]  # Limit to 10 answers if necessary
Do we actually need [:10]?
No, I will remove this.
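As a rough sketch of why the slice can go: when a row already has ten or fewer options, `[:10]` returns the same list, so the references can be built from the options directly. The `SketchReference` class, the `build_references` helper, and the `answer_index` field below are assumptions for illustration, not the scenario's actual types or a confirmed dataset schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchReference:
    """Hypothetical stand-in for a scenario reference: option text plus a correctness flag."""

    text: str
    is_correct: bool


def build_references(options: List[str], answer_index: int) -> List[SketchReference]:
    """Turn every option into a reference, marking the one at answer_index as correct."""
    return [
        SketchReference(text=option, is_correct=(i == answer_index))
        for i, option in enumerate(options)
    ]


# Made-up row; "answer_index" is an assumed field name.
row = {"options": ["0", "1", "2", "3"], "answer_index": 1}
references = build_references(row["options"], row["answer_index"])

# Slicing a list that has at most ten items leaves it unchanged.
unchanged = row["options"][:10] == row["options"]
```

If some rows did carry more than ten options, silently truncating them could drop the correct answer, which is another reason to avoid the slice.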
Merging this pull request broke main: https://github.com/stanford-crfm/helm/actions/runs/11620257291. Could you open a new pull request to fix the tests and to address the open comments in this pull request?
Added mmlu_pro.py (the scenario), lite_run_spec.py, and test_mmlu_pro_scenario.py.
This is the implementation of MMLU Pro without chain of thought.