-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unitxt evaluator #156
base: main
Are you sure you want to change the base?
Unitxt evaluator #156
Conversation
This pull request has merge conflicts that must be resolved before it can be |
@Mergifyio refresh |
✅ Pull request refreshed |
b8229e2
to
abf67d5
Compare
a6d43e7
to
c22397b
Compare
64aff53
to
9ca436c
Compare
1154444
to
75e934b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
tests/test_unitxt.py
Outdated
print("===> Executing 'test_unitxt'...") | ||
try: | ||
model_path = "instructlab/granite-7b-lab" | ||
unitxt_recipe = "card=cards.wnli,template=templates.classification.multi_class.relation.default,max_train_instances=5,loader_limit=20,num_demos=3,demos_pool_size=10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like too much detail to ask for from the cli in 1 long string. Are all the values things we want users to specify or could some of them be implementation details? For the ones that are, I think we need to break them down into individual params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users should have the flexibility to specify all details. I can suggest the following, tell me which you prefer:
1.a - have the recipe written down in a file and then in the cli just provide a path
1.b - have the recipe written down in a file, but in a json format (e.g. {card: ..., template: ...}, which is more friendly
2 - add some prefixed parameters such as card and template, but also a freetext parameter, as there are many customization a unitxt user may want to make.
3 - keep it as is :)
unitxt_recipe: str, | ||
): | ||
unitxt_task = self.assign_task_name() | ||
tasks_dir = self.assign_tasks_dir(unitxt_task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this using a local directory? If so, it needs to be built off a param like output_dir with mt_bench.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a temporary directory, deleted at the end of the evaluation process. Would you prefer the user specified an output dir? It does not contain anything of use for the user, just the files required for lm eval to run unitxt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to use the output_dir so it doesn't confuse the user in a local directory. Also, since you do want to remove at the end, the create/remove logic should probably be in a try/finally block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so a user would specify an output dir but will find it doesn't exist at the end of the run?
If the user specifies it, I guess I will not delete it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so a user would specify an output dir but will find it doesn't exist at the end of the run?
The current output dir is a working dir for mt_bench.
If the user specifies it, I guess I will not delete it, right?
I was expecting you would create your directory inside the output_dir and then delete it when you are done. Or you could delete the directory before you start if there is some value in leaving it around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to put all this into a memory filesystem? In general, it is best to avoid unnecessary disk writes, especially for something that's likely to run in a cloud service where it may or may not have write permissions on some sort of disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwm4 memory filesystem does not seem to work well, as directory is later accessed also by lm-eval inside the mmlu class and I don't want to start passing this filesystem around (unless owners support such an overall change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to use the output_dir so it doesn't confuse the user in a local directory.
@danmcp So I'm not entirely sure what you mean here. I see mt_bench has output_dir: str = "eval_output",
, but this is created only if one calls for mt_bench and does not enter a different output dir. Doing the following, although not sure it makes a lot of sense:
def assign_tasks_dir(self, task_name):
return os.path.join( "eval_output" ,f"{TEMP_DIR_PREFIX}_{task_name}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if my request wasn't clear, my suggestion was like mt_bench:
- Accept the root dir to use for output as a var
- Default it to the same root dir as mt_bench
@Roni-Friedman Could you explain in the description what benefit the Unitxt evaluator would have? Why would a user run the unitxt evaluator over just using the MMLUBranchEvaluator? |
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
905c81e
to
bbe7108
Compare
Signed-off-by: Roni Friedman-Melamed <Roni.friedman-melamed@il.ibm.com>
bbe7108
to
7c9e44c
Compare
Let's discuss this in our meeting as well. I initially wrote a generic evaluator that can use all of unitxt features, but now my understanding is that there are two clear use cases and PR should be adjusted accordingly: |
Adding unitxt evaluator.
To be complemented by adding unitxt as a benchmark in instructlab repo