trec_eval options #57
Comments
Three thoughts:
You're right, trec_eval only applies -M when making res_rels. And yes, if people always give a cutoff for recall then the problem there vanishes. But they won't and I don't think trec_eval assumes a cutoff aside from the standard measures. But what I'm trying to get rid of is specifying the cutoff separately for every measure.
btw the docs don't show how to send multiple options to a measure, and the above doesn't work. About to code dive.
It's clear that measures do support multiple options: I think they are indeed just kwargs to functions, so it should work.
Agreed, I support an -M option, but this is mainly @seanmacavaney's shindig.
We'd need to sort the lists before applying the cutoff. There might be a neat impl using https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.consecutive_groups or
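To make the "sort, then cut" point concrete, here is a minimal stdlib-only sketch. The function name `truncate_run`, the `(query_id, doc_id, score)` row shape, and the tie-break on `doc_id` are assumptions for illustration, not anything taken from the project's code.

```python
from collections import defaultdict


def truncate_run(rows, max_docs):
    """Keep at most `max_docs` top-scoring docs per query.

    `rows` is an iterable of (query_id, doc_id, score) in any order.
    This helper and its row shape are hypothetical.
    """
    by_query = defaultdict(list)
    for query_id, doc_id, score in rows:
        by_query[query_id].append((score, doc_id))

    truncated = {}
    for query_id, scored in by_query.items():
        # Sort by score descending BEFORE cutting; ties fall back to
        # doc_id descending (an assumed tie-break, chosen only so the
        # result is deterministic).
        top = sorted(scored, reverse=True)[:max_docs]
        truncated[query_id] = [(doc_id, score) for score, doc_id in top]
    return truncated
```

Cutting without the sort would keep whichever documents happen to appear first in the file, which is only correct when the run is already ordered by score.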
Yup, I was wrong about multiple measure args, sorry.
Isn't it like quitting time in Scotland or something?
I feel your pain here @isoboroff! It's indeed tedious to specify the same cutoff setting across all measures. One of the project's goals is to ensure that all settings that affect a measure's calculation are defined by the measure specification and the underlying provider. (Barring things like floating-point behavior that can differ across CPU architectures, etc.) This goal ensures that giving only the measure specification and the provider will give the same results for the same input. It also gets away from needing to provide instructions like "run the evaluation tool twice, once with these settings, once with these other ones" in some cases. OTOH, I see how it's pretty annoying to repeat this same default cutoff across a variety of measures. I'd like to think this through a bit more, but I'm potentially open to an
Would this be helpful?
We've made
Already handled with
Consider adding some options from trec_eval. The one that drove me to post this issue was -M, so we could specify the maximum depth of a run. (Without this, runs can return >1000 docs and get better recall.)
A case might be made for -c and -J, but I think those are better implemented in specific measures.
No patch yet. The plan is to have read_trec_run take the full args object, and then we can count docs in the generator and know when to stop. With this implementation we are agnostic to providers.
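A rough sketch of what counting docs in the generator could look like, with the cap applied at read time so it is independent of any provider. The name `read_trec_run_capped`, its `max_docs` parameter, and the six-column line layout (`query_id Q0 doc_id rank score tag`) are assumptions for illustration; the real `read_trec_run` signature may differ. It also assumes each query's lines already appear in descending score order, since nothing is re-sorted here.

```python
from collections import Counter


def read_trec_run_capped(lines, max_docs=None):
    """Yield (query_id, doc_id, score), at most `max_docs` per query.

    Hypothetical helper: assumes six whitespace-separated columns per
    line and that each query's lines are already ranked best-first.
    """
    seen = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, _rank, score, _tag = parts
        if max_docs is not None and seen[query_id] >= max_docs:
            continue  # this query has reached its cap; drop the rest
        seen[query_id] += 1
        yield query_id, doc_id, float(score)
```

For example, `list(read_trec_run_capped(open(path), max_docs=1000))` would cap each query at 1000 docs before any measure sees the run, where `path` is a placeholder for a run file.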