trec_eval options #57

Open
isoboroff opened this issue Sep 26, 2024 · 7 comments

Comments

@isoboroff

Consider adding some options from trec_eval. The one that drove me to post this issue was -M, so we could specify the maximum depth of a run. (Without this, runs can return >1000 docs and get better recall.)

A case might be made for -c and -J, but I think those are better implemented in specific measures.

No patch yet. The plan is to have read_trec_run take the full args object, and then we can count docs in the generator and know when to stop. With this implementation we are agnostic to providers.
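A minimal sketch of the "count docs in the generator" idea, assuming the rows arrive already in rank order within each query (nothing here sorts). `ScoredDoc` is a local stand-in tuple and `limit_depth` is a hypothetical helper, not part of ir_measures:

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional


class ScoredDoc(NamedTuple):
    # Local stand-in for a run row; defined here so the sketch is self-contained.
    query_id: str
    doc_id: str
    score: float


def limit_depth(rows: Iterable[ScoredDoc], depth: Optional[int] = None) -> Iterator[ScoredDoc]:
    """Yield at most `depth` rows per query, in the order they arrive.

    Hypothetical helper: it only counts rows, so it is correct only when
    the input is already ranked within each query.
    """
    if depth is None:
        yield from rows
        return
    seen = Counter()  # rows seen so far, per query
    for row in rows:
        seen[row.query_id] += 1
        if seen[row.query_id] <= depth:
            yield row
```

Because it wraps any iterable of rows, a wrapper like this would not care which provider consumes the result.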

@cmacdonald
Collaborator

Three thoughts:

  1. An optional kwarg, depth: Optional[int] = None, on read_trec_run() rather than passing an args object around. It's cleaner to have a public API free of args objects that no client code can possibly understand.

  2. Classical trec_eval performed cutoff after sorting, right?

  3. But should Recall always be defined by an @ cutoff?

@isoboroff
Author

You're right, trec_eval only applies -M when making res_rels. And yes, if people always give a cutoff for recall then the problem there vanishes. But they won't and I don't think trec_eval assumes a cutoff aside from the standard measures.

But what I'm trying to get rid of is specifying the cutoff separately for every measure.
ir_measures -q \
  ${QRELS} ${RUNS}/$runtag/$runtag \
  'nDCG(cutoff=1000)@20' \
  'P(cutoff=1000)@5' \
  'AP(cutoff=1000)' \
  'P(rel=3,cutoff=1000)@5' \
  'AP(rel=3,cutoff=1000)'

@isoboroff
Author

btw the docs don't show how to send multiple options to a measure, and the above doesn't work. About to code dive.

@cmacdonald
Collaborator

> btw the docs don't show how to send multiple options to a measure, and the above doesn't work. About to code dive.

It's clear measures do support multiple options:
https://github.com/terrierteam/ir_measures/blob/main/ir_measures/measures/accuracy.py#L14-L16

I think they are indeed just kwargs to functions, so it should work.

> But what I'm trying to get rid of is specifying the cutoff separately for every measure.

Agreed, I support an -M option, but this is mainly @seanmacavaney's shindig.

> then we can count docs in the generator

We'd need to sort the lists before applying the cutoff.

There might be a neat impl using https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.consecutive_groups or
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.map_reduce
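A rough sketch of sort-then-cut, under stated assumptions: rows are plain (query_id, doc_id, score) tuples, `top_k_per_query` is a hypothetical name, and ties are broken by doc id in descending order purely to keep the result deterministic. The grouping step is the part `more_itertools.map_reduce` could do; a `defaultdict` is used here to stay within the standard library:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def top_k_per_query(
    rows: Iterable[Tuple[str, str, float]],
    depth: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Group rows by query, sort each group by score, then apply the cutoff.

    Hypothetical helper, not ir_measures code. Input order does not matter.
    """
    by_query = defaultdict(list)
    for query_id, doc_id, score in rows:
        by_query[query_id].append((score, doc_id))

    ranked = {}
    for query_id, docs in by_query.items():
        # Highest score first; equal scores fall back to doc_id descending
        # (an assumption of this sketch).
        ordered = sorted(docs, reverse=True)
        # Slicing with depth=None keeps the whole list.
        ranked[query_id] = [doc_id for _score, doc_id in ordered[:depth]]
    return ranked
```

Unlike counting rows as they stream past, this keeps the highest-scoring documents even when the run file is not in rank order, at the cost of holding each query's rows in memory.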

@isoboroff
Author

Yup, I was wrong about multiple measure args, sorry.

@isoboroff
Author

Isn't it like quitting time in Scotland or something?

@seanmacavaney
Collaborator

> But what I'm trying to get rid of is specifying the cutoff separately for every measure.

I feel your pain here @isoboroff! It's indeed tedious to specify the same cutoff setting across all measures.

One of the project's goals is to ensure that all settings that affect a measure's calculation are defined by the measure specification and the underlying provider (barring things like floating-point behavior that can differ across CPU architectures, etc.). This goal ensures that the measure specification and the provider alone will give the same results for the same input. It also gets away from needing to provide instructions like: "run the evaluation tool twice, once with these settings, once with these other ones" in some cases.

OTOH, I see how it's pretty annoying to repeat this same default cutoff across a variety of measures. I'd like to think this through a bit more, but I'm potentially open to an -M-like option that sets a default cutoff setting for all measures where it's supported and not already provided. Note that this won't function exactly the same as trec_eval's -M, since it won't cut off at the data input level, won't affect measures without cutoff settings (like the set measures), etc. When presenting the measures, it would unambiguously show the full specification (satisfying the project goal), while reducing the burden when calling the command. E.g.:

$ ir_measures -q ${QRELS} ${RUNS}/$runtag/$runtag -M 1000 'nDCG' 'nDCG@20' 'P@5' 'AP' 'P(rel=3)' 'AP(rel=3)'
nDCG@1000 #.####
nDCG@20 #.####
P@5 #.####
AP@1000 #.####
P(rel=3)@1000 #.####
AP(rel=3)@1000 #.####

Would this be helpful?
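One way to picture the "fill in the cutoff where supported and not already provided" rule is as a rewrite over measure strings. This is only a sketch: `with_default_cutoff` and the `CUTOFF_AWARE` set are hypothetical, and a real implementation would presumably work on parsed measure objects rather than on text:

```python
import re
from typing import Iterable, List

# Hypothetical list of measure names that accept a cutoff; the real set
# would come from the measure definitions themselves.
CUTOFF_AWARE = {"nDCG", "P", "AP", "R", "RR"}


def with_default_cutoff(specs: Iterable[str], default_cutoff: int) -> List[str]:
    """Append @default_cutoff to specs that support a cutoff and lack one."""
    expanded = []
    for spec in specs:
        name = re.match(r"[A-Za-z_]\w*", spec)  # leading measure name
        has_cutoff = "@" in spec or "cutoff=" in spec
        if name and name.group(0) in CUTOFF_AWARE and not has_cutoff:
            spec = f"{spec}@{default_cutoff}"
        expanded.append(spec)
    return expanded


print(with_default_cutoff(["nDCG", "nDCG@20", "P@5", "AP", "P(rel=3)", "AP(rel=3)"], 1000))
```

Measures outside the set (for example a set measure such as SetP) pass through unchanged, and the expanded strings are what would be reported, so the output still carries the full specification.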

> A case might be made for -c

We've made -c the default behavior across implementations, our reasoning here: https://ir-measur.es/en/latest/getting-started.html#empty-set-behaviour
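For readers unfamiliar with -c: it averages over every query in the qrels, so a query the run returned nothing for contributes zero instead of being skipped. A small illustration with a hypothetical helper name, not the library's aggregation code:

```python
from typing import Dict, Iterable


def mean_over_qrels_queries(per_query: Dict[str, float], qrels_query_ids: Iterable[str]) -> float:
    """Average a per-query score over all judged queries.

    Queries missing from `per_query` (no results returned) count as 0.0.
    """
    query_ids = list(qrels_query_ids)
    if not query_ids:
        return 0.0
    return sum(per_query.get(query_id, 0.0) for query_id in query_ids) / len(query_ids)
```

With scores for only two of four judged queries, this mean is half of what averaging over the returned queries alone would give.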

> and -J

Already handled with judged_only=True measure setting :) #44
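As a rough picture of what a judged-only setting means, here is precision at k with unjudged documents dropped from the ranking before the cutoff is applied. The function and its signature are illustrative assumptions, not the library's implementation:

```python
from typing import Dict, List


def precision_at_k(
    ranking: List[str],
    judgments: Dict[str, int],
    k: int,
    rel: int = 1,
    judged_only: bool = False,
) -> float:
    """P@k for one query; `judgments` maps doc_id to a relevance grade."""
    if judged_only:
        # Remove documents with no judgment before taking the top k.
        ranking = [doc_id for doc_id in ranking if doc_id in judgments]
    top = ranking[:k]
    hits = sum(1 for doc_id in top if judgments.get(doc_id, 0) >= rel)
    return hits / k
```

An unjudged document in the top k lowers the default score but is skipped over when `judged_only=True`.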
