However, this is working as intended (i.e., it is not surprising to find the same crop with reversed labels). It is possible that two raters rated the same crop with differing opinions, and since this task cannot represent "equal", the closest we can get is one annotation with a 1 and one with a 0.
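As a minimal sketch (in Python, with made-up column names — the real annotation files may use different headers), this is how such a disagreement shows up as two binary rows, and how averaging the labels per pair would recover an effective tie of 0.5:

```python
import pandas as pd

# Hypothetical annotation rows: the same (A, B) crop pair was rated twice,
# once preferring A and once preferring B, because "equal" is not expressible.
rows = pd.DataFrame(
    [
        {"image_a": "crop_001.png", "image_b": "crop_001.png", "a_preferred": 1},
        {"image_a": "crop_001.png", "image_b": "crop_001.png", "a_preferred": 0},
    ]
)

# Averaging per pair turns the two conflicting labels into a 0.5 preference
# rate, which behaves like a tie in any downstream aggregation.
per_pair = rows.groupby(["image_a", "image_b"])["a_preferred"].mean()
print(per_pair)  # -> 0.5 for this pair
```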
In the "real" task (i.e., not in the validation) we do take this into account in the rank-correlation computation, as opposed to in the accuracy task. We'll report all of these metrics on the leaderboard shortly (they're not reported).
For reference what I mean by "rank" is that we use each of these answers to compute a ranking of N compression methods which participate in the challenge. We compare the ranking obtained by running the classifier vs. the humans' preferences, which is what you're looking at.
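As a rough sketch of what such a comparison could look like (this is not the challenge's actual scoring code; the column names, the win-rate ranking, and the choice of Spearman correlation are all assumptions for illustration):

```python
import pandas as pd
from scipy.stats import spearmanr

def ranking_from_preferences(df: pd.DataFrame, pref_col: str) -> pd.Series:
    """Rank compression methods by how often their crop wins a pairwise
    comparison (pref_col is 1 when method_a's crop is preferred)."""
    wins = pd.concat(
        [
            df[["method_a"]].rename(columns={"method_a": "method"}).assign(win=df[pref_col]),
            df[["method_b"]].rename(columns={"method_b": "method"}).assign(win=1 - df[pref_col]),
        ]
    )
    win_rate = wins.groupby("method")["win"].mean()
    return win_rate.rank(ascending=False)

# Hypothetical comparison table: one row per rated crop pair, with the
# compression method behind each crop, the human label, and the classifier's
# prediction for the same pair.
df = pd.DataFrame(
    {
        "method_a": ["jpeg", "jpeg", "webp"],
        "method_b": ["webp", "avif", "avif"],
        "human_pref": [0, 0, 0],
        "classifier_pref": [1, 0, 0],
    }
)

human_rank = ranking_from_preferences(df, "human_pref")
model_rank = ranking_from_preferences(df, "classifier_pref")

# Align the two rankings per method and compare them with a rank correlation
# (Spearman here; the challenge may use a different statistic).
aligned = pd.concat([human_rank, model_rank], axis=1, keys=["human", "model"])
rho, _ = spearmanr(aligned["human"], aligned["model"])
print(aligned)
print(f"Spearman rho = {rho:.3f}")
```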
For a future revision of the benchmark we are considering more aggressive data filtering. However, given that this file was released as is and has already been used by multiple participants for testing purposes (not for the competition), we don't plan to change it at this time.
In oracle.csv there are these two lines:
Note that A and B are exactly the same file, yet in one line A is marked as preferred and in the other B is.
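For anyone who wants to locate such cases themselves, here is a small sketch (column names are assumptions — adjust them to the actual header of oracle.csv):

```python
import pandas as pd

df = pd.read_csv("oracle.csv")

# Keep (A, B) pairs that received both labels, i.e. one line prefers A and
# another line prefers B for the very same pair of files.
conflicts = (
    df.groupby(["image_a", "image_b"])["a_preferred"]
    .nunique()
    .loc[lambda n: n > 1]
)
print(conflicts)
```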