
Semi-lazy judging for jury submissions / submissions with expected result #2266

Open
@meisterT

Description


(Suggested by @RagnarGrootKoerkamp)

When setting up a contest, jury submissions in the problem ZIP files are imported with an expected result (either coming from the annotation or the folder convention). Then jury submissions are being judged and one can run a judging verifier to see whether the expected results match the actual results.

Typically the jury then goes to the statistics page to check how far the accepted submissions are below the time limit and how far the TLE submissions are above it.

By default, we have lazy judging enabled and equal judging priority for all incorrect verdicts, so we abort judging as soon as we hit the time limit. This can be misleading. Imagine the following situation: the time limit is 5s, and on the first secret test case the TLE submission times out with 5.001s, so barely above the time limit. The remaining test cases are not judged, so on the statistics overview page you would believe that this submission is close to being accepted, although there might be a later test case where the submission is clearly above the time limit, e.g. takes 20s.

Now, one could argue that the jury should just disable lazy judging, or that we do so automatically for them if there is an expected result. However, this approach has the drawback that if a solution is clearly above the time limit (say on the 2nd secret test case) and there are many test cases (which happens quite often nowadays), it would take "forever".

So you would likely want something semi-lazy, in between the current laziness and judging everything (at least for submissions with an expected result). The idea would be to keep on judging even if we hit the "soft time limit" for one test case, until we hit the "hard time limit" (including overshoot) at least once. This would give an acceptable compromise between getting more info and the amount of time it takes to judge all jury submissions.
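As a rough illustration, the continuation rule could look like the sketch below. The function name and parameters (`soft_limit`, `overshoot`) are invented for this example and do not correspond to actual DOMjudge identifiers; this is just the decision logic, not an implementation proposal.

```python
def should_continue_judging(run_times, soft_limit, overshoot):
    """Semi-lazy rule (hypothetical): a run above the soft time limit
    alone does not abort judging; we stop only once some run exceeds
    the hard limit (soft limit plus overshoot), i.e. the submission
    is clearly too slow."""
    hard_limit = soft_limit + overshoot
    return all(t < hard_limit for t in run_times)

# With a 5 s limit and 2 s overshoot: a 5.001 s run keeps judging
# going, but a 20 s run aborts it.
print(should_continue_judging([4.9, 5.001], 5.0, 2.0))  # True
print(should_continue_judging([5.001, 20.0], 5.0, 2.0))  # False
```

In the scenario above, the 5.001s run would no longer stop judging, so the statistics page would also show the 20s test case.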

This is likely non-trivial to implement. One might first consider introducing a new verdict for the "gray zone" above the time limit (and perhaps even another new verdict for the gray zone below the time limit) with a lower priority than the rest of the incorrect verdicts. Then one could remap from these new verdicts to accepted/TLE as appropriate. However, since we (at least currently) remap each judging_run result as it comes in (and we consider this important if verdicts have a non-uniform priority), we currently lose this information.
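To make the gray-zone idea concrete, here is a minimal sketch of what such a classification and final remap could look like. The verdict names (`timelimit-gray`) and the two-step structure are assumptions for illustration; as noted above, the current per-run remapping would have to change for this to work.

```python
def classify_run(runtime, soft_limit, overshoot):
    """Classify a single judging run, keeping a separate 'gray zone'
    verdict for runs between the soft and hard time limits
    (hypothetical verdict names, not current DOMjudge ones)."""
    hard_limit = soft_limit + overshoot
    if runtime >= hard_limit:
        return "timelimit"       # clearly over the limit: ordinary TLE
    if runtime > soft_limit:
        return "timelimit-gray"  # gray zone just above the soft limit
    return "correct"             # within the limit (runtime-wise)

def remap_final(verdict):
    """Only after judging finishes, collapse gray-zone runs into TLE,
    instead of remapping each run as its result comes in."""
    return "timelimit" if verdict == "timelimit-gray" else verdict

print(classify_run(5.001, 5.0, 2.0))  # timelimit-gray
print(classify_run(20.0, 5.0, 2.0))   # timelimit
print(remap_final("timelimit-gray"))  # timelimit
```

The point of deferring the remap is exactly the information-loss issue described above: the gray-zone verdict must survive until all runs are in, otherwise the statistics page cannot distinguish "barely over" from "clearly over".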

Ideas welcome on how to implement this elegantly!
