
Add @EXPECTED_RESULTS@ tag. #4


Closed
wants to merge 2 commits

Conversation

RagnarGrootKoerkamp
Collaborator

As discussed.

Suggestions on wording are welcome. I think the directory name should be one of:

  • mixed_results
  • mixed_verdicts
  • multiple_results (although only one would still be OK)
  • multiple_verdicts

@simonlindholm
Member

Could be worth coming up with a syntax that naturally extends to multiple test groups. We have an unofficial tool, https://github.com/nordicolympiad/testdata_tools/pull/13/files, that parses @EXPECTED_GRADES@ AC AC TLE TLE as "expect groups 1 and 2 to pass, 3 and 4 to TLE"; having similar syntax with different semantics is a bit confusing. (Though it probably won't lead to errors in practice, since they are used in different contexts.)

@ALLOWED_VERDICTS@ is another possible bikeshed color, not sure if a good one.
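For illustration, here is a minimal sketch of how a tool might parse such an @EXPECTED_GRADES@ line into per-group expectations. This is not the actual testdata_tools code, and the exact tag regex is an assumption:

import re

# Hypothetical parser: maps 1-based test group numbers to expected verdicts.
# Assumes the tag appears on one line, followed by space-separated verdicts.
def parse_expected_grades(source):
    match = re.search(r'@EXPECTED_GRADES@\s+([A-Z ]+)', source)
    if match is None:
        return None
    return dict(enumerate(match.group(1).split(), start=1))

# parse_expected_grades('// @EXPECTED_GRADES@ AC AC TLE TLE')
# -> {1: 'AC', 2: 'AC', 3: 'TLE', 4: 'TLE'}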

@eldering
Collaborator

> @ALLOWED_VERDICTS@ is another possible bikeshed color, not sure if a good one.

Unless there's a good reason to change, I'd prefer to stick to this name as it is already in use in DOMjudge and also by some problem setters.

@RagnarGrootKoerkamp
Collaborator Author

We should use the shorthand AC/WA/RTE/TLE.
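For reference, these abbreviations correspond to the submission subdirectories the format already defines; a plain lookup table, shown here only for clarity:

# Short verdict names and the submissions/ subdirectories they abbreviate.
VERDICT_DIRS = {
    'AC': 'accepted',
    'WA': 'wrong_answer',
    'RTE': 'run_time_error',
    'TLE': 'time_limit_exceeded',
}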

@RagnarGrootKoerkamp
Collaborator Author

drop mixed result

This tag implies that the submission may get any of the listed verdicts as the final verdict.

If `@EXPECTED_RESULTS@: ` is found in a submission in any of the other
@eldering
Collaborator

This doesn't match the @EXPECTED_VERDICTS@ above.
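For concreteness, here is a minimal sketch of the semantics the commit message describes, assuming the DOMjudge-style comma-separated tag format; the function names are made up:

import re

# Extract the comma-separated verdict list following "@EXPECTED_RESULTS@:",
# or return None if the submission does not carry the tag.
def expected_results(source):
    match = re.search(r'@EXPECTED_RESULTS@:\s*(.+)', source)
    if match is None:
        return None
    return [verdict.strip() for verdict in match.group(1).split(',')]

# The submission may get any of the listed verdicts as the final verdict.
def final_verdict_allowed(final_verdict, source):
    expected = expected_results(source)
    return expected is not None and final_verdict in expected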

@jsannemo
Contributor

jsannemo commented Sep 7, 2022

@RagnarGrootKoerkamp I think we agreed on calling the mixed-bag-directory rejected, but I would still like to rescue the EXPECTED_RESULTS for those submissions!

Mind bumping this PR (and fixing @eldering's comment), and perhaps @niemela can take a look?

@RagnarGrootKoerkamp
Collaborator Author

So the semantics would be:

  • @EXPECTED_RESULTS@ is only allowed in rejected/. (Or anywhere, for backwards compatibility?)
  • rejected submissions optionally include @EXPECTED_RESULTS@. If the tag is not present, any non-AC verdict is OK.

What about submissions that are WA or AC based on randomness? Should they be in accepted or rejected?
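A small hypothetical sketch of that rule (directory plus optional tag; the names are made up):

# Verdict check for a submission in rejected/, given the parsed
# @EXPECTED_RESULTS@ list (or None if the tag is absent).
def rejected_ok(final_verdict, expected_results):
    if expected_results is None:
        # No tag: any non-AC verdict is OK.
        return final_verdict != 'AC'
    return final_verdict in expected_results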

@niemela
Member

niemela commented Sep 7, 2022

> What about submissions that are WA or AC based on randomness?

When are they useful?

> Should they be in accepted or rejected?

Feels like that should be `is_submission` or `ignore` or something?

@jsannemo
Contributor

jsannemo commented Sep 7, 2022

I would just dump such a submission in submissions/. I don't immediately see that it provides a statement about the problem that a problem verifier needs to check? If it's intended as a non-deterministic AC solution that you want to compute time limits with, fix a seed and make it accepted? :)

@jsannemo
Contributor

jsannemo commented Sep 7, 2022

Replying to your original question: I agree that those are the semantics we want. I'd say to only allow it in rejected/. Submissions allowing multiple results that sit in one of the other subdirectories (except time_limit_exceeded, which also allows WA) already break the spec, so it's not really breaking backwards compatibility to break them, except if they were added as a tautology (and fixing them is easy, since the verifier will say what's wrong).

@RagnarGrootKoerkamp
Collaborator Author

RagnarGrootKoerkamp commented Sep 7, 2022

Regarding having ACCEPTED as part of the EXPECTED_RESULTS: I have 23 submissions in the last 3 years of BAPC that do this, spread over 12 problems.

Reasons for doing this:

  • We have problems with guaranteed random input. Sometimes you have a solution that happens to pass all 100 test cases with, say, 50% probability (and gets WA on the other half). This is unfortunate, but it does not prevent us from wanting to write and test such a solution.
  • We have submissions that RTE/TLE/AC depending on whether PyPy or CPython is used, because of recursion-limit issues.
  • We have unintended 'bruteforce' submissions that are only included to test the correctness of the answers, and that may sometimes happen to pass the tests anyway. These should not be used to set the time limit (e.g. using 2× the slowest submission), so they are moved to another directory.

It would be nice to officially support this. That's as easy as allowing ACCEPTED as a possible verdict in the rejected directory.

@thorehusfeldt
Contributor

thorehusfeldt commented May 6, 2023

I am late to this conversation, and much less experienced than many of you.

(I have, however, given this topic some thought, and used my own script at analyzetestgroups.py quite a bit. It focuses on @EXPECTED_GRADES@ for problems with test groups, and I've found it incredibly useful, in particular when co-developing with less experienced setters. Still, there are many annoying issues with this.)

Here’s an idea for a very different approach. It

  1. avoids cluttering source code with what is ultimately problem development information
  2. allows rich specification of expected behaviour
  3. is easily accessed and parsed by tools
  4. allows changes to test group structure and expected submission behaviour during development (such as test group 2 being removed)
  5. is decoupled from other specifications

The idea is that the `submissions` directory can contain an `expected_verdicts.yaml` that enumerates (some) submissions and specifies behaviour consistent with the layout of `data/*`, much like `generators.yaml`.

I want to be able to specify the allowed verdicts per submission and per named test group. (The allowed verdicts can be given as a list, such as TLE, RTE.) The tree structure of data allows inheriting verdicts downwards, so I can quickly specify "everything should get AC", but I could also specify "should get AC, except for the huge instances in data/huge, where it should TLE or RTE, depending on the recursion limit".

If done right, this allows arbitrarily fine-grained specification (say, specifying that this submission is guaranteed to fail on sample 3, but gets AC for the rest of the sample group).

This would work (and be useful) independently of whether "we use test groups internally to organise test cases logically during development for pass/fail problems" or "we use test groups for graded problems with scoring and expose them to the solver."

Here is an example, using fictional (and not completely thought-through) syntax:

submissions:
  accepted/th.py:
    sample: AC  # redundant, because AC is the default
    secret: AC  # redundant
  partially_accepted/quadratic_time.py:
    secret:
      huge: [TLE, RTE]  # times out on huge test cases in data/secret/huge
      '*': AC           # may be redundant as well
  partially_accepted/simple_graph.py:
    sample:
      '3': WA  # whereas '1' and '2' are implicitly AC
    secret:
      non-simple: WA
      with-loops: WA
      '*': AC  # redundant
  partially_accepted/greedy.py:  # typical submission in a 4-testgroup scoring/grading problem
    secret:
      group1: AC
      group2: AC
      group3: WA
      group4:
        - TLE
        - RTE
For scoring problems with grade `AC <integer>`, one could specify the expected range of scores, also on a per-test-group basis.
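To make the inheritance concrete, here is a rough sketch (hypothetical, matching only the fictional syntax above) of how a tool could resolve the verdicts allowed for one test case by walking down the data tree, with '*' as the fallback at each level and AC as the overall default:

import yaml  # PyYAML, assumed available

def allowed_verdicts(spec, path):
    # spec: one submission's entry from expected_verdicts.yaml
    # path: the test case's location under data/, e.g. ['secret', 'huge', '042']
    node, result = spec, ['AC']  # default: everything should get AC
    for part in path:
        if not isinstance(node, dict):
            break
        child = node.get(part, node.get('*'))
        if child is None:
            break
        node = child
        if not isinstance(node, dict):
            result = node if isinstance(node, list) else [node]
    return result

# spec = yaml.safe_load(open('submissions/expected_verdicts.yaml'))['submissions']
# allowed_verdicts(spec['partially_accepted/quadratic_time.py'], ['secret', 'huge', '042'])
# -> ['TLE', 'RTE']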

@niemela
Member

niemela commented Jul 22, 2023

Will close this. @thorehusfeldt is working on an updated suggestion.
