Skip to content

Conversation

@kaeluka
Copy link

@kaeluka kaeluka commented Aug 14, 2023

This adds additional candidates that are useful for regression testing.

For regression testing, we want to also look at candidates that we used to extract and then modeled using automodel.

To achieve that, we add the candidates to the extraction query, but also add metadata so that we can skip them during non-testing uses.

@kaeluka kaeluka requested a review from a team as a code owner August 14, 2023 10:58
@github-actions github-actions bot added the Java label Aug 14, 2023
@kaeluka kaeluka requested a review from tausbn August 14, 2023 10:59
@kaeluka kaeluka added the no-change-note-required This PR does not need a change note label Aug 14, 2023
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short and sweet! 👍

I have a small suggestion for how to make this more flexible (in case we need it), but otherwise this looks good to me.

@jhelie
Copy link
Contributor

jhelie commented Aug 15, 2023

⚠️ This is based on another PR (#13886) that is currently under review and should be merged first. It is sufficient to review starting with commit 551b34e

drive-by comment: I do that quite often too and I find that selecting the previous PR's branch as the merge target for the second PR is quite handy to keep things clear.

@jhelie
Copy link
Contributor

jhelie commented Aug 15, 2023

Thanks @kaeluka! Just to double check my understanding: this design does mean that during normal operation (i.e. not regression testing) we need to implement a filter in the script that will skip the candidates with an ai-{foo} provenance, correct?

In practice I imagine that the majority of candidates will not have an AI-provenance and so that skipping will be the exception (and so won't impact performance) but ooi have we also considered a 2 queries design whereby we would add a AutomodelApplicationModeExtractCandidatesRegression.ql query?

@kaeluka
Copy link
Author

kaeluka commented Aug 16, 2023

Just to double check my understanding: this design does mean that during normal operation (i.e. not regression testing) we need to implement a filter in the automodel.py script that will skip the candidates with an ai-{foo} provenance, correct?

Yes! 👍

In practice I imagine that the majority of candidates will not have an AI-provenance and so that skipping will be the exception (and so won't impact performance) but ooi have we also considered a 2 queries design whereby we would add a AutomodelApplicationModeExtractCandidatesRegression.ql query?

I have, and have discussed that in our standup on Monday, remember? Two queries will mean increased maintenance load, even though the two queries would be almost exactly the same.

@kaeluka
Copy link
Author

kaeluka commented Aug 16, 2023

drive-by comment: I do that quite often too and I find that selecting the previous PR's branch as the merge target for the second PR is quite handy to keep things clear.

Thanks! I considered that but have never done it before because I wasn't sure how the UI would present that! Will do that next time ;)

Edit: now that the 'base' branch was merged, I could update this PR and it only shows the new commits as intended.

@jhelie
Copy link
Contributor

jhelie commented Aug 16, 2023

Shall we have similar changes for framework mode in the same PR?

@jhelie
Copy link
Contributor

jhelie commented Aug 16, 2023

Two queries will mean increased maintenance load, even though the two queries would be almost exactly the same.

It feels a little weird to always extract all candidates instead of just the ones we need (seems like the wrong away around) but we can easily modify that later if need be.

In the meantime we should probably avoiding merging this until we have the filter implemented in the script.

@jhelie
Copy link
Contributor

jhelie commented Aug 23, 2023

I've just realised that the positive and negative extraction queries do not include the new alreadyAiModeled field. I'll try to add it in since t's more robust to have the same extracted features for examples and candidates (this is what allow us to easily classify candidates, regardless of whether or not they are known models, which can be useful for some regression testing).

edit: come to think of it I'm not entirely of the strength of the use case - we can handle the diff script side easily.

@jhelie
Copy link
Contributor

jhelie commented Aug 23, 2023

In the meantime we should probably avoiding merging this until we have the filter implemented in the script.

This is actually not true - it will only be an issue when we attempt bumping our codeql dependency.

@jhelie
Copy link
Contributor

jhelie commented Aug 29, 2023

Shall we have similar changes for framework mode in the same PR?

@tausbn thanks for the pointers on how to update the tests, I think the commit I just pushed took care of this - if you're happy with it I think this PR is now ready to be merged.

@jhelie jhelie requested a review from tausbn August 29, 2023 08:59
@jhelie jhelie changed the title Java: Automodel Application Mode: Add Candidates for Regression Testing Java: Automodel: Add Candidates for Regression Testing Aug 29, 2023
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, all of your test cases have alreadyAiModeled set to false (as witnessed by the file://:1:1:1:1 bit that precedes them -- i.e. the empty string).

This seems... not right? I would expect at least one of the test cases to actually populate this with a non-empty string (like we do for the application mode tests, where in one case it's "ai-manual").

@jhelie
Copy link
Contributor

jhelie commented Aug 29, 2023

No there is one that has the ai-manual provenance: de76c07 (so that's indeed the same situation as for the application mode)

See the 8th example in AutomodelFrameworkModeExtractCandidates.expected.

@jhelie jhelie requested a review from tausbn August 29, 2023 13:25
@tausbn
Copy link
Contributor

tausbn commented Aug 29, 2023

No there is one that has the ai-manual provenance: de76c07 (so that's indeed the same situation as for the application mode)

See the 8th example in AutomodelFrameworkModeExtractCandidates.expected.

Ah, so there is! Never mind, then. 🙂

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jhelie
Copy link
Contributor

jhelie commented Aug 29, 2023

Great thanks for having a quick look, let's merge it then! 🚢

@jhelie jhelie merged commit 41726f5 into main Aug 29, 2023
@jhelie jhelie deleted the kaeluka/add-provenance-to-metadata branch August 29, 2023 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Java no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants