Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Add support for pretty-printed alert provenance in tests #16210

Merged
merged 8 commits into from
Jun 7, 2024

Conversation

aschackmull
Copy link
Contributor

This adds the list of referenced models in a qltest and renumbers the MaD ids to get stable test output. Where applicable, I've updated one test per language to demonstrate the conversion.

* @kind path-problem
*/

import codeql.ruby.AST

Check warning

Code scanning / CodeQL

Redundant import Warning test

Redundant import, the module is already imported inside
codeql.ruby.security.CommandInjectionQuery
.
Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this!

|
sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance, madId) and
model =
"Source: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature + "; "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use ; as separator (this is what we typically do in other places we print models).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string is only for inclusion in test output, and I found that it was more readable with the space included. Originally, before we had MaD rows in external yml files, I went with the no-space separation for the QL embedded csv rows in the name of compactness, but that's not relevant here.

|
sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance, madId) and
model =
"Sink: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature + "; " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance,
madId) and
model =
"Summary: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

)
}

query predicate edges(PathNode a, PathNode b, string key, string val) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for not printing the model itself instead of its "translated" id? Even though it is less likely, we would still need to update tests in case models suddenly swap order (not sure how stable the QlBuiltsins::ExtensionId is)? Furthermore, it might be easier to "identify" the model used by the edge from MaD format instead of an integer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the translated ids could change in case a new edge is added that uses a model, which changes (pushes) the ranking (compared to the last time the test was run). This could mean that a large "unrelated" part of an expected file needs to be updated (which might cause confusion)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this could be a source of irrelevant test changes, which we want to do our best to avoid. The current PR is an improvement, but Michael's suggestion of printing the whole model, while verbose, sounds like it would totally avoid the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe order swapping to be very likely, although @dbartol knows the full details for a more precise answer to that.
I do realise that there's shifting going on when a test changes the set of models that it's referring to, but I thought it was nicer to have the models on the side in order to have a narrower edges relation (narrow in the sense that the row in the .expected file fits within the width of a reasonably sized editor window). And balancing these two things, I think I'd prefer narrower edges over no shifting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a valid argument. I am just worried that "interesting" updates to the expected file could drown in noise. Also, it introduces added complexity to the expected files that one needs to know of the possibility of shifting.
It is ok with me, if we stick with the current solution, if we are willing to change it in case we frequently run into the problem above.

@owen-mc
Copy link
Contributor

owen-mc commented May 31, 2024

Just to be clear, applying this so that we don't get spurious changes in edge provenance when a new model is added will pretty much mean getting rid of .qlref tests, and to avoid duplicating logic this may involve moving more code out of .ql files. That is a lot of work, but there are other reasons why we want to do it as well (for inline expectations tests).

@aschackmull aschackmull force-pushed the dataflow/provenance-for-tests branch from 7fddee9 to d6fc62a Compare June 3, 2024 08:18
@aschackmull
Copy link
Contributor Author

I've fixed the merge conflicts, so if you want we can merge this.

@aschackmull aschackmull force-pushed the dataflow/provenance-for-tests branch from d6fc62a to eda5073 Compare June 4, 2024 06:18
@aschackmull aschackmull force-pushed the dataflow/provenance-for-tests branch from eda5073 to 7e980d9 Compare June 7, 2024 09:48
@aschackmull
Copy link
Contributor Author

Rebased again.

hvitved
hvitved previously approved these changes Jun 7, 2024
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an acceptable workaround until we have proper post-processing support.

Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go LGTM

@aschackmull aschackmull merged commit 32260e2 into github:main Jun 7, 2024
51 of 53 checks passed
@aschackmull aschackmull deleted the dataflow/provenance-for-tests branch June 7, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants