Dataflow: Add support for pretty-printed alert provenance in tests #16210

aschackmull · 2024-04-15T11:33:33Z

This adds the list of referenced models in a qltest and renumbers the MaD ids to get stable test output. Where applicable, I've updated one test per language to demonstrate the conversion.

ruby/ql/test/query-tests/security/cwe-078/CommandInjection/CommandInjection.ql

+ * @kind path-problem
+ */
+
+import codeql.ruby.AST


michaelnebel

Thank you for doing this!

michaelnebel · 2024-04-15T13:24:47Z

csharp/ql/lib/semmle/code/csharp/dataflow/internal/ExternalFlow.qll

+  |
+    sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance, madId) and
+    model =
+      "Source: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature + "; "


Maybe just use ; as separator (this is what we typically do in other places we print models).

This string is only for inclusion in test output, and I found that it was more readable with the space included. Originally, before we had MaD rows in external yml files, I went with the no-space separation for the QL embedded csv rows in the name of compactness, but that's not relevant here.
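For comparison, here is a minimal sketch of the two separator styles being discussed. It is written in Python rather than QL, the helper `render_model` and the row values are hypothetical, and it only mirrors the string concatenation visible in the diff above.

```python
def render_model(kind, fields, sep="; "):
    """Join the columns of a model row, prefixed by its kind (hypothetical helper)."""
    return kind + ": " + sep.join(fields)


# Hypothetical row: namespace, type, subtypes, name, signature.
row = ["System", "Console", "false", "ReadLine", "()"]

spaced = render_model("Source", row)            # style used in this PR's test output
compact = render_model("Source", row, sep=";")  # style used for embedded CSV rows

print(spaced)
print(compact)
```

The spaced form is longer, but the column boundaries are easier to pick out when reading an .expected file.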

michaelnebel · 2024-04-15T13:24:59Z

csharp/ql/lib/semmle/code/csharp/dataflow/internal/ExternalFlow.qll

+  |
+    sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance, madId) and
+    model =
+      "Sink: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature + "; " +


same comment

michaelnebel · 2024-04-15T13:25:07Z

csharp/ql/lib/semmle/code/csharp/dataflow/internal/ExternalFlow.qll

+    summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance,
+      madId) and
+    model =
+      "Summary: " + namespace + "; " + type + "; " + subtypes + "; " + name + "; " + signature +


same comment

michaelnebel · 2024-04-15T13:39:02Z

shared/dataflow/codeql/dataflow/test/ProvenancePathGraph.qll

+    )
+  }
+
+  query predicate edges(PathNode a, PathNode b, string key, string val) {


Is there a reason for not printing the model itself instead of its "translated" id? Even though it is less likely, we would still need to update tests if models suddenly swap order (I am not sure how stable the QlBuiltins::ExtensionId is). Furthermore, it might be easier to identify the model used by an edge from its MaD format than from an integer.

Also, the translated ids could change if a new edge that uses a model is added, since this shifts the ranking compared to the last time the test was run. This could mean that a large "unrelated" part of an expected file needs to be updated, which might cause confusion.

I agree that this could be a source of irrelevant test changes, which we want to do our best to avoid. The current PR is an improvement, but Michael's suggestion of printing the whole model, while verbose, sounds like it would totally avoid the problem.

I don't believe order swapping to be very likely, although @dbartol knows the full details for a more precise answer to that.
I do realise that there's shifting going on when a test changes the set of models that it's referring to, but I thought it was nicer to have the models on the side in order to have a narrower edges relation (narrow in the sense that the row in the .expected file fits within the width of a reasonably sized editor window). And balancing these two things, I think I'd prefer narrower edges over no shifting.

That is a valid argument. I am just worried that "interesting" updates to the expected file could drown in noise. It also adds complexity to the expected files, since readers need to be aware that the ids may shift.
I am OK with sticking with the current solution, as long as we are willing to change it if we frequently run into the problem above.
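The shifting concern in this thread can be illustrated with a small sketch. It is written in Python rather than QL, and it assumes that the test-local ids are assigned by ranking the raw ids of the models a test refers to. The function `renumber` and the raw id values are hypothetical and are not taken from the PR.

```python
def renumber(referenced_ids):
    """Map each referenced raw model id to a dense, 1-based test-local id."""
    ordered = sorted(set(referenced_ids))
    return {raw: rank for rank, raw in enumerate(ordered, start=1)}


# Raw ids of the models a test refers to (hypothetical values).
before = renumber([4012, 977, 15503])

# The test starts using one more model whose raw id sorts in the middle.
after = renumber([4012, 977, 15503, 1200])

# Models whose test-local id changed although nothing about them changed.
shifted = sorted(raw for raw in before if before[raw] != after[raw])

print(before)
print(after)
print(shifted)
```

Under this assumption, every model ranked after the new one gets a different number, so each edges row that mentions those numbers changes in the .expected file. Printing the full model on each edge would avoid this at the cost of wider rows.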

owen-mc · 2024-05-31T14:37:38Z

Just to be clear, applying this so that we don't get spurious changes in edge provenance when a new model is added will pretty much mean getting rid of .qlref tests, and to avoid duplicating logic this may involve moving more code out of .ql files. That is a lot of work, but there are other reasons why we want to do it as well (for inline expectations tests).

aschackmull · 2024-06-03T08:20:29Z

I've fixed the merge conflicts, so if you want we can merge this.

aschackmull · 2024-06-07T09:48:26Z

Rebased again.

hvitved

This is an acceptable workaround until we have proper post-processing support.

owen-mc

Go LGTM

aschackmull added the no-change-note-required label Apr 15, 2024

aschackmull requested review from a team as code owners April 15, 2024 11:33

github-actions bot added C# JS Java Python Go Ruby DataFlow Library labels Apr 15, 2024

aschackmull mentioned this pull request Apr 15, 2024

Dataflow: Support alert provenance #15501

Merged

github-advanced-security bot found potential problems Apr 15, 2024

michaelnebel reviewed Apr 15, 2024

aschackmull force-pushed the dataflow/provenance-for-tests branch from 7fddee9 to d6fc62a Compare June 3, 2024 08:18

aschackmull force-pushed the dataflow/provenance-for-tests branch from d6fc62a to eda5073 Compare June 4, 2024 06:18

aschackmull added 7 commits June 7, 2024 11:45

Dataflow/Java: Add support for pretty-printed provenace in tests. Convert one test.

4ec4da4

C#: Add support for pretty-printed provenace in tests. Convert one test.

0e8d72c

Go: Add support for pretty-printed provenace in tests. Convert one test.

a26c01d

Ruby: Add support for pretty-printed provenace in tests. Convert one test.

5d51b5b

Python: Add support for pretty-printed provenace in tests.

68ddae2

Javascript: Add support for pretty-printed provenace in tests.

0c47203

Add a bit more qldoc.

7e980d9

aschackmull force-pushed the dataflow/provenance-for-tests branch from eda5073 to 7e980d9 Compare June 7, 2024 09:48

hvitved previously approved these changes Jun 7, 2024

Go: Fix test failure.

9b1e4d7

aschackmull dismissed hvitved’s stale review via 9b1e4d7 June 7, 2024 11:16

hvitved approved these changes Jun 7, 2024

owen-mc approved these changes Jun 7, 2024

aschackmull merged commit 32260e2 into github:main Jun 7, 2024
51 of 53 checks passed

aschackmull deleted the dataflow/provenance-for-tests branch June 7, 2024 12:54

aschackmull mentioned this pull request Jul 18, 2024

Shared: Add support for provenance pretty-printing as a qltest postprocess step. #17011

Merged
