Skip to content

Java/Dataflow: Add new light-weight data flow api and use it in XmlParsers #14268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 21, 2023

Conversation

aschackmull
Copy link
Contributor

XmlParsers.qll currently use 12 instances of data flow. Inter-procedural flow is generally needed, but it doesn't have to be fancy, as witnessed by the fact that field flow was disabled in all 12 instances. As such, the full data flow library is huge overkill and puts undue stress on compilation (ram usage, DIL size, and RA size).

This PR adds an experimental light-weight data flow api internally based on type-tracking, and switches XmlParsers to use that instead. This cuts the DIL size of XXE.ql in half (from 495535 lines to 246580 lines). I've tested each of the 12 flow configs on the MRVA top 1000 and there are only a small handful of result differences - except for one case where 4 flow tuples are removed, it's solely a case of some slight additional flow. The light-weight api has slightly lower precision for things like virtual dispatch and lack of type-pruning, but it compared to the current state in XmlParsers.qll it adds support for access paths of length 1, so that's likely the cause of the few additional flow tuples.

There's one additional step through ThreadLocals that's (temporarily) removed from one of the flow configurations, but really this ought to be a general-purpose step, so I plan to add it back as a general model in follow-up work. There's a qltest that tracks its absence.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+    Java Standard Library,``java.*``,3,692,201,76,,9,,,18
-    Totals,,287,18883,2198,315,16,122,33,1,401
+    Totals,,287,18884,2198,315,16,122,33,1,401
  • Changes to framework-coverage-java.csv:
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+    Java Standard Library,``java.*``,3,692,201,76,,9,,,18
-    Totals,,287,18883,2198,315,16,122,33,1,401
+    Totals,,287,18884,2198,315,16,122,33,1,401
  • Changes to framework-coverage-java.csv:
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Sep 20, 2023
Copy link
Contributor

@atorralba atorralba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java-specific part LGTM, with one comment. The shared part looks plausible to me, but will probably need a more educated review from someone else.

@@ -628,15 +554,22 @@ class XmlReaderConfig extends ParserConfig {
}
}

private module ExplicitlySafeXmlReaderFlowConfig implements DataFlow::ConfigSig {
deprecated private module ExplicitlySafeXmlReaderFlowConfig implements DataFlow::ConfigSig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for deprecating these private modules instead of simply using the new SimpleGlobal flows elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because they are transitively used in a non-private member predicate that I also deprecated. I then also marked the private declarations that we eventually want to remove as deprecated as well in order to make it clear that they shouldn't be accidentally used for anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Wouldn't it be simpler to use the new flows in the non-private member predicate instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to use the new flows in the non-private member predicate instead?

That's certainly an alternative, but the point is that with the new flows we're only calculating sink-reachability and not source-sink pairs, so the couple of places where we accidentally exposed a public member predicate that links the sink to the source, then we need an additional computation. So instead of freshly introducing this via the light-weight api only to use it in a deprecated predicate, then I thought it was simpler to leave the old implementation.

I don't think it matters that much though, as it's all inside a now-deprecated and very likely completely unused predicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Makes sense, I just mentioned it because it seemed that we could remove even more code if we did it that way, but as you said it doesn't matter much indeed. Let's leave it as it is.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data flow changes LGTM.

@@ -52,8 +53,82 @@ module MakeImplCommon<InputSig Lang> {
class FeatureEqualSourceSinkCallContext extends FlowFeature, TFeatureEqualSourceSinkCallContext {
override string toString() { result = "FeatureEqualSourceSinkCallContext" }
}

signature predicate sourceNode(Node n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs QL doc.

predicate levelStepNoCall(Node n1, LocalSourceNode n2) { none() }

predicate levelStepCall(Node n1, LocalSourceNode n2) {
argumentValueFlowsThrough(n1, TReadStepTypesNone(), n2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argumentValueFlowsThrough needs to be cached now.

Comment on lines +115 to +117
predicate withContentStep(Node nodeFrom, LocalSourceNode nodeTo, ContentFilter f) { none() }

predicate withoutContentStep(Node nodeFrom, LocalSourceNode nodeTo, ContentFilter f) { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we only have their point-wise equivalents available, so doing something a bit more principled with these is left as potential future work.

@aschackmull aschackmull force-pushed the java/xmlparsers-typetrack branch from 4c6c9d0 to 3dadfa2 Compare September 21, 2023 09:52
@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+    Java Standard Library,``java.*``,3,692,201,76,,9,,,18
-    Totals,,287,18883,2198,315,16,122,33,1,401
+    Totals,,287,18884,2198,315,16,122,33,1,401
  • Changes to framework-coverage-java.csv:
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+    Java Standard Library,``java.*``,3,692,201,76,,9,,,18
-    Totals,,287,18883,2198,315,16,122,33,1,401
+    Totals,,287,18884,2198,315,16,122,33,1,401
  • Changes to framework-coverage-java.csv:
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37

@@ -1,4 +1,6 @@
private import codeql.dataflow.DataFlow
private import codeql.typetracking.TypeTracking as Tt

Check warning

Code scanning / CodeQL

Names only differing by case

Tt is only different by casing from TT that is used elsewhere for modules.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DataFlow Library Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants