-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Java/Dataflow: Add new light-weight data flow api and use it in XmlParsers #14268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java/Dataflow: Add new light-weight data flow api and use it in XmlParsers #14268
Conversation
Click to show differences in coveragejavaGenerated file changes for java
- Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+ Java Standard Library,``java.*``,3,692,201,76,,9,,,18
- Totals,,287,18883,2198,315,16,122,33,1,401
+ Totals,,287,18884,2198,315,16,122,33,1,401
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37 |
Click to show differences in coveragejavaGenerated file changes for java
- Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+ Java Standard Library,``java.*``,3,692,201,76,,9,,,18
- Totals,,287,18883,2198,315,16,122,33,1,401
+ Totals,,287,18884,2198,315,16,122,33,1,401
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Java-specific part LGTM, with one comment. The shared part looks plausible to me, but will probably need a more educated review from someone else.
@@ -628,15 +554,22 @@ class XmlReaderConfig extends ParserConfig { | |||
} | |||
} | |||
|
|||
private module ExplicitlySafeXmlReaderFlowConfig implements DataFlow::ConfigSig { | |||
deprecated private module ExplicitlySafeXmlReaderFlowConfig implements DataFlow::ConfigSig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for deprecating these private modules instead of simply using the new SimpleGlobal
flows elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because they are transitively used in a non-private member predicate that I also deprecated. I then also marked the private declarations that we eventually want to remove as deprecated as well in order to make it clear that they shouldn't be accidentally used for anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Wouldn't it be simpler to use the new flows in the non-private member predicate instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to use the new flows in the non-private member predicate instead?
That's certainly an alternative, but the point is that with the new flows we're only calculating sink-reachability and not source-sink pairs, so the couple of places where we accidentally exposed a public member predicate that links the sink to the source, then we need an additional computation. So instead of freshly introducing this via the light-weight api only to use it in a deprecated predicate, then I thought it was simpler to leave the old implementation.
I don't think it matters that much though, as it's all inside a now-deprecated and very likely completely unused predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Makes sense, I just mentioned it because it seemed that we could remove even more code if we did it that way, but as you said it doesn't matter much indeed. Let's leave it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data flow changes LGTM.
@@ -52,8 +53,82 @@ module MakeImplCommon<InputSig Lang> { | |||
class FeatureEqualSourceSinkCallContext extends FlowFeature, TFeatureEqualSourceSinkCallContext { | |||
override string toString() { result = "FeatureEqualSourceSinkCallContext" } | |||
} | |||
|
|||
signature predicate sourceNode(Node n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs QL doc.
predicate levelStepNoCall(Node n1, LocalSourceNode n2) { none() } | ||
|
||
predicate levelStepCall(Node n1, LocalSourceNode n2) { | ||
argumentValueFlowsThrough(n1, TReadStepTypesNone(), n2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argumentValueFlowsThrough
needs to be cached
now.
predicate withContentStep(Node nodeFrom, LocalSourceNode nodeTo, ContentFilter f) { none() } | ||
|
||
predicate withoutContentStep(Node nodeFrom, LocalSourceNode nodeTo, ContentFilter f) { none() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these not implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we only have their point-wise equivalents available, so doing something a bit more principled with these is left as potential future work.
4c6c9d0
to
3dadfa2
Compare
Click to show differences in coveragejavaGenerated file changes for java
- Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+ Java Standard Library,``java.*``,3,692,201,76,,9,,,18
- Totals,,287,18883,2198,315,16,122,33,1,401
+ Totals,,287,18884,2198,315,16,122,33,1,401
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37 |
Click to show differences in coveragejavaGenerated file changes for java
- Java Standard Library,``java.*``,3,691,201,76,,9,,,18
+ Java Standard Library,``java.*``,3,692,201,76,,9,,,18
- Totals,,287,18883,2198,315,16,122,33,1,401
+ Totals,,287,18884,2198,315,16,122,33,1,401
- java.lang,31,,93,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,36
+ java.lang,31,,94,,13,,,,,,,,,,,,8,,,5,,,4,,,1,,,,,,,,,,,,,,57,37 |
XmlParsers.qll currently use 12 instances of data flow. Inter-procedural flow is generally needed, but it doesn't have to be fancy, as witnessed by the fact that field flow was disabled in all 12 instances. As such, the full data flow library is huge overkill and puts undue stress on compilation (ram usage, DIL size, and RA size).
This PR adds an experimental light-weight data flow api internally based on type-tracking, and switches XmlParsers to use that instead. This cuts the DIL size of XXE.ql in half (from 495535 lines to 246580 lines). I've tested each of the 12 flow configs on the MRVA top 1000 and there are only a small handful of result differences - except for one case where 4 flow tuples are removed, it's solely a case of some slight additional flow. The light-weight api has slightly lower precision for things like virtual dispatch and lack of type-pruning, but it compared to the current state in XmlParsers.qll it adds support for access paths of length 1, so that's likely the cause of the few additional flow tuples.
There's one additional step through ThreadLocals that's (temporarily) removed from one of the flow configurations, but really this ought to be a general-purpose step, so I plan to add it back as a general model in follow-up work. There's a qltest that tracks its absence.