-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Map operand nodes that are only used once onto the related instruction node #12004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Map operand nodes that are only used once onto the related instruction node #12004
Conversation
| | test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv | a command-line argument | | ||
| | test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv indirection | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv indirection | a command-line argument | | ||
| | test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv indirection | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv indirection | a command-line argument | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this is a regression. However, this is also an FN on main, so I'm not immediately worried by this. Especially since the query actually digs into InstructionNodes.
MathiasVP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work. LGTM if DCA is happy!
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
| */ | ||
| cached | ||
| Instruction getIRRepresentationOfOperand(Operand operand) { | ||
| operand = unique( | | result.getAUse()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered that, while getAUse is probably not wrong to use here, we might actually be able to reduce the number of nodes even more if we replace result.getAUse() with getAUse(result) (which is a predicate that only finds the uses that are not ignored for dataflow purposes). This will likely result in more unique operands 🎉.
We don't need to do that in this PR (since this PR is already doing wonderful usability improvements as-is), and I think a similar fix could be applied in the indirection case (where I think I made the exact same mistake).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to do a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly enough trying this gives me consistency errors in the dataflow tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's weird. Which ones are failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataflow tests:
uniqueNodeToString
+| dispatch.cpp:15:8:15:8 | Middle | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | Middle indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | Middle indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this indirection | Node should have one toString but has 2. |
Fields tests:
uniqueNodeToString
+| A.cpp:9:9:9:9 | C1 | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | C1 indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | C1 indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:9:22:22 | C indirection [post update] | Node should have one toString but has 2. |
+| C.cpp:22:9:22:22 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | Bar indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | Bar indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection [post update] | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:34:45:43 | LinkedList indirection [post update] | Node should have one toString but has 2. |
+| conflated.cpp:45:34:45:43 | this indirection [post update] | Node should have one toString but has 2. |
missingToString
parameterCallable
localFlowIsLocal
@@ -28,7 +76,9 @@
| aliasing.cpp:77:11:77:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
| aliasing.cpp:84:11:84:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
| aliasing.cpp:91:11:91:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node has multiple PostUpdateNodes. |
| complex.cpp:22:3:22:5 | this indirection | Node has multiple PostUpdateNodes. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node has multiple PostUpdateNodes. |
| complex.cpp:25:7:25:7 | this indirection | Node has multiple PostUpdateNodes. |
| complex.cpp:42:10:42:14 | inner indirection | Node has multiple PostUpdateNodes. |
| complex.cpp:43:10:43:14 | inner indirection | Node has multiple PostUpdateNodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They all seem to be related to constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple toStrings does sound like something we'd be able to fix, though 😄. That's probably also the reason for the new localFlowIsLocal inconsistencies (since multiple toStrings probably means that a inconsistency is reported multiple times).
Multiple toStrings often happens because a node belongs to several QL classes that each define a toString. Based on this, I think the way forward would be to use getAQlClass to determine the overlapping node classes for some particular entity that belong to multiple classes that both define a toString. We can then figure out if we can merge those two classes in certain cases, or exclude the toString result when they overlap.
If this turns out to be too complicated, feel free to create an issue for it and leave it as future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely seems fixable, and yeah need to determine what the overlapping classes are.
I'll think I create an issue for it in any case, as this in not the top priority item and not needed to get the feature branch merged. Will revisit once I've re-worked the feature branch to be opt-in.
|
I just realized that we might now have some local flow steps from |
|
I'll have a look on Monday. |
|
Another thing I thought of while I couldn't sleep last night (😭): Since |
Do you mean something like: private predicate simpleOperandLocalFlowStep(Instruction iFrom, Operand opTo) {
not opTo instanceof MemoryOperand and
+ not iFrom = Ssa::getIRRepresentationOfOperand(opTo) and
opTo.getDef() = iFrom
}
|
|
Yeah, something along those lines. |
|
Mmm, I'm now missing a bunch of results. They're all related to |
|
Ah, it might be because of my previous comment:
since otherwise an node.(InstructionNode0).getInstruction() = instr
or
exists(Operand op |
instr = Ssa::getIRRepresentationOfOperand(op) and
node.(SingleUseOperandNode0).getOperand() = op
) |
With: TInstructionNode0(Instruction i) {
not Ssa::ignoreInstruction(i) and
+ not exists(Operand op | i = Ssa::getIRRepresentationOfOperand(op)) and
// We exclude `void`-typed instructions because they cannot contain dataeven more results go missing. |
|
Converting this to draft, while further investigating the problems with this PR. |
I think the issue is that |
Hm, yes. That's a good point! Either of those solutions sounds good to me. |
|
This now implements the restriction to Looking into the IPA restriction now. That still loses test results, so needs more digging. |
|
Performance looks much better in the latest DCA run than in previous ones. Just need to have a look at the alert changes. |
I've removed this from the PR. It introduced a few new alerts, which seemed to be FPs. I think the conclusion is that that mapping instructions in operands to the same nodes is a usability improvement, but there's of course an edge case with multiple uses where more care might be needed by a query developer. |
|
Relevant DCA experiment is not the ultimate one linked from, but the penultimate one (ultimate one was with default taint tracking and |
|
@MathiasVP I re-requested a review, as this has changed substantially from the version you approved. |
MathiasVP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes LGTM! I think we should do the suggested refactoring to save ourselves some annoying debugging in the future 😂.
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once DCA is happy!
|
DCA seems unchanged. |
Also revert the changes to default taint tracking and
TaintedAllocationSize.qlas these are no longer needed with the above mapping.