Skip to content

Conversation

@jketema
Copy link
Contributor

@jketema jketema commented Jan 27, 2023

Also revert the changes to default taint tracking and TaintedAllocationSize.ql as these are no longer needed with the above mapping.

Comment on lines -46 to -48
| test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv | a command-line argument |
| test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv indirection | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv indirection | a command-line argument |
| test1.c:41:11:41:11 | i | test1.c:7:26:7:29 | argv indirection | test1.c:41:11:41:11 | i | An array indexing expression depends on $@ that might be outside the bounds of the array. | test1.c:7:26:7:29 | argv indirection | a command-line argument |
Copy link
Contributor Author

@jketema jketema Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is a regression. However, this is also an FN on main, so I'm not immediately worried by this. Especially since the query actually digs into InstructionNodes.

@jketema jketema marked this pull request as ready for review January 27, 2023 10:32
@jketema jketema requested a review from a team as a code owner January 27, 2023 10:32
@jketema jketema added the no-change-note-required This PR does not need a change note label Jan 27, 2023
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work. LGTM if DCA is happy!

*/
cached
Instruction getIRRepresentationOfOperand(Operand operand) {
operand = unique( | | result.getAUse())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered that, while getAUse is probably not wrong to use here, we might actually be able to reduce the number of nodes even more if we replace result.getAUse() with getAUse(result) (which is a predicate that only finds the uses that are not ignored for dataflow purposes). This will likely result in more unique operands 🎉.

We don't need to do that in this PR (since this PR is already doing wonderful usability improvements as-is), and I think a similar fix could be applied in the indirection case (where I think I made the exact same mistake).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to do a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly enough trying this gives me consistency errors in the dataflow tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's weird. Which ones are failing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataflow tests:

 uniqueNodeToString
+| dispatch.cpp:15:8:15:8 | Middle | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | Middle indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | Middle indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:15:8:15:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | Bottom indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this indirection | Node should have one toString but has 2. |
+| dispatch.cpp:21:8:21:8 | this indirection | Node should have one toString but has 2. |

Fields tests:

 uniqueNodeToString
+| A.cpp:9:9:9:9 | C1 | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | C1 indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | C1 indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:9:9:9:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | C2 indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this indirection | Node should have one toString but has 2. |
+| A.cpp:14:9:14:9 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | C indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:3:22:3 | this indirection | Node should have one toString but has 2. |
+| C.cpp:22:9:22:22 | C indirection [post update] | Node should have one toString but has 2. |
+| C.cpp:22:9:22:22 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:22:3:22:5 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | Bar indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | Bar indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:22:11:22:17 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | Outer indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection [post update] | Node should have one toString but has 2. |
+| complex.cpp:25:7:25:7 | this indirection [post update] | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | LinkedList indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:3:45:12 | this indirection | Node should have one toString but has 2. |
+| conflated.cpp:45:34:45:43 | LinkedList indirection [post update] | Node should have one toString but has 2. |
+| conflated.cpp:45:34:45:43 | this indirection [post update] | Node should have one toString but has 2. |
 missingToString
 parameterCallable
 localFlowIsLocal
@@ -28,7 +76,9 @@
 | aliasing.cpp:77:11:77:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
 | aliasing.cpp:84:11:84:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
 | aliasing.cpp:91:11:91:11 | definition of w indirection | Node has multiple PostUpdateNodes. |
+| complex.cpp:22:3:22:5 | Bar indirection | Node has multiple PostUpdateNodes. |
 | complex.cpp:22:3:22:5 | this indirection | Node has multiple PostUpdateNodes. |
+| complex.cpp:25:7:25:7 | Outer indirection | Node has multiple PostUpdateNodes. |
 | complex.cpp:25:7:25:7 | this indirection | Node has multiple PostUpdateNodes. |
 | complex.cpp:42:10:42:14 | inner indirection | Node has multiple PostUpdateNodes. |
 | complex.cpp:43:10:43:14 | inner indirection | Node has multiple PostUpdateNodes. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They all seem to be related to constructors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple toStrings does sound like something we'd be able to fix, though 😄. That's probably also the reason for the new localFlowIsLocal inconsistencies (since multiple toStrings probably means that a inconsistency is reported multiple times).

Multiple toStrings often happens because a node belongs to several QL classes that each define a toString. Based on this, I think the way forward would be to use getAQlClass to determine the overlapping node classes for some particular entity that belong to multiple classes that both define a toString. We can then figure out if we can merge those two classes in certain cases, or exclude the toString result when they overlap.

If this turns out to be too complicated, feel free to create an issue for it and leave it as future work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely seems fixable, and yeah need to determine what the overlapping classes are.

I'll think I create an issue for it in any case, as this in not the top priority item and not needed to get the feature branch merged. Will revisit once I've re-worked the feature branch to be opt-in.

@MathiasVP
Copy link
Contributor

I just realized that we might now have some local flow steps from Instructions to Operand (i.e., the ones from the simpleOperandLocalFlowStep predicate) that are now the identity relation. They should probably be excluded from the predicate.

@jketema
Copy link
Contributor Author

jketema commented Jan 27, 2023

I'll have a look on Monday.

@MathiasVP
Copy link
Contributor

Another thing I thought of while I couldn't sleep last night (😭): Since Instructions are now also sometimes defined by the TSingleUseOperandNode0 shouldn't we also modify the body of the TInstructionNode0 branch to exclude the instructions which has a single use?

@jketema
Copy link
Contributor Author

jketema commented Jan 31, 2023

I just realized that we might now have some local flow steps from Instructions to Operand (i.e., the ones from the simpleOperandLocalFlowStep predicate) that are now the identity relation. They should probably be excluded from the predicate.

Do you mean something like:

   private predicate simpleOperandLocalFlowStep(Instruction iFrom, Operand opTo) {
     not opTo instanceof MemoryOperand and
+    not iFrom = Ssa::getIRRepresentationOfOperand(opTo) and
     opTo.getDef() = iFrom
   }
 

@MathiasVP
Copy link
Contributor

Yeah, something along those lines.

@jketema
Copy link
Contributor Author

jketema commented Jan 31, 2023

Mmm, I'm now missing a bunch of results. They're all related to argv sources.

@MathiasVP
Copy link
Contributor

Ah, it might be because of my previous comment:

Another thing I thought of while I couldn't sleep last night (😭): Since Instructions are now also sometimes defined by the TSingleUseOperandNode0 shouldn't we also modify the body of the TInstructionNode0 branch to exclude the instructions which has a single use?

since otherwise an Instruction could satisfy both disjuncts in the charpred for InstructionNode:

node.(InstructionNode0).getInstruction() = instr
or
exists(Operand op |
  instr = Ssa::getIRRepresentationOfOperand(op) and
  node.(SingleUseOperandNode0).getOperand() = op
)

@jketema
Copy link
Contributor Author

jketema commented Jan 31, 2023

Ah, it might be because of my previous comment

With:

     TInstructionNode0(Instruction i) {
       not Ssa::ignoreInstruction(i) and
+      not exists(Operand op | i = Ssa::getIRRepresentationOfOperand(op)) and
       // We exclude `void`-typed instructions because they cannot contain data

even more results go missing.

@jketema jketema marked this pull request as draft February 2, 2023 09:24
@jketema
Copy link
Contributor Author

jketema commented Feb 2, 2023

Converting this to draft, while further investigating the problems with this PR.

@jketema
Copy link
Contributor Author

jketema commented Feb 6, 2023

Mmm, I'm now missing a bunch of results. They're all related to argv sources.

I think the issue is that simpleOperandLocalFlowStep is used in two different ways. Once in simpleLocalFlowStep and once in indirectionInstructionFlow. Restricting the steps from instructions to operands probably only applies in the former case? If so, the change from #12004 (comment) either needs to move to simpleLocalFlowStep, or simpleOperandLocalFlowStep needs to know (boolean argument?) whether we're looking at the direct or indirect case.

@MathiasVP
Copy link
Contributor

I think the issue is that simpleOperandLocalFlowStep is used in two different ways. Once in simpleLocalFlowStep and once in indirectionInstructionFlow. Restricting the steps from instructions to operands probably only applies in the former case? If so, the change from #12004 (comment) either needs to move to simpleLocalFlowStep, or simpleOperandLocalFlowStep needs to know (boolean argument?) whether we're looking at the direct or indirect case.

Hm, yes. That's a good point! Either of those solutions sounds good to me.

@jketema
Copy link
Contributor Author

jketema commented Feb 6, 2023

This now implements the restriction to simpleOperandLocalFlowStep. Test changes look a whole lot nicer than before.

Looking into the IPA restriction now. That still loses test results, so needs more digging.

@jketema
Copy link
Contributor Author

jketema commented Feb 7, 2023

Performance looks much better in the latest DCA run than in previous ones. Just need to have a look at the alert changes.

@jketema
Copy link
Contributor Author

jketema commented Feb 9, 2023

Also revert the changes to default taint tracking and TaintedAllocationSize.ql as these are no longer needed with the above mapping.

I've removed this from the PR. It introduced a few new alerts, which seemed to be FPs. I think the conclusion is that that mapping instructions in operands to the same nodes is a usability improvement, but there's of course an edge case with multiple uses where more care might be needed by a query developer.

@jketema
Copy link
Contributor Author

jketema commented Feb 9, 2023

Relevant DCA experiment is not the ultimate one linked from, but the penultimate one (ultimate one was with default taint tracking and TaintedAllocationSize.ql change). As discussed internally, the alert changes for cpp/path-injection look reasonable. The disappearing alerts I've looked at (all PHP ones, and 3 of the 4 systemd ones) are due to the sanitizer now kicking in, where it wasn't before. I spot checked the new git alerts, and they look reasonable.

@jketema jketema requested a review from MathiasVP February 9, 2023 10:09
@jketema jketema marked this pull request as ready for review February 9, 2023 10:09
@jketema
Copy link
Contributor Author

jketema commented Feb 9, 2023

@MathiasVP I re-requested a review, as this has changed substantially from the version you approved.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM! I think we should do the suggested refactoring to save ourselves some annoying debugging in the future 😂.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once DCA is happy!

@jketema
Copy link
Contributor Author

jketema commented Feb 9, 2023

DCA seems unchanged.

@jketema jketema merged commit 9d6098a into github:mathiasvp/replace-ast-with-ir-use-usedataflow Feb 9, 2023
@jketema jketema deleted the single-use branch February 9, 2023 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants