Skip to content

C++: Fix join order in cpp/missing-check-scanf #12061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 2, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions cpp/ql/src/Critical/MissingCheckScanf.ql
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ private predicate fwdFlow(Instruction instr, ValueNumber vn) {
*/
pragma[nomagic]
predicate revFlow(Instruction instr, ValueNumber vn) {
fwdFlow(instr, vn) and
fwdFlow(instr, pragma[only_bind_out](vn)) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have imagined that the right join order here would be to always start with fwdFlow since there are most likely a lot more sinks for this query than there are sources (although I might be wrong on this). But this still allows revFlow -> getASuccessor -> fwdFlow, right? Is that the intention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way you get (revFlow#delta minus isBarrier) join getASuccessor on succ, whereas before it was (revFlow#delta minus isBarrier) join fwdFlow on vn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my understanding was we wanted an early join with getASuccessor, but we were happy to let the optimizer pick what it joined it with (fwdFlow or revFlow). As long as it isn't joining fwdFlow with revFlow any more.

Copy link
Contributor

@MathiasVP MathiasVP Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can totally see why joining on vn would be bad. But is it not even better to start with fwdFlow? So that we get something like fwdFlow -> getASuccessor -> revFlow#delta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you start optimizing join orders that aren't problematic any more (rather than let the optimizer do its job), you open a very, very large can of worms...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... though if you can do it still with just one pragma, I will be happy with that. :)

Copy link
Contributor

@MathiasVP MathiasVP Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just suggesting that we try the alternative join order and look at the tuple counts on the projects you've been checking today. Now that we're modifying the order we might as well get the right one the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I was too impatient to run the whole query "before" due to how long it took, so I don't have before/after tuple counts :P But for example, quick-eval'ing revFlow went from 7 minutes down to 8 seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally understandable! I often also just post a partial "before" run (for example here) since it's often not even possible to fully evaluate the problematic pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this alternative join order explored at all?

(
isSink(instr, _, vn)
or
Expand All @@ -126,7 +126,7 @@ class Node extends MkNode {

final string toString() { result = instr.toString() }

final Node getASuccessor() { result = MkNode(instr.getASuccessor(), vn) }
final Node getASuccessor() { result = MkNode(pragma[only_bind_out](instr.getASuccessor()), vn) }

final Location getLocation() { result = instr.getLocation() }
}
Expand Down Expand Up @@ -167,7 +167,7 @@ predicate hasFlow(
) {
exists(ValueNumber vn |
isSource(call, index, source, vn, _) and
hasFlow(getNode(source, vn), getNode(sink, vn)) and
hasFlow(getNode(source, pragma[only_bind_into](vn)), getNode(sink, pragma[only_bind_into](vn))) and
isSink(sink, access, vn)
)
}
Expand Down