Skip to content

Conversation

@MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Aug 21, 2023

In C/C++ dataflow we reuse some dataflow nodes that we know are sementically equivalent to gain a small amount of performance, and to make it easier to select the right dataflow node in sources, sinks, barriers, etc. For example, consider the following snippet of IR:

r1(glval<char **>) = VariableAddress[a]          :
r2(char **)        = Load[a]                     : &:r1, m1
r3(glval<char *>)  = CopyValue                   : r2

For dataflow we have a node that represents &:r1 (i.e, the value of type char***), as well as nodes for the possible indirections of &:r1: *&:r1 (i.e., the value of type char**), **&:r1 (i.e, the value of type char*), and ***&:r1 (i.e., the value of type char).

Similarly, we have a node that represents the r2 operand (i.e., the value of type char**), as well as nodes for the possible indirections of r2: *r2 (i.e., the value of type char*), and **r2 (i.e., the value of type char).

However, some of these nodes represent the semantically same value. For example, both *&:r1 and r2 represent the char** value (i.e., the value you get when you dereference VariableAddress[a]).

The getIRRepresentationOfIndirectOperand and getIRRepresentationOfIndirectInstruction predicates are used to identify such semantically identical representations.

Currently, however, getIRRepresentationOfIndirectOperand and getIRRepresentationOfIndirectInstruction only identified that *&:r1 and r2 represented the same value. But **&:r1 and *r2, and ***&:r1 and **r2 also represent the same values. This PR extends those predicates to identify these cases.

This should (hopefully) give us a small speedup, a reduced memory footprint, and possible even remove some FPs in queries in cases where we selected a equivalent (but until now not identical!) node to the one on the actual dataflow path.

@github-actions github-actions bot added the C++ label Aug 21, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Aug 21, 2023
@MathiasVP MathiasVP force-pushed the reuse-even-more-nodes branch from 15c3a9e to 99cc417 Compare August 29, 2023 13:12
@MathiasVP MathiasVP marked this pull request as ready for review August 30, 2023 09:13
@MathiasVP MathiasVP requested a review from a team as a code owner August 30, 2023 09:13
@MathiasVP MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Aug 30, 2023
@MathiasVP
Copy link
Contributor Author

MathiasVP commented Aug 30, 2023

This PR is finally good to go. There's a couple of test changes in the internal repo that I need to accept as well. I'll do that once this PR has been reviewed.

DCA looks good. I've verified that all the lost results are instances of the test added in f65fe34.

@MathiasVP MathiasVP force-pushed the reuse-even-more-nodes branch from b42b8c9 to b092da4 Compare August 30, 2023 10:27
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core change looks plausible to me. I can't comment on performance but the tests show considerable deduplication. 🎉

A few comments / questions...

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

DCA shows a speedup.

I've verified that all the lost results are instances of the test added in f65fe34.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants