Skip to content

C++: Make InitializeParameter and Uninitialized return memory results #72

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 21, 2018

Conversation

dave-bartolomeo
Copy link
Contributor

The IR avoids having non-trivially-copyable and non-trivially-assignable types in register results, because objects of those types need to exist at a particular memory location. The InitializeParameter and Uninitialized instructions were violating this restriction because they returned register results, which were then stored into the destination location via a Store.

This change makes those two instructions take the destination address as an operand, and return a memory result representing the (un-)initialized memory, removing the need for a separate Store instruction.

Note: This PR is rebased on top of the commits for pending PR #67. Only commit 56db3c3 needs to be reviewed. Once #67 is merged, the other commits should disappear from this PR. I'll remove the WIP label once that happens.

@dave-bartolomeo dave-bartolomeo added C++ WIP This is a work-in-progress, do not merge yet! labels Aug 18, 2018
@jbj
Copy link
Contributor

jbj commented Aug 20, 2018

#67 is merged. To make GitHub forget the overlapping commits, you may have to close and re-open. Or maybe rebase.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't worked with InitializeParameter and Uninitialized before, so please help me understand them. Why do they work in a different way than UnmodeledDefinition? To me it seems like they are describing the same sort of thing: an SSA definition originates from a location that is external to the function we are analysing. For InitializeParameter and Uninitialized we know something extra about where the value came from, and this might be useful to other analyses, but why does that mean the memory edges need to be wired differently?

@dave-bartolomeo
Copy link
Contributor Author

I think this change actually makes Uninitialized and InitializeParameter more like UnmodeledDefinition. UnmodeledDefinition already produces a memory result. This change makes InitializeParameter and Uninitialized do the same. All of these represent memory locations whose contents were defined external to the function.

@dave-bartolomeo
Copy link
Contributor Author

To add the additional background you asked for:
InitializeParameter represents the initialization of the parameter with the argument value, which happens at the call site. The IR-based dataflow library will connect interprocedural flow between the definition of the argument in the caller and the corresponding InitializeParameter result in the callee.
Uninitialized is used to provide a definition for a local variable that does not have an initializer (implicit or explicit).

…ults

The IR avoids having non-trivially-copyable and non-trivially-assignable types in register results, because objects of those types need to exist at a particular memory location. The `InitializeParameter` and `Uninitialized` instructions were violating this restriction because they returned register results, which were then stored into the destination location via a `Store`.

This change makes those two instructions take the destination address as an operand, and return a memory result representing the (un-)initialized memory, removing the need for a separate `Store` instruction.
@dave-bartolomeo dave-bartolomeo removed the WIP This is a work-in-progress, do not merge yet! label Aug 20, 2018
@jbj
Copy link
Contributor

jbj commented Aug 21, 2018

Thanks for the clarification. I think my confusion is over whether I should conceptually think of these instructions as storing an unknown value into the variable or as terminators of the loose source ends in the SSA graph. I agree that this change brings them closer to the latter.

Looking at the test output in its new and helpful format (:heart: #65), a function with a parameter x changes from this:

#   50|     r0_2(int)          = InitializeParameter[x] : 
#   50|     r0_3(glval<int>)   = VariableAddress[x]     : 
#   50|     m0_4(int)          = Store                  : r0_3, r0_2 

to this:

#   50|     r0_2(glval<int>)   = VariableAddress[x]     : 
#   50|     m0_3(int)          = InitializeParameter[x] : r0_2 

That's an improvement, but I don't understand why x needs to be mentioned twice. If I shouldn't think of InitializeParameter as storing something at an address, maybe we can simplify it even further to this:

#   50|     m0_3(int)          = InitializeParameter[x] :

@dave-bartolomeo
Copy link
Contributor Author

In order to do alias analysis and build SSA, we need to know what memory location (or locations) may be defined by the result of each instruction. There are currently four kinds of memory access that alias analysis knows about (from "MemoryAccessKind.qll": IndirectMemoryAccess, PhiMemoryAccess, EscapedMemoryAccess, and UnmodeledMemoryAccess. IndirectMemoryAccess is used wherever possible; the other three are more-or-less unavoidable special cases. IndirectMemoryAccess requires one of the operands of the instruction to be the address of the memory location being referenced.

Merging the VariableAddress+InitializeParameter pair into a single InitializeParameter means that we'd have to add another special case to the alias analysis, because we would no longer have the address as an operand. Keeping it as an indirect access is consistent with our existing approach of handling all variable accesses as indirect. If we ever decided that accessing a local variable should be a single LoadVariable or StoreVariable instruction, rather than a VariableAddress+Load|Store combo, we would want to remove the VariableAddress from the InitializeParameter sequence as well for consistency.

If we wanted to only mention x once, we would either have to leave it off of the InitializeParameter, or leave it off of the VariableAddress. Leaving it off the VariableAddress would be inconsistent with other usage of VariableAddress. Leaving it off of InitializeParameter would be OK. However, any interprocedural data flow, and probably some intraprocedural stuff too, would want to easily map from the Parameter to the corresponding InitializeParameter that defines its value, so we'd want a getParameter() predicate on it anyway.

I'd like to reconsider the whole "all memory accesses are indirect" decision after we've written one or two real queries on the IR. My original theory was that making everything indirect would mean fewer cases to handle in anything that used the IR. If it turns out that our real usage of the IR doesn't need to make that distinction, and only the guts of alias analysis and SSA construction care, then I'd probably be OK with introducing direct loads and stores of variables and fields to make the IR a bit smaller.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. It sounds right that we should wait and see how these memory access kinds work out in practice.

@jbj jbj merged commit 2481bc7 into github:master Aug 21, 2018
@dave-bartolomeo dave-bartolomeo deleted the dave/InitMemory branch September 5, 2018 18:48
aibaars added a commit that referenced this pull request Oct 14, 2021
smowton pushed a commit to smowton/codeql that referenced this pull request Dec 6, 2021
…ion-repeats

Don't abort external class extraction after first duplicate
dbartol pushed a commit that referenced this pull request Dec 18, 2024
feat(queries): Improve Output Clobbering query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants