C++: Make `InitializeParameter` and `Uninitialized` return memory results #72

dave-bartolomeo · 2018-08-18T17:28:19Z

The IR avoids having non-trivially-copyable and non-trivially-assignable types in register results, because objects of those types need to exist at a particular memory location. The InitializeParameter and Uninitialized instructions were violating this restriction because they returned register results, which were then stored into the destination location via a Store.

This change makes those two instructions take the destination address as an operand, and return a memory result representing the (un-)initialized memory, removing the need for a separate Store instruction.
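As an illustration of the type distinction above, here is a minimal C++ sketch. `Point`, `Tracked`, `readValue`, and `copyDemo` are hypothetical names made up for this example and do not come from the PR. It only shows which kind of type is non-trivially-copyable, not how the IR is generated.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical example types, not taken from the PR.
struct Point {
    int x;
    int y;
};

struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    // A user-provided copy constructor makes the type non-trivially-copyable.
    Tracked(const Tracked& other) : value(other.value) {}
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point can be copied bytewise");
static_assert(!std::is_trivially_copyable<Tracked>::value,
              "Tracked must be copied through its copy constructor");

// Passing by value runs Tracked's copy constructor to initialize `t`
// in the parameter's own storage.
int readValue(Tracked t) {
    return t.value;
}

// Usage sketch.
void copyDemo() {
    Tracked a(7);
    assert(readValue(a) == 7);
}
```

A parameter like `t` is the kind of object that has to live at a particular memory location, which is why a register result is a poor fit for it.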

Note: This PR is rebased on top of the commits for pending PR #67. Only commit 56db3c3 needs to be reviewed. Once #67 is merged, the other commits should disappear from this PR. I'll remove the WIP label once that happens.

jbj · 2018-08-20T14:10:58Z

#67 is merged. To make GitHub forget the overlapping commits, you may have to close and re-open. Or maybe rebase.

jbj

I haven't worked with InitializeParameter and Uninitialized before, so please help me understand them. Why do they work in a different way than UnmodeledDefinition? To me it seems like they are describing the same sort of thing: an SSA definition originates from a location that is external to the function we are analysing. For InitializeParameter and Uninitialized we know something extra about where the value came from, and this might be useful to other analyses, but why does that mean the memory edges need to be wired differently?

dave-bartolomeo · 2018-08-20T14:49:58Z

I think this change actually makes Uninitialized and InitializeParameter more like UnmodeledDefinition. UnmodeledDefinition already produces a memory result. This change makes InitializeParameter and Uninitialized do the same. All of these represent memory locations whose contents were defined external to the function.

dave-bartolomeo · 2018-08-20T15:06:44Z

To add the additional background you asked for:
InitializeParameter represents the initialization of the parameter with the argument value, which happens at the call site. The IR-based dataflow library will connect interprocedural flow between the definition of the argument in the caller and the corresponding InitializeParameter result in the callee.
Uninitialized is used to provide a definition for a local variable that does not have an initializer (implicit or explicit).
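To make the two cases concrete, here is a small hypothetical C++ function (`pick` and `pickDemo` are invented for this sketch). Per the description above, `x` would get its definition from `InitializeParameter`, and `y`, which has no initializer, would get its definition from `Uninitialized`. The exact IR emitted for this function is not shown here.

```cpp
#include <cassert>

// Hypothetical function used only for illustration.
int pick(int x, bool useParam) {
    int y;  // local variable with no initializer
    if (useParam) {
        y = x;  // x's value was supplied by the caller's argument
    } else {
        y = 0;
    }
    return y;  // y is assigned on both paths before it is read
}

// Usage sketch.
void pickDemo() {
    assert(pick(5, true) == 5);
    assert(pick(5, false) == 0);
}
```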

jbj · 2018-08-21T07:14:59Z

Thanks for the clarification. I think my confusion is over whether I should conceptually think of these instructions as storing an unknown value into the variable or as terminators of the loose source ends in the SSA graph. I agree that this change brings them closer to the latter.

Looking at the test output in its new and helpful format (:heart: #65), a function with a parameter x changes from this:

#   50|     r0_2(int)          = InitializeParameter[x] : 
#   50|     r0_3(glval<int>)   = VariableAddress[x]     : 
#   50|     m0_4(int)          = Store                  : r0_3, r0_2

to this:

#   50|     r0_2(glval<int>)   = VariableAddress[x]     : 
#   50|     m0_3(int)          = InitializeParameter[x] : r0_2

That's an improvement, but I don't understand why x needs to be mentioned twice. If I shouldn't think of InitializeParameter as storing something at an address, maybe we can simplify it even further to this:

#   50|     m0_3(int)          = InitializeParameter[x] :

dave-bartolomeo · 2018-08-21T16:29:17Z

In order to do alias analysis and build SSA, we need to know what memory location (or locations) may be defined by the result of each instruction. There are currently four kinds of memory access that alias analysis knows about (from "MemoryAccessKind.qll"): IndirectMemoryAccess, PhiMemoryAccess, EscapedMemoryAccess, and UnmodeledMemoryAccess. IndirectMemoryAccess is used wherever possible; the other three are more-or-less unavoidable special cases. IndirectMemoryAccess requires one of the operands of the instruction to be the address of the memory location being referenced.
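The address-operand requirement can be sketched with a toy C++ model. This is not the QL library: `MemoryResult`, `definedAddress`, and `accessDemo` are hypothetical, only the four kind names come from the paragraph above, and pairing `UnmodeledDefinition` with `UnmodeledMemoryAccess` is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// The four kinds named above, as a plain enum.
enum class MemoryAccessKind {
    IndirectMemoryAccess,
    PhiMemoryAccess,
    EscapedMemoryAccess,
    UnmodeledMemoryAccess
};

// Toy stand-in for an instruction with a memory result (hypothetical).
struct MemoryResult {
    std::string opcode;
    MemoryAccessKind kind;
    std::string addressOperand;  // e.g. "r0_2"; empty when there is none
};

// For an indirect access, the defined location is found through the
// address operand. The other kinds return an empty string here, standing
// in for the special-case handling they would need.
std::string definedAddress(const MemoryResult& result) {
    if (result.kind != MemoryAccessKind::IndirectMemoryAccess) {
        return std::string();
    }
    assert(!result.addressOperand.empty() &&
           "an indirect access needs an address operand");
    return result.addressOperand;
}

// Usage sketch.
void accessDemo() {
    MemoryResult init = {"InitializeParameter",
                         MemoryAccessKind::IndirectMemoryAccess, "r0_2"};
    // Assumption: UnmodeledDefinition uses UnmodeledMemoryAccess.
    MemoryResult unmodeled = {"UnmodeledDefinition",
                              MemoryAccessKind::UnmodeledMemoryAccess, ""};
    assert(definedAddress(init) == "r0_2");
    assert(definedAddress(unmodeled).empty());
}
```

In this model, an `InitializeParameter` without an address operand could not be handled by `definedAddress` and would need a branch of its own.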

Merging the VariableAddress+InitializeParameter pair into a single InitializeParameter means that we'd have to add another special case to the alias analysis, because we would no longer have the address as an operand. Keeping it as an indirect access is consistent with our existing approach of handling all variable accesses as indirect. If we ever decided that accessing a local variable should be a single LoadVariable or StoreVariable instruction, rather than a VariableAddress+Load|Store combo, we would want to remove the VariableAddress from the InitializeParameter sequence as well for consistency.

If we wanted to only mention x once, we would either have to leave it off of the InitializeParameter, or leave it off of the VariableAddress. Leaving it off the VariableAddress would be inconsistent with other usage of VariableAddress. Leaving it off of InitializeParameter would be OK. However, any interprocedural data flow, and probably some intraprocedural stuff too, would want to easily map from the Parameter to the corresponding InitializeParameter that defines its value, so we'd want a getParameter() predicate on it anyway.
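A rough C++ sketch of the parameter-to-instruction lookup that a `getParameter()` predicate would enable. Everything here is a hypothetical toy (`InitializeParameterInstr`, `indexByParameter`, `lookupDemo`, and the `y` entry); the real predicate would live in the QL IR library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an InitializeParameter instruction (hypothetical fields).
struct InitializeParameterInstr {
    std::string result;          // e.g. "m0_3"
    std::string parameter;       // e.g. "x"
    std::string addressOperand;  // e.g. "r0_2"

    const std::string& getParameter() const { return parameter; }
};

// Builds a parameter-name -> instruction index, so a client can go from a
// parameter to the instruction that defines its initial value.
// The returned pointers are valid only while `instrs` is alive and unmodified.
std::map<std::string, const InitializeParameterInstr*>
indexByParameter(const std::vector<InitializeParameterInstr>& instrs) {
    std::map<std::string, const InitializeParameterInstr*> index;
    for (const InitializeParameterInstr& instr : instrs) {
        index[instr.getParameter()] = &instr;
    }
    return index;
}

// Usage sketch.
void lookupDemo() {
    std::vector<InitializeParameterInstr> instrs = {
        {"m0_3", "x", "r0_2"},
        {"m0_5", "y", "r0_4"},
    };
    std::map<std::string, const InitializeParameterInstr*> index =
        indexByParameter(instrs);
    assert(index.at("x")->result == "m0_3");
    assert(index.at("y")->addressOperand == "r0_4");
}
```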

I'd like to reconsider the whole "all memory accesses are indirect" decision after we've written one or two real queries on the IR. My original theory was that making everything indirect would mean fewer cases to handle in anything that used the IR. If it turns out that our real usage of the IR doesn't need to make that distinction, and only the guts of alias analysis and SSA construction care, then I'd probably be OK with introducing direct loads and stores of variables and fields to make the IR a bit smaller.

jbj

Thanks for the explanation. It sounds right that we should wait and see how these memory access kinds work out in practice.

dave-bartolomeo added the C++ and WIP (This is a work-in-progress, do not merge yet!) labels Aug 18, 2018

dave-bartolomeo assigned jbj Aug 18, 2018

jbj reviewed Aug 20, 2018

dave-bartolomeo force-pushed the dave/InitMemory branch from 56db3c3 to f2053c4 Compare August 20, 2018 16:23

dave-bartolomeo removed the WIP (This is a work-in-progress, do not merge yet!) label Aug 20, 2018

dave-bartolomeo mentioned this pull request Aug 20, 2018

C++: IR generation for new and new[] #82

Merged

jbj approved these changes Aug 21, 2018

jbj merged commit 2481bc7 into github:master Aug 21, 2018

dave-bartolomeo deleted the dave/InitMemory branch September 5, 2018 18:48

aibaars added a commit that referenced this pull request Oct 14, 2021

Merge pull request #72 from github/aibaars/fix-cfg

a15a066

CFG improvements

smowton pushed a commit to smowton/codeql that referenced this pull request Dec 6, 2021

Merge pull request github#72 from github/smowton/fix/external-extract…

ee10428

…ion-repeats Don't abort external class extraction after first duplicate

dbartol pushed a commit that referenced this pull request Dec 18, 2024

Merge pull request #72 from github/query/output_clobbering

8560772

feat(queries): Improve Output Clobbering query
