[LangRef] Mention allocation elision by nikic · Pull Request #177592 · llvm/llvm-project

nikic · 2026-01-23T14:13:28Z

allockind / alloc-family enable allocation elision, but this was not previously mentioned by LangRef.

Related discussion: https://discourse.llvm.org/t/rfc-clarifying-semantic-assumptions-for-custom-allocators/89469

I've documented this in terms of optimization, but if desired I could define this more operationally with non-determinism.

llvmbot · 2026-01-23T14:14:04Z

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

cc @RalfJung

Full diff: https://github.com/llvm/llvm-project/pull/177592.diff

1 File Affected:

(modified) llvm/docs/LangRef.rst (+9)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 103058d161f86..e77aab481c660 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2077,6 +2077,15 @@ For example:
     The first three options are mutually exclusive, and the remaining options
     describe more details of how the function behaves. The remaining options
     are invalid for "free"-type functions.
+
+    Calls to functions annotated with ``allockind`` are subject to allocation
+    elision: Calls to allocator functions can be removed, and the allocation
+    served from a virtual allocator instead. Notably, this is allowed even if
+    the allocator calls have side-effects.
+
+    If multiple allocation functions operate on the same allocation (for
+    example, an "alloc" followed by "free"), allocation elision is only allowed
+    if all involved functions have the same ``"alloc-family"``.
 ``"alloc-variant-zeroed"="FUNCTION"``
     This attribute indicates that another function is equivalent to an allocator function,
     but returns zeroed memory. The function must have "zeroed" allocation behavior,
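To illustrate the family rule in this hunk, here is a minimal Python sketch. It models each call that touches one allocation as a `(kind, family)` pair. The names `Call` and `may_elide` are hypothetical, not LLVM APIs, and LLVM's real checks (pointer uses, escapes, and so on) are far more involved.

```python
from typing import NamedTuple


class Call(NamedTuple):
    kind: str    # "alloc", "realloc" or "free", mirroring allockind
    family: str  # value of the "alloc-family" attribute


def may_elide(calls: list[Call]) -> bool:
    """Return True if elision is permitted for the calls touching one allocation.

    Sketch of the rule above: every allocator function involved must share
    the same "alloc-family". The calls' side effects are deliberately not
    consulted, since elision is allowed even if they have side effects.
    """
    if not calls:
        return False
    return len({c.family for c in calls}) == 1
```

A leaked allocation (a single "alloc" call) trivially satisfies the rule, while an "alloc" in family `foo` followed by a "free" in family `bar` does not.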

RalfJung · 2026-01-23T15:47:42Z

> I've documented this in terms of optimization, but if desired I could define this more operationally with non-determinism.

It will not surprise you that I would prefer a more operational specification. ;)

RalfJung · 2026-01-23T15:48:26Z

+
+    If multiple allocation functions operate on the same allocation (for
+    example, an "alloc" followed by "free"), allocation elision is only allowed
+    if all involved functions have the same ``"alloc-family"``.


I also presume that elision will then replace either all or none of the operations?

So I initially added wording for that, but then realized that this is somewhat misleading. It's not really all or nothing, but rather "pairwise". For example, if we have an alloc + realloc + free, we can convert that into alloc + free (https://c.godbolt.org/z/Ghc6e9de8). I think we should also be allowed to convert alloc + realloc into alloc (though I'm not sure LLVM does that one right now).

So it's basically that we can elide either a leaked alloc, or an alloc + free pair where realloc is considered a combined alloc+free. For the sake of clarity, I ended up just writing out the possible cases.

Does that sound reasonable?

> For example, if we have an alloc + realloc + free, we can convert that into alloc + free

That remaining alloc call then has the size of the realloc call, I presume? (Assuming that realloc is used to grow the allocation.)

That means the final actual call made to the allocator does not fully correspond to a call in the source program. We could try to come up with an operational approach that precisely captures this, but I wonder if it's not easier or even more accurate to say that LLVM is allowed to synthesize new calls to the allocator following the usual allocator contract?

This makes sense, with one caveat: is it correct that we can only turn alloc + realloc + free into alloc + free when the result of realloc is not used and its effects are not visible?

> That remaining alloc call then has the size of the realloc call, I presume? (Assuming that realloc is used to grow the allocation.)

Heh, that depends on which pair we elide :) Either the alloc + realloc -> alloc, in which case the alloc has the size of the realloc, or realloc + free -> free, in which case it has the original size.

Of course, a separate sensible optimization in this space would be to shrink the allocation size if it's unused, though as far as I know LLVM doesn't do this.
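The pairwise rules discussed above can be sketched as a small rewriting system over the sequence of calls that touch one allocation. This is an illustrative Python model only: the trace encoding and the names `one_step` and `reachable` are hypothetical, it assumes all calls share one alloc-family, it ignores whether the pointers are used (which real elision must check), and it makes no claim about which rewrites LLVM actually performs today.

```python
from collections import deque

# A trace is a tuple of (kind, size) pairs for the calls touching ONE allocation,
# e.g. (("alloc", 8), ("realloc", 16), ("free", None)).


def one_step(trace):
    """Yield every trace obtained by applying one elision rule once."""
    if len(trace) == 1 and trace[0][0] == "alloc":
        yield ()  # a leaked alloc can be dropped entirely
    for i in range(len(trace) - 1):
        (k1, _), (k2, s2) = trace[i], trace[i + 1]
        before, after = trace[:i], trace[i + 2:]
        if k1 == "alloc" and k2 == "free":
            yield before + after  # alloc + free pair disappears
        elif k1 == "alloc" and k2 == "realloc":
            yield before + (("alloc", s2),) + after  # one alloc with the new size
        elif k1 == "realloc" and k2 == "free":
            yield before + (("free", None),) + after  # free of the original allocation


def reachable(trace):
    """All traces reachable by applying the rules any number of times (incl. trace)."""
    seen, work = {trace}, deque([trace])
    while work:
        for nxt in one_step(work.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen
```

For `t = (("alloc", 8), ("realloc", 16), ("free", None))`, `reachable(t)` contains `(("alloc", 16), ("free", None))` (alloc + realloc elided, new size) and `(("alloc", 8), ("free", None))` (realloc + free elided, original size), matching the "depends on which pair we elide" answer, plus the fully elided `()`.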

> That means the final actual call made to the allocator does not fully correspond to a call in the source program. We could try to come up with an operational approach that precisely captures this, but I wonder if it's not easier or even more accurate to say that LLVM is allowed to synthesize new calls to the allocator following the usual allocator contract?

Hm, I think that allowing creation of allocator calls out of thin air would be problematic for various reasons. For example, C malloc is not async signal safe, so synthesizing such a call inside a signal handler would be illegal.

Generally this seems a bit tough to specify operationally, due to the pairwise requirement. At the time of the allocator call, we don't yet know whether there is going to be a free call with matching alloc-family in the future or not.

So the case you have in mind is something like this?

    define void @test() {
      %a = call ptr @alloc() allockind(alloc) "alloc-family"="foo"
      %c = load i1, ptr @was.allocated
      br i1 %c, label %if, label %else
    if:
      call ptr @dealloc(ptr allocptr %a) allockind(free) "alloc-family"="foo"
      ret void
    else:
      call ptr @inlined_dealloc(ptr allocptr %a) allockind(free) "alloc-family"="bar"
      ret void
    }

I don't think there needs to be a logical contradiction here. That depends on how you model it.

For example, if at the @alloc() call, we fork into two executions where:

1. In one, the allocator is called, @was.allocated is true, we go into the if branch and call @dealloc(); everything is fine.

2. In the other, we replace the call with a virtual allocation, @was.allocated is false, we go into the else branch and call @inlined_dealloc(). Since we replaced the original allocation in "alloc-family"="foo" and now perform the dealloc call with a mismatching "alloc-family"="bar", we discard this execution as invalid.

Only execution 1 remains, so in this case there is one possible behavior: not eliding.
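The two-execution reasoning above can be made concrete with a tiny Python model of this one example. It assumes, as in the example, that `@was.allocated` is true exactly when the real allocator ran. The function names, the trace strings and the `else_family` knob (used to show what happens if the else branch had the matching family) are hypothetical, and this is not a model of LLVM's semantics in general.

```python
ALLOC_FAMILY = "foo"  # family of the @alloc call in the example


def run(elide: bool, else_family: str = "bar"):
    """Execute the example @test once, given the choice made at the @alloc call.

    Returns the trace of real allocator calls, or None if the execution is
    discarded as invalid.
    """
    trace = []
    if not elide:
        trace.append("@alloc")
    was_allocated = not elide  # assumption: set only by the real allocator
    if was_allocated:
        free_call, free_family = "@dealloc", "foo"
    else:
        free_call, free_family = "@inlined_dealloc", else_family
    if elide:
        if free_family != ALLOC_FAMILY:
            return None  # elided alloc freed through another family: discard
        # matching family: the free is served by the virtual allocator as well
    else:
        trace.append(free_call)
    return tuple(trace)


def allowed_behaviors(else_family: str = "bar"):
    """Collect the traces of all executions that are not discarded."""
    results = (run(e, else_family) for e in (False, True))
    return {t for t in results if t is not None}
```

With the example as written, only the non-elided trace `("@alloc", "@dealloc")` survives; if the else branch used family `foo`, the fully elided trace `()` would be allowed too.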

> your proposal

FWIW, I'm not really proposing anything here, I'm just documenting existing behavior. I think it is better for us to document something, even if it is non-operational, than to just ignore it entirely.

I think the current wording here strikes the right balance between describing how this actually works, and how you would model this operationally at higher levels (like Miri) that do not have to deal with our peculiar partial inlining constraints.

One more thing that's probably worth mentioning here is that the modeling described in my previous comment, while angelic, is "less" angelic than the inttoptr exposed provenance synthesis case. In particular I think it does not have the problematic interaction with demonic choice (though correct me if I'm wrong on that).

If we take this variant of the test case:

    define void @test() {
      %c = freeze i1 poison
      %a = call ptr @alloc() allockind(alloc) "alloc-family"="foo"
      br i1 %c, label %if, label %else
    if:
      call ptr @dealloc(ptr allocptr %a) allockind(free) "alloc-family"="foo"
      ret void
    else:
      call ptr @inlined_dealloc(ptr allocptr %a) allockind(free) "alloc-family"="bar"
      ret void
    }

Then moving the demonic freeze i1 poison below the allocation does not change the set of legal executions, unlike in the inttoptr case. That's because here there is a specific criterion for which executions are legal, rather than the criterion being "one of the executions that doesn't cause UB".

Edit: I think this specific example doesn't quite illustrate the point I'm trying to make...

> we discard this execution as invalid.

This is known as "no-behavior" (NB) and it is comparable to angelic choice. In particular, demonic choice + no-behavior can model prophecy variables, i.e., predicting what will happen in future parts of the program (we could have a variable %b where in all "valid" executions, the value of %b is equal to something that we read from stdin later during execution: take a guess, and later if the guess was wrong, trigger NB). NB makes the semantics non-executable. NB makes reordering difficult: "NB; UB" cannot be reordered to "UB; NB" (so, potentially-NB operations have to be treated like potentially-diverging operations).

NB would basically mean that the only way to know for sure that a trace is part of the possible set of behaviors of a program is to run the program all the way to completion -- if the program never terminates (which is expected for things like servers), you can never tell if any behavior you see is actually "real". Consider that there could have been side-effects between the alloc and the free and those can differ between the elided and non-elided case; we'd basically have to "take back" side-effects that already happened if the program later reaches NB.

Another odd point is that the else branch is dead code in the sense that no "valid" execution ever goes there, and yet it cannot be replaced by unreachable.
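To see why "NB; UB" cannot become "UB; NB", here is a deliberately simplified trace model in Python: straight-line programs whose steps are outputs, UB, or NB, where a transformation is valid if every target behavior is allowed by the source. This toy model is an illustrative assumption (it ignores non-determinism, divergence and memory), and all names are hypothetical.

```python
UB, NB = "UB", "NB"


def behaviors(prog):
    """Observable behaviors of a straight-line program.

    Every op other than UB/NB is an output event. Returns a set of
    (trace, status) pairs: "ok" means the program finished with exactly this
    output; "ub" means it hit UB after this output, so any continuation is
    permitted. Hitting NB discards the execution: no behavior at all.
    """
    out = []
    for op in prog:
        if op == NB:
            return set()
        if op == UB:
            return {(tuple(out), "ub")}
        out.append(op)
    return {(tuple(out), "ok")}


def covered(b, src_behaviors):
    """Is target behavior b allowed by the source behaviors?"""
    trace, _ = b
    if b in src_behaviors:
        return True
    # A source UB after prefix p permits anything that starts with p.
    return any(s == "ub" and trace[:len(p)] == p for p, s in src_behaviors)


def refines(tgt, src):
    """True if tgt is a valid transformation of src."""
    src_b = behaviors(src)
    return all(covered(b, src_b) for b in behaviors(tgt))
```

Here `refines([NB, UB], [UB, NB])` holds but `refines([UB, NB], [NB, UB])` does not, and `behaviors(["hello", NB])` is empty: output was produced, yet the execution contributes no behavior.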

> while angelic, is "less" angelic than the inttoptr exposed provenance synthesis case

Oh yeah, the angelic inttoptr model does not actually "work" as an executable model. I view it more as a guideline to compare the actual, to-be-developed model with. OTOH the inttoptr model at least only affects abstract state the program cannot directly observe (provenance), whereas the NB model for allocation could end up with a program that prints to stdout only to then go "jk that execution never happened".

> FWIW, I'm not really proposing anything here, I'm just documenting existing behavior. I think it is better for us to document something, even if it is non-operational, than to just ignore it entirely.

I can agree with that. :) Seems fine to document something based on transformations for now (a strict improvement over the status quo), and open an issue to track that we don't really know what this means operationally and whether it is formally consistent with everything else.

Thanks, those are all good points. Do you have any references on the "no-behavior" concept? I'm pretty sure I've seen this before (maybe in the context of move elim?) but it's hard to find any references for NB and how it interacts with other semantics.

It has come up in a few papers, e.g. https://sf.snu.ac.kr/publications/ccr.pdf and https://iris-project.org/pdfs/2023-popl-dimsum.pdf. But I don't know a canonical citation.

NB is basically the natural interpretation of what it means to do demonic non-deterministic choice over the empty set of possible options. (This is dual to how UB is angelic choice over the empty set.) In the study of non-determinism in general (without giving it a demonic/angelic interpretation), the "empty set of choices" has existed since ~forever; for example, it is the typical way to model doomed branches of a backtracking search that is encoded via non-deterministic exploration.
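One way to make this duality concrete is the weakest-precondition view from refinement calculus, which is an assumption of this sketch rather than something stated in the thread: a program is a function from a postcondition to whether it is guaranteed to establish it. Demonic choice needs every option to work (`all`), angelic choice needs one (`any`), and choosing over the empty set yields the two extremes. All names are hypothetical.

```python
from typing import Callable, Iterable

Post = Callable[[int], bool]   # predicate on the final value
Prog = Callable[[Post], bool]  # weakest-precondition style program


def ret(v: int) -> Prog:
    """Program that just returns v."""
    return lambda post: post(v)


def demonic(options: Iterable[Prog]) -> Prog:
    opts = list(options)
    # The environment picks, so every option must establish post.
    return lambda post: all(p(post) for p in opts)


def angelic(options: Iterable[Prog]) -> Prog:
    opts = list(options)
    # The "angel" picks the best option, so one successful option suffices.
    return lambda post: any(p(post) for p in opts)


NB = demonic([])  # all([]) is True: "establishes" every post, even False
UB = angelic([])  # any([]) is False: establishes no post, not even True
```

So NB behaves like a miracle (it satisfies any specification vacuously) and UB like abort (it satisfies none), mirroring "demonic choice over the empty set" versus "angelic choice over the empty set".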

nunoplopes · 2026-01-24T18:46:53Z

LGTM.

IamYJLee · 2026-01-26T00:09:13Z

LGTM.

@nikic
Thanks for incorporating my RFC into the LangRef.

RalfJung · 2026-02-10T18:03:00Z

+    * An "alloc" and "free" pair can be elided.
+    * A "realloc" and "free" pair can be converted into a "free" of the original
+      allocation.
+    * An "alloc" and "realloc" pair can be converted into an "alloc".


It's not always clear here where the conversion happens. For instance, when alloc+realloc are turned into alloc, is that put in the place of the original alloc (with the max of the sizes), or the original realloc (with some stack memory being used for the time between the original alloc and realloc)?

nunoplopes · 2026-02-10T19:51:58Z

I'm not sure what specification we should use in Alive2 though. Allocation functions can have side effects and change errno. To allow deletion, we need to model all side effects as being non-deterministic. @RalfJung does that sound right?

RalfJung · 2026-02-10T21:02:02Z

Yeah I would model a call to malloc (or other allocation functions that LLVM recognizes as such) as non-deterministically either invoking the underlying implementation or invoking some "built-in", "side-effect-free" allocator. (That's the "hidden" allocator I mentioned before.)
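A minimal Python sketch of this model (not Alive2 code). Each call to a recognized allocation function non-deterministically either runs the underlying implementation or a built-in allocator that touches no state. The observable state `(errno, calls)`, where `calls` is a hypothetical counter standing in for arbitrary allocator side effects, the `ENOMEM` value, and the assumption that the built-in allocator never fails are all illustrative.

```python
ENOMEM = 12  # illustrative errno value


def underlying_malloc(errno: int, calls: int):
    """Assumed outcomes of the real implementation: (result, errno, calls)."""
    return {
        ("ptr", errno, calls + 1),    # success; side effect recorded
        ("null", ENOMEM, calls + 1),  # failure sets errno
    }


def builtin_malloc(errno: int, calls: int):
    """Hypothetical built-in allocator: no side effects, assumed never to fail."""
    return {("ptr", errno, calls)}


def malloc_outcomes(errno: int, calls: int):
    """A recognized allocation call non-deterministically uses either allocator."""
    return underlying_malloc(errno, calls) | builtin_malloc(errno, calls)
```

Deleting an unused call corresponds to the `("ptr", errno, calls)` outcome with unchanged state, which is only permitted because of the built-in branch.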

nunoplopes · 2026-02-10T21:10:27Z

Sounds good, thank you! 🙏

nikic · 2026-02-11T08:42:39Z

I've added some explicit wording on non-determinism.

antoniofrighetto · 2026-02-11T11:59:05Z

+    served from a virtual allocator instead. Notably, this is allowed even if
+    the allocator calls have side-effects. In other words, for each allocation
+    there there is a non-deterministic choice between calling the allocator as
+    usual, or using a virtual, side-effect-free allocator instead.


Sorry, I may be missing some context. Am I understanding correctly that this, slightly rephrased, implies we essentially do not care whether there are side effects for the purposes above? That is, either a side-effecting or a side-effect-free allocation call is fine?

Yes, in a specific sense. It is okay to elide all of the allocator side effects. But if the allocation is not elided, we also can't ignore the side effects.
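A small Python sketch of this answer, assuming (as the wording above says) that each allocation gets its own independent choice, and that a real allocator call has two hypothetical observable effects. It enumerates the allowed observable logs and shows that a non-elided call's effects cannot be partially dropped.

```python
from itertools import product


def real_alloc_effects(i: int):
    # Hypothetical side effects of the i-th real allocator call.
    return [("errno_write", i), ("stats_update", i)]


def run(choices):
    """Observable log of a program doing len(choices) allocations.

    choices[i] is True if allocation i is served by the virtual allocator.
    """
    log = []
    for i, elided in enumerate(choices):
        if not elided:
            log.extend(real_alloc_effects(i))  # a real call keeps ALL its effects
    return tuple(log)


def allowed_logs(n: int):
    """One independent non-deterministic choice per allocation."""
    return {run(c) for c in product([False, True], repeat=n)}
```

For two allocations there are four allowed logs (neither, either, or both elided); a log containing only `("errno_write", 0)` is not among them.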

RalfJung · 2026-02-19T11:39:44Z

+
+    Calls to functions annotated with ``allockind`` are subject to allocation
+    elision: Calls to allocator functions can be removed, and the allocation
+    served from a virtual allocator instead. Notably, this is allowed even if


Suggested change

-    served from a virtual allocator instead. Notably, this is allowed even if
+    served from a "virtual"/"built-in" allocator instead. Notably, this is allowed even if

I think scare quotes are appropriate here to indicate that this is an abstract concept, not a real allocator.

RalfJung

As mentioned in a comment already:

> Seems fine to document something based on transformations for now (a strict improvement over the status quo), and open an issue to track that we don't really know what this means operationally and whether it is formally consistent with everything else.

nikic · 2026-03-02T11:29:22Z

I've filed #184102 to track this.

allockind / alloc-family enable allocation elision, but this was not previously mentioned by LangRef. Related discussion: https://discourse.llvm.org/t/rfc-clarifying-semantic-assumptions-for-custom-allocators/89469 The semantics here are specified in terms of allowed transforms. Making the semantics operational is tracked in llvm#184102.

nikic requested a review from nunoplopes January 23, 2026 14:13

llvmbot added the llvm:ir label Jan 23, 2026

RalfJung reviewed Jan 23, 2026

jyknight approved these changes Jan 26, 2026

RalfJung mentioned this pull request Feb 5, 2026

Extend state representation to track errno across control-flow AliveToolkit/alive2#1284 (Draft)

RalfJung reviewed Feb 10, 2026

nikic added 5 commits February 11, 2026 09:24

[LangRef] Mention allocation elision

b2f9fe0

Mention that all or none have to be elided

170e43f

Explicitly clarify allowed transforms

c0f5a78

Clarify realloc interaction

989d9ea

Explicitly mention non-determinism

5a305f7

nikic force-pushed the langref-allocators branch from abceaf2 to 5a305f7 February 11, 2026 08:35

antoniofrighetto reviewed Feb 11, 2026

typo

b493179

antoniofrighetto reviewed Feb 11, 2026

RalfJung reviewed Feb 19, 2026

scare quotes

c10dfd9

RalfJung approved these changes Mar 2, 2026

nikic mentioned this pull request Mar 2, 2026

Operational semantics for allocator elision #184102 (Open)

nikic merged commit 5d8c6c1 into llvm:main Mar 3, 2026
11 checks passed

nikic deleted the langref-allocators branch March 3, 2026 09:51

7 participants
