Can we make AllocId actually uniquely "identify" an allocation? #128775

Open

Description

The way AllocId works right now is super counter-intuitive: an AllocId is entirely a per-crate identifier, and when loading the metadata of another crate, we generate a fresh "local AllocId" for each ID we encounter in the other crate and re-map everything we load. (At least I think that's what happens, @oli-obk please correct me if I am wrong.)

Unfortunately this means that a ConstValue that holds a pointer isn't actually a "value" in the usual sense of the word: if the value is computed in one crate and then used in another crate, its AllocId gets re-mapped. During code generation, when we encounter such an AllocId, we just always generate a local copy of that allocation and point to that copy. This means the "same" ConstValue, codegen'd in different crates, can result in observably different values! That's extremely confusing for users and compiler devs alike (#84581, #123670). In many cases the copies will get de-duplicated later, but we can't always rely on that.
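
To make the problem concrete, here is a toy model of the re-mapping described above. It is only a sketch under my own assumptions and not rustc's actual code: `LocalAllocId`, `CrateCtx` and `import` are hypothetical names, and the real metadata decoder is far more involved.

```rust
use std::collections::HashMap;

/// Toy stand-in for a per-crate allocation identifier (hypothetical, not rustc's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LocalAllocId(u64);

/// Toy model of one crate's view of the allocations it has loaded.
struct CrateCtx {
    /// Next unused local id in this crate.
    next_id: u64,
    /// (source crate name, id inside that crate) -> fresh local id.
    remap: HashMap<(&'static str, u64), LocalAllocId>,
}

impl CrateCtx {
    fn new(first_free_id: u64) -> Self {
        CrateCtx { next_id: first_free_id, remap: HashMap::new() }
    }

    /// Import an id found in another crate's metadata. The first time a
    /// foreign id is seen it gets a fresh local id; later lookups reuse it.
    fn import(&mut self, from: &'static str, foreign: u64) -> LocalAllocId {
        let next = &mut self.next_id;
        *self.remap.entry((from, foreign)).or_insert_with(|| {
            let id = LocalAllocId(*next);
            *next += 1;
            id
        })
    }
}

fn main() {
    // Crate "a" computed a constant that points at its allocation 7.
    // Crates "b" and "c" both load that constant from a's metadata.
    let mut b = CrateCtx::new(100);
    let mut c = CrateCtx::new(3);
    let in_b = b.import("a", 7);
    let in_c = c.import("a", 7);

    // Within one crate the mapping is stable...
    assert_eq!(in_b, b.import("a", 7));
    // ...but the two downstream crates disagree about the id of the
    // "same" allocation, so the id does not identify it globally.
    assert_ne!(in_b, in_c);
    println!("{:?} vs {:?}", in_b, in_c);
}
```

In this model neither `b` nor `c` can tell from its local id alone that both ids refer to one allocation in `a`, which is why each would end up emitting its own copy.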

So... I'd like to consider switching how AllocIds work, with the goal of making ConstValue actually be a value. This will make #121644 unnecessary: we can just evaluate the static once, store its final value, and use that in all crates without running into issues like this. This requires not re-mapping AllocId; instead, when crate B receives a ConstValue from crate A, it should be able to point to the allocation already generated by crate A. Unfortunately I am largely unfamiliar with how we manage "cross-crate identity of objects" so I don't know what the possible options here look like.

Some first rough ideas that popped into my head:

  1. We could pick AllocId uniformly at random and fail when loading two crates that happened to get the same ID. That's fundamentally non-reproducible, so either we have to make sure these AllocId don't matter for anything except the question whether they are equal or not (that seems hard to enforce) or we have to pick some deterministic scheme based on this. Also, during codegen, how would we know whether the allocation has already been generated or whether it is our job to generate it? We'd have to keep track of which AllocId are "local", or so.
  2. Use the first 32 bits of AllocId to store the CrateNum of the crate that generated the allocation, and the rest to store some sort of per-crate allocation ID. I guess this still has to be remapped on load, but then during codegen when we encounter another crate's allocation we'd import it instead of generating a copy.
  3. When interning an allocation, we always generate something akin to a DefId. AllocId outside of an interpreter session basically becomes DefId (or a new kind of ID with the same properties). We don't even need an alloc_map in tcx any more, we just have a new kind of "definition" that represents "global allocations" and a query taking a DefId and returning a GlobalAlloc. (That query would mostly, if not exclusively, be computed by feeding, maybe except for statics that it could evaluate directly. I guess if it is exclusively feeding it doesn't make much sense to make this a query rather than a normal hash map.)
    Inside the interpreter, we certainly don't want to generate a DefId for each allocation. I can imagine two schemes here:
    1. Reserve a CrateNum value to indicate "local interpreter instance" so that we can just make up DefIndexes locally while the interpreter runs and still know which allocations need to be looked up where. During interning, we generate proper DefId inside LOCAL_CRATE and remap everything we encounter.
    2. Still use the same AllocId type that we do now, but make it valid only inside an interpreter instance, and track a per-interpreter-instance mapping between global DefId and local AllocId. Unfortunately this means extra work whenever we "import" a global allocation into an interpreter instance as we need to apply that mapping (and then map back during interning).
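
The second interpreter scheme (a per-interpreter-instance mapping between global and local IDs) could look roughly like the following. This is a minimal sketch under my own assumptions: `GlobalId` is a stand-in for a DefId-like pair, and `InterpMap`, `fresh`, `import` and `intern` are hypothetical names rather than existing rustc APIs.

```rust
use std::collections::HashMap;

/// Stand-in for a DefId-like global identifier (crate number + per-crate index).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct GlobalId {
    krate: u32,
    index: u32,
}

/// Identifier that is only meaningful inside one interpreter instance.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LocalId(u32);

/// Per-interpreter-instance mapping between global and local ids.
#[derive(Default)]
struct InterpMap {
    to_local: HashMap<GlobalId, LocalId>,
    /// Indexed by `LocalId.0`; `None` means "not interned yet".
    to_global: Vec<Option<GlobalId>>,
}

impl InterpMap {
    /// Create an allocation that exists only inside this interpreter instance.
    fn fresh(&mut self) -> LocalId {
        let id = LocalId(self.to_global.len() as u32);
        self.to_global.push(None);
        id
    }

    /// Bring a global allocation into this instance (the "extra work" on import).
    fn import(&mut self, g: GlobalId) -> LocalId {
        if let Some(&l) = self.to_local.get(&g) {
            return l;
        }
        let l = LocalId(self.to_global.len() as u32);
        self.to_global.push(Some(g));
        self.to_local.insert(g, l);
        l
    }

    /// Map back during interning: imported allocations keep their global id,
    /// interpreter-local ones get a new id in the local crate.
    fn intern(&mut self, l: LocalId, local_crate: u32, next_index: &mut u32) -> GlobalId {
        if let Some(g) = self.to_global[l.0 as usize] {
            return g;
        }
        let g = GlobalId { krate: local_crate, index: *next_index };
        *next_index += 1;
        self.to_global[l.0 as usize] = Some(g);
        self.to_local.insert(g, l);
        g
    }
}

fn main() {
    let mut map = InterpMap::default();
    let mut next_index = 0;

    let foreign = GlobalId { krate: 5, index: 42 };
    let imported = map.import(foreign);
    let scratch = map.fresh();

    // An imported allocation round-trips to the same global id.
    assert_eq!(map.intern(imported, 0, &mut next_index), foreign);
    // An interpreter-local allocation gets a new id in the local crate (0 here).
    assert_eq!(map.intern(scratch, 0, &mut next_index), GlobalId { krate: 0, index: 0 });
    // Importing the same global allocation twice yields the same local id.
    assert_eq!(map.import(foreign), imported);
}
```

The cost this scheme pays is visible in `import`: every access to a previously interned global allocation goes through a hash map lookup.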

The last two schemes (2 and 3) seem fairly similar, given that DefId is just CrateNum + per-crate DefIndex. The only difference is whether there's a single shared "index" namespace for everything or a dedicated namespace for allocations. My main concern with the single shared namespace is that we'd quite like to use some bits for other purposes inside AllocId: we want it to have a niche. We also probably need to distinguish allocations inside the current interpreter instance from "global allocations" (and do a remapping during interning), and at least inside an interpreter instance we are using some bits to track whether the pointer is derived from a shared reference and whether that shared reference had interior mutability. Option 2 could possibly entirely avoid doing any kind of mapping during interning, if we think that 2^30 total allocations are enough for every crate -- though I assume interning is already quite expensive so maybe it's not worth optimizing for that. It does seem worth optimizing for "no remapping when accessing previously interned global allocations", which excludes the second interpreter scheme under option 3 (which might otherwise be my favorite as it keeps everything fairly clear).
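
As a rough illustration of the bit budget for option 2, here is one possible 64-bit layout: 32 bits of crate number, 2 flag bits, and 30 bits of per-crate index, stored in a non-zero integer so the type keeps a niche. This is purely a sketch under my own assumptions; `PackedAllocId`, the flag constants and the exact bit positions are hypothetical. Storing `index + 1` to keep the value non-zero costs one index, so this particular layout holds 2^30 - 1 allocations per crate.

```rust
use std::num::NonZeroU64;

const INDEX_BITS: u32 = 30;
const INDEX_MASK: u64 = (1 << INDEX_BITS) - 1;
/// Hypothetical flag: pointer was derived from a shared reference.
const SHARED_REF: u64 = 1 << 30;
/// Hypothetical flag: that shared reference had interior mutability.
const INTERIOR_MUT: u64 = 1 << 31;

/// Layout (high to low): 32 bits crate number | 2 flag bits | 30 bits (index + 1).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct PackedAllocId(NonZeroU64);

impl PackedAllocId {
    /// Returns `None` if `index` does not fit into the 30-bit field.
    fn new(krate: u32, index: u32) -> Option<Self> {
        // Store index + 1 so the whole value can never be zero.
        let stored = u64::from(index) + 1;
        if stored > INDEX_MASK {
            return None;
        }
        NonZeroU64::new((u64::from(krate) << 32) | stored).map(PackedAllocId)
    }

    fn krate(self) -> u32 {
        (self.0.get() >> 32) as u32
    }

    fn index(self) -> u32 {
        ((self.0.get() & INDEX_MASK) - 1) as u32
    }

    /// Set flag bits; anything outside the two flag positions is ignored.
    fn with_flags(self, flags: u64) -> Self {
        let bits = self.0.get() | (flags & (SHARED_REF | INTERIOR_MUT));
        PackedAllocId(NonZeroU64::new(bits).expect("or-ing into a non-zero value stays non-zero"))
    }

    fn is_shared_ref(self) -> bool {
        self.0.get() & SHARED_REF != 0
    }

    /// During codegen: emit the allocation only if this crate created it,
    /// otherwise refer to the one emitted by the owning crate.
    fn must_emit_in(self, local_crate: u32) -> bool {
        self.krate() == local_crate
    }
}

fn main() {
    let id = PackedAllocId::new(3, 17).unwrap().with_flags(SHARED_REF);
    assert_eq!((id.krate(), id.index()), (3, 17));
    assert!(id.is_shared_ref());
    assert!(id.must_emit_in(3));
    assert!(!id.must_emit_in(0));
    // The non-zero representation leaves a niche for `Option`.
    assert_eq!(std::mem::size_of::<Option<PackedAllocId>>(), 8);
}
```

Because the crate number is part of the ID itself, no table lookup is needed to decide whether an allocation is local, which is the property that would let codegen import another crate's allocation instead of copying it.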

@oli-obk @rust-lang/wg-const-eval any thoughts?
@compiler-errors @wesleywiser I know you're not const-eval experts but maybe you know the query system sufficiently well to provide some helpful input. :)

    Labels

    C-discussion (Category: Discussion or questions that doesn't represent real issues.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)