Proposal: add @memCast for a class of safe pointer casts #23935

Open
@mlugg

Background

After the merge of #22706 and #23919, you can now @ptrCast to a slice where the operand is any slice or single-item pointer. The idea, broadly speaking, is that if we know the number of bytes the operand points to, we can make our result point to the same number of bytes.

This is incredibly convenient, and simplifies a lot of things; for instance, when you want to get the plain byte slice ([]u8) underlying a slice or a single value, you can do that with a simple @ptrCast.

fn write(buf: []const u8) !void { ... }

const x: u32 = someStuff();
try write(@ptrCast(&x));

const vals: []const u32 = readMoreStuff();
try write(@ptrCast(vals));

However, there's a slight problem here. As the authors of this code, we know these @ptrCast calls are safe: the destination type is []const u8, so the number of elements is computed for us (by the compiler in the first case, or at runtime in the second case). If the destination type were something else, though, the cast could be unsafe. For instance, if write took a *u64, the first call would be unsafe, because when write loaded from buf, it would read more memory than x owns. To a reader, it's not immediately clear that this @ptrCast is safe without checking the signature of write, and such a bug could easily be introduced in a refactor that changed that signature.

Let's look at another common case. A useful pattern in Zig is to provide type-safe wrappers around integers using enums. One nice property here is that when using certain Data-Oriented Design patterns, these enum values can be packed directly into a big array of untyped "miscellaneous data". We tend to call this "extra data", and it's a pattern used in std.zig.Ast, std.zig.Zir, and many more places throughout the compiler, with a whole bunch of benefits. However, one annoyance is that if you store a sequence of enum values in this array, it's not trivial to get at them: slicing the array gives you an untyped []u32, so you then need to @ptrCast to a []Air.Inst.Index. This pointer cast, again, looks quite dangerous to the untrained eye -- but it's absolutely not! We're reinterpreting memory in a well-defined way. The @ptrCast is more of an alarm bell than we really need here -- but you can see why it rings, because if the destination type were accidentally changed to something other than a slice (e.g. a *[100]TheEnum, or a [*]Something), the cast would suddenly be "unsafe" in the sense that the result might not be dereferenceable.
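To make the "extra data" case concrete, here is a minimal sketch using today's @ptrCast. The names TypedIndex and storage are hypothetical stand-ins (not taken from the compiler), and the values are made up; it assumes a compiler that accepts @ptrCast between slices.

```zig
const std = @import("std");

// Hypothetical index type, standing in for something like `Air.Inst.Index`.
const TypedIndex = enum(u32) { _ };

test "viewing untyped extra data as typed indices" {
    // Stand-in for a run of indices stored in an untyped "extra data" array.
    var storage = [_]u32{ 7, 8, 9 };
    const extra: []u32 = &storage;

    // Same element size and a well-defined layout: only the element type changes.
    const typed: []TypedIndex = @ptrCast(extra);

    try std.testing.expectEqual(extra.len, typed.len);
    try std.testing.expectEqual(@as(u32, 8), @intFromEnum(typed[1]));

    // Both slices refer to the same memory, so writes through one are visible in the other.
    typed[0] = @enumFromInt(1);
    try std.testing.expectEqual(@as(u32, 1), storage[0]);
}
```

Nothing in the cast itself records that the result must stay a slice; that information lives only in the annotated type of typed.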

The common thread connecting the two cases above is that we don't need the "full power" of @ptrCast. It's true that we might change the pointer "size" (e.g. change from a single-item pointer to a slice) and the element type (e.g. u32 to u8), but what's important is that we are trying to safely reinterpret memory, in a way which is known to be well-defined from the type system alone. This turns out to be a particularly common operation in some cases (after all, we had std.mem.asBytes and std.mem.sliceAsBytes to do this before the new @ptrCast semantics!), so it would be nice if there were a safer way to express it.
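For reference, the two std.mem helpers mentioned above can be used roughly as follows. This is only a sketch with made-up values, not code from the proposal.

```zig
const std = @import("std");

test "byte views with the std.mem helpers" {
    // Illustrative value only.
    const x: u32 = 0xAABBCCDD;
    // `asBytes` takes a single-item pointer; the result covers exactly @sizeOf(u32) bytes.
    const x_bytes = std.mem.asBytes(&x);
    try std.testing.expectEqual(@as(usize, @sizeOf(u32)), x_bytes.len);

    const vals: []const u32 = &[_]u32{ 1, 2, 3 };
    // `sliceAsBytes` takes a slice; the byte length is derived from the slice length.
    const vals_bytes: []const u8 = std.mem.sliceAsBytes(vals);
    try std.testing.expectEqual(vals.len * @sizeOf(u32), vals_bytes.len);
}
```

Both helpers only ever produce byte views, which is why they cannot cover the enum case described earlier.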

Proposal

Introduce another "pointer cast"-style builtin:

@memCast(ptr: anytype) anytype

Like other pointer cast builtins, it infers its return type from the context's Result Type, and can be chained directly with other pointer cast builtins (e.g. @alignCast) to combine effects. However, this builtin is most likely to be used standalone.

The builtin acts as a variant of @ptrCast with the added constraint that the returned pointer (or slice) refers to the exact same amount of memory as the operand pointer (or slice). This means the operand type and result type must both be single-item pointers or slices (they cannot be many-item pointers or C pointers; also disallowed are pointers to anyopaque). If both are slices, there may be a runtime safety check (depending on the @sizeOf of the respective element types) to ensure that the operand's total byte length is an exact multiple of the result element size.
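The length arithmetic behind that safety check can be sketched as below. The helper reinterpretedLen is hypothetical (it is not part of the proposal or of std) and only models the divisibility rule described above, assuming the result element type is not zero-sized.

```zig
const std = @import("std");

/// Hypothetical helper: given `src_len` elements of `Src`, returns how many
/// `Dst` elements cover the same bytes, or null when the byte length is not
/// an exact multiple of @sizeOf(Dst). Assumes `Dst` is not zero-sized.
fn reinterpretedLen(comptime Src: type, comptime Dst: type, src_len: usize) ?usize {
    const total_bytes = src_len * @sizeOf(Src);
    if (total_bytes % @sizeOf(Dst) != 0) return null;
    return total_bytes / @sizeOf(Dst);
}

test "slice length arithmetic" {
    // Casting to a smaller element type that divides the source size always works.
    try std.testing.expectEqual(@as(?usize, 12), reinterpretedLen(u32, u8, 3));
    // Casting to a larger element type only works when the byte length fits.
    try std.testing.expectEqual(@as(?usize, 2), reinterpretedLen(u16, u32, 4));
    try std.testing.expectEqual(@as(?usize, null), reinterpretedLen(u16, u32, 3));
}
```

The null case corresponds to the situation where the builtin would hit Safety-Checked Illegal Behavior rather than return a slice.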

The builtin also requires that the result pointer type does not have an element type with an ill-defined layout. For instance, you cannot cast *align(@alignOf(S)) [@sizeOf(S)]u8 to *S with this builtin. The logic here is that such a cast is not "safe", in the sense that it would be Illegal Behavior to use the resulting pointer if the operand does not point to a valid S value. Given #2414, we could allow this cast and introduce a safety check when casting to a type with ill-defined layout, but it seems like the definition given here will be more useful in practice (since Illegal Behavior is kept to a minimum).

When combined, these constraints turn out to give quite nice guarantees! In particular, we have the following:

If the operand to @memCast is a dereferenceable pointer, and if @memCast does not itself hit Safety-Checked Illegal Behavior (due to an incompatible slice length), then it is guaranteed that the returned pointer is also dereferenceable. For slices, this applies to all in-bounds elements.

Okay, that's a bit wordy, because I was trying to be precise. Informally, the idea is: valid pointer in, valid pointer out. That pointer has just reinterpreted the existing memory in a definitely-legal way.

This proposal removes the ability for @ptrCast to ever return a slice. A @ptrCast which returns a slice always refers to the same number of bytes as its operand, which is exactly what @memCast expresses, so users who want that behavior should use @memCast instead. @ptrCast must now return a non-slice pointer (single-item, many-item, or C). In other words, @ptrCast doesn't give you any safety guarantees in terms of the returned pointer being dereferenceable.

EDIT: this proposal also renames @ptrCast to @elemCast, to make its function clearer: it changes what a pointer "points to". The distinction between @memCast and @elemCast is then that the former returns a pointer which refers to the same region of memory (hence "mem").

Sentinels

One unresolved issue with this proposal is how to handle sentinels. How should an operand type of [:0]u8 be handled? Is the sentinel considered a part of the length of memory being reinterpreted, or not?

On the one hand, it would be consistent with pointer casting today to not include the sentinel in the bytes being reinterpreted. We could allow keeping a sentinel which matched an input one (e.g. allow [:0]u8 to [:0]i8), but nothing more. That seems like the obvious solution at first glance.

However, there's a problem here! Consider now the type *[5:0]u8. Should the sentinel be included in the bytes being reinterpreted? Well, there are arguments both ways:

  • On the one hand, this type is usually considered to be a "more comptime-known" version of [:0]u8; so, it should inherit the behavior of that type, and not include the 0 sentinel in the "pointee bytes".
  • On the other hand, the pointee [5:0]u8 clearly has identical layout to [1][5:0]u8, and so reinterpreting their memory should behave the same; but *[1][5:0]u8 is pretty clearly 6 bytes (the sentinel definitely isn't "special" when you nest it in an aggregate in this way). If you're not convinced by the nested array, extern struct { arr: [5:0]u8 } might be more convincing.
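The sizes behind the two arguments above can be checked directly. This is a small sketch; the Wrapper type is a hypothetical name for the extern struct mentioned in the second bullet.

```zig
const std = @import("std");

// Hypothetical name for the extern struct from the second bullet.
const Wrapper = extern struct { arr: [5:0]u8 };

test "where the sentinel does and does not count" {
    // As a pointee, a sentinel-terminated array occupies len + 1 elements.
    try std.testing.expectEqual(@as(usize, 6), @sizeOf([5:0]u8));
    try std.testing.expectEqual(@as(usize, 6), @sizeOf([1][5:0]u8));
    try std.testing.expectEqual(@as(usize, 6), @sizeOf(Wrapper));

    // As a slice, the length excludes the sentinel, which sits just past the end.
    const s: [:0]const u8 = "hello";
    try std.testing.expectEqual(@as(usize, 5), s.len);
    try std.testing.expectEqual(@as(u8, 0), s[s.len]);
}
```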

So, I think you could reasonably expect either behavior here -- and getting this wrong could cause subtle bugs. Given that fact, I personally believe the best behavior is to disallow the operand from having a sentinel: the caller must either absorb the sentinel into the slice itself (related: #23023 which adds std.mem.absorbSentinel), or coerce the sentinel away (e.g. coerce [:0]u8 to []u8). For the avoidance of doubt, the exact operand types I propose disallowing are:

  • A slice with a sentinel (like [:s]T)
  • A single-item pointer to an array with a sentinel (like *[n:s]T)

I'm open to discussion on this point.
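Under the rule proposed above, a caller holding a [:0]u8 would first pick one of the two interpretations explicitly. A minimal sketch of both options using only existing language features follows; it slices by hand rather than using std.mem.absorbSentinel, since that function comes from a separate proposal.

```zig
const std = @import("std");

test "choosing explicitly whether the sentinel is included" {
    const s: [:0]const u8 = "hello";

    // Option 1: coerce the sentinel away; the view covers 5 bytes.
    const plain: []const u8 = s;
    try std.testing.expectEqual(@as(usize, 5), plain.len);

    // Option 2: absorb the sentinel by slicing one element further; the view
    // covers 6 bytes and the terminator becomes an ordinary last element.
    const absorbed: []const u8 = s[0 .. s.len + 1];
    try std.testing.expectEqual(@as(usize, 6), absorbed.len);
    try std.testing.expectEqual(@as(u8, 0), absorbed[absorbed.len - 1]);
}
```

Either result is a plain []const u8, so the number of bytes being reinterpreted is unambiguous by the time a cast sees it.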

Examples

var val: u32 = undefined;
const ptr: *i32 = @memCast(&val);

const values: []u32 = getSomeData();
write(@memCast(values)); // where `write` takes a `[]const u8`

var buf: [10]u32 = undefined;
read(@memCast(&buf)); // where `read` takes a `[]u8`

// given:
//   const TypedIndex = enum(u32) { _ };
// we do this:
const type_erased: []u32 = getSomeData();
const typed: []TypedIndex = @memCast(type_erased);
// or, more simply:
const typed2: []TypedIndex = @memCast(getSomeData());

Labels

  • accepted: This proposal is planned.
  • breaking: Implementing this issue could cause existing code to no longer compile or have different behavior.
  • proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.
