enum-backed address spaces #21870
Comments
As an embedded C engineer, I'm just as concerned about null pointer bugs as I am about bad pointers pointing to invalid memory locations, so if this allows pointer safety checks against the target memory map, then I think it's a great idea 👍
I like the non-intrusive optimization and debugging potential of this general idea. What would the decision graph look like for using |
That could be a reasonable thing to do. Using u64 would make it a little bit closer to status quo. I think people might be surprised if
In many cases you need an aligned pointer in order to do anything. Consider all the functions in a given codebase that accept a It's the same reason you would use 2 bools vs a bitmask. Choose between tighter storage, or fewer instructions to load and store the value. In a sense, it's the same decision as choosing how much compression to use when storing data. Types have default alignment so that pointers to them can be used interchangeably and so that loads and stores generally correspond to a single machine instruction. |
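The "2 bools vs a bitmask" tradeoff can be sketched in today's Zig. This is only an illustration: the type names `TwoBools` and `Bitmask` are made up for the example and do not come from the thread.

```zig
const std = @import("std");

// Two separately addressable flags: one byte each, and each one is read
// or written with a plain byte-sized load or store.
const TwoBools = struct {
    a: bool,
    b: bool,
};

// The same two flags as a bitmask: one byte in total, but touching a
// single flag needs extra mask/shift work on the containing byte.
const Bitmask = packed struct(u8) {
    a: bool,
    b: bool,
    _unused: u6 = 0,
};

comptime {
    std.debug.assert(@sizeOf(TwoBools) == 2);
    std.debug.assert(@sizeOf(Bitmask) == 1);
}
```

The looser layout costs storage and the tighter one costs instructions, which is the same choice being described for pointer alignment.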
This looks like a nice solution. One question: who decides what the default address space for data pointers and function pointers is, and how?
I think you're onto something here. I really like the idea of making One major concern I have here is hardware pointer tagging features like Arm's Top Byte Ignore, Intel's Linear Address Masking, AMD's Upper Address Ignore, etc. When in use, these features make it so that, in general, you can no longer assume that the upper bits of user-space pointers are unimportant and discardable. The trouble with these is that they're enabled dynamically by a syscall to Are we confident that we can implement sufficient safety checks for a user to be made aware that they need to define a custom generic address space when using hardware pointer tagging?
I'm currently very confused about whether, in general,
Is the implication that if a |
There are some interesting discussions going on, but I would also like to add a quick bikeshed. Rather than the

    pub const Generic = enum(u64) {
        _ = 0x00007ff0_00000000...0x0000ffff_ffffffff,
    };

And you can specify multiple valid ranges by writing

    pub const Something = enum(u64) {
        null = 0xffffffff_ffffffff,
        _ = 0x00000000_00000000...0x00000000_ffffffff,
        _ = 0x10000000_00000000...0x10000000_ffffffff,
    };

EDIT: to be clear,
Semantic analysis currently already has a notion of "default address space in a particular context". The namespace returned by the switches in the original proposal could be required to return a set of common ones which the compiler can then use on a particular location. For example

    pub const AddressSpace = switch (target.cpu.arch) {
        .x86_64 => switch (target.os.tag) {
            .linux => struct {
                // Default used for variables
                pub const Data = Generic;
                // Default used for constants
                pub const Constant = Generic;
                // Default used for functions
                pub const Code = Generic;
                // Architecture specific...
                pub const Generic = enum(u64) { ... };
            },
            // ...
        },
        .amdgcn => struct {
            // Variables are instance-local by default.
            pub const Data = Private;
            pub const Constant = ...;
            // We can provide a nicer error message than "expected type 'builtin.AddressSpace', found '@TypeOf(.enum_literal)'"
            pub const Code = @compileError("this architecture doesn't support function pointers");
            // Architecture specific...
            pub const Flat = enum(u40) { ... };
            pub const Private = ...;
            // ...
        },
        .avr => struct {
            pub const Data = Ram;
            pub const Constant = Flash;
            pub const Code = Flash;
            // ...
        },
        // ...
    };

I think
I think this is a decent proposal in itself. I wonder if there is some more general synergy here with ranged ints: For example
Problem Statement
The address zero (`0`) is sometimes mapped. This is why `allowzero` exists. However, it is also the case that other parts of the address range in any given space are unmapped. In such cases, those nonzero unmapped values should be candidates for being the null value, and they should be available for packing data into pointers in a type-safe manner.

As an example, on amdgcn it would be ideal for an optional pointer to have the same size as a non-optional pointer while using the value `0xFFFFFFFF` for null.

Furthermore, it would be ideal for pointers to take up only the correct number of bits in a packed struct and allow bit packing when used as peers of align(0) fields in auto-layout structs.
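For context, the status quo that motivates this can be checked in today's Zig. The sketch below is only an illustration and uses no proposed syntax; the constant names are made up for the example.

```zig
const std = @import("std");

// A regular pointer can never be zero, so its optional reuses address 0
// as the null value and stays pointer-sized.
pub const plain_ptr_size = @sizeOf(*u32);
pub const optional_ptr_size = @sizeOf(?*u32);

// An allowzero pointer has no spare value left over for null, so its
// optional needs extra storage.
pub const allowzero_ptr_size = @sizeOf(*allowzero u32);
pub const optional_allowzero_ptr_size = @sizeOf(?*allowzero u32);

comptime {
    std.debug.assert(optional_ptr_size == plain_ptr_size);
    std.debug.assert(optional_allowzero_ptr_size > allowzero_ptr_size);
    // A pointer always occupies a full usize worth of bits, even on
    // targets where only part of that range can ever be mapped.
    std.debug.assert(@bitSizeOf(*u32) == @bitSizeOf(usize));
}
```

Today the only value the language can reuse for null is zero, and the only way to free that value up again (`allowzero`) makes the optional larger.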
Proposal
This proposal depends on new enum syntax for marking ranges of integer values illegal.
The x86_64-linux address space would be defined like this:
On x86_64-freestanding it might instead be defined like this, since the pages at the beginning are mapped, but the hardware is still limited to 48 bits:
`allowzero` is no longer needed because it is communicated by the valid range of the enum.

By making value ranges unreachable, it means the language is free to pack data into those unused integer values when constructing types such as optionals or error unions. It also means that `@ptrFromInt` gains an additional safety check, ensuring the value is in-range. Notice that `0xaaaaaaaaaaaaaaaa` is outside the valid pointer range on this very common triple.

`usize` would be redefined as the tag type of the default address space. Pointers carry address space data, so by indexing into a slice in a given address space, the result location type of the element index (i.e. `ptr[i]`) would be the tag type of the respective address space.

This is almost sufficient to address the problem statement, however, we need well-defined memory layout for pointers, including null pointers. So, an additional part of this proposal is recognizing the tag `null` in an address space enum:

This also opens the door to automated bit-packing for auto-layout structs when pointers along with sibling fields use align(0):
In this case, using the above x86_64-linux address space definition, it would be legal, but not required, for a zig compiler to lower the struct with a memory layout that uses 8 bytes, packing the booleans into the unused integer value ranges. It also provides opportunity for the compiler to strategize around ensuring that the 0xAA bit pattern is unambiguously detectable as an invalid state by safety checks.
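As a rough comparison, this is what packing flags next to an address looks like when done by hand in today's Zig. It is a sketch under a stated assumption, not the proposed mechanism: it assumes only the low 48 bits of an address are significant, and `PtrAndFlags`, `fromRaw`, and the field names are hypothetical.

```zig
const std = @import("std");

// Hand-rolled packing. Assumption (not taken from the proposal): only the
// low 48 bits of an address are significant, so the upper bits are free.
const PtrAndFlags = packed struct(u64) {
    addr: u48,
    a: bool,
    b: bool,
    reserved: u14 = 0,

    /// Decode a raw word, rejecting bit patterns that set reserved bits.
    fn fromRaw(raw: u64) ?PtrAndFlags {
        const decoded: PtrAndFlags = @bitCast(raw);
        if (decoded.reserved != 0) return null;
        return decoded;
    }
};

comptime {
    // The address and both booleans fit in 8 bytes.
    std.debug.assert(@sizeOf(PtrAndFlags) == 8);
    // A word filled with 0xAA bytes sets reserved bits, so it is
    // detectable as an invalid state.
    std.debug.assert(PtrAndFlags.fromRaw(0xaaaa_aaaa_aaaa_aaaa) == null);
}
```

The manual version gives up the pointer type and pushes the layout decision onto the programmer. The proposal would instead let the compiler derive a layout like this from the address space's valid range.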
Each target would have a default pointer address space. When used in pointer syntax, it would be equivalent to omitting it. i.e. for x86_64-linux, `*addrspace(Generic) T == *T`.
Implementation Details
`std.builtin.AddressSpace` would change from an enum to something like this:

A Zig compiler would have hard-coded awareness of the address space names within this namespace and how to map them to e.g., an LLVM address space number.
The address spaces would be user overridable in the root source file. This would be especially useful for a freestanding target.