 | 1 | +- Feature Name: guaranteed_slice_repr  | 
 | 2 | +- Start Date: 2025-02-18  | 
 | 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)  | 
 | 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)  | 
 | 5 | + | 
 | 6 | +# Summary  | 
 | 7 | +[summary]: #summary  | 
 | 8 | + | 
 | 9 | +This RFC guarantees the in-memory representation of slice and str references.  | 
 | 10 | +Specifically, `&[T]` is guaranteed to have the same layout as:  | 
 | 11 | + | 
 | 12 | +```rust  | 
 | 13 | +#[repr(C)]  | 
 | 14 | +struct Slice<T> {  | 
 | 15 | +    data: *const T,  | 
 | 16 | +    len: usize,  | 
 | 17 | +}  | 
 | 18 | +```  | 
 | 19 | + | 
 | 20 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of  | 
 | 21 | +`&mut str` is the same as that of `&mut [u8]`.  | 
 | 22 | + | 
 | 23 | +# Motivation  | 
 | 24 | +[motivation]: #motivation  | 
 | 25 | + | 
 | 26 | +This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing  | 
 | 27 | +slices and to declare slice fields or locals.  | 
 | 28 | + | 
 | 29 | +For example, guaranteeing the representation of slices allows non-Rust code to  | 
 | 30 | +read from the `data` or `len` fields of `string` in the type below without  | 
 | 31 | +intermediate FFI calls into Rust:  | 
 | 32 | + | 
 | 33 | +```rust  | 
 | 34 | +#[repr(C)]  | 
 | 35 | +struct HasString {  | 
 | 36 | +    string: &'static str,  | 
 | 37 | +}  | 
 | 38 | +```  | 
 | 39 | + | 
 | 40 | +Note: prior to this RFC, the type above is not even properly `repr(C)` since the  | 
 | 41 | +size and alignment of slices were not guaranteed. However, the Rust compiler  | 
 | 42 | +accepts the `repr(C)` declaration above without warning.  | 
 | 43 | + | 
 | 44 | +# Guide-level explanation  | 
 | 45 | +[guide-level-explanation]: #guide-level-explanation  | 
 | 46 | + | 
 | 47 | +Slices are represented with a pointer and length pair. Their in-memory layout is  | 
 | 48 | +the same as a `#[repr(C)]` struct like the following:  | 
 | 49 | + | 
 | 50 | +```rust  | 
 | 51 | +#[repr(C)]  | 
 | 52 | +struct Slice<T> {  | 
 | 53 | +    data: *const T,  | 
 | 54 | +    len: usize,  | 
 | 55 | +}  | 
 | 56 | +```  | 
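
As a rough illustration of the layout described above, the following sketch compares the size and alignment of `&[T]` with a mirror struct and splits a slice into the two fields. The `Slice<T>` mirror and the `to_parts` helper are hypothetical names used only here; the equalities are asserted as they hold on current compilers, which is the behavior this RFC proposes to guarantee.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical mirror of the documented layout; not a standard library type.
#[repr(C)]
struct Slice<T> {
    data: *const T,
    len: usize,
}

/// Splits a slice reference into the two fields of the mirror struct,
/// using only safe accessors (no transmute).
fn to_parts<T>(s: &[T]) -> Slice<T> {
    Slice { data: s.as_ptr(), len: s.len() }
}

fn main() {
    // Assumption: these equalities reflect current compiler behavior.
    assert_eq!(size_of::<&[u32]>(), size_of::<Slice<u32>>());
    assert_eq!(align_of::<&[u32]>(), align_of::<Slice<u32>>());
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());

    let values = [10u32, 20, 30];
    let parts = to_parts(&values);
    assert_eq!(parts.len, 3);

    // SAFETY: `parts` was derived from a live slice borrowed from `values`.
    let rebuilt = unsafe { std::slice::from_raw_parts(parts.data, parts.len) };
    assert_eq!(rebuilt, &values[..]);
}
```

The sketch deliberately goes through `as_ptr` and `len` rather than `transmute`, so it stays valid whether or not the layout guarantee is in place.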
 | 57 | + | 
 | 58 | +The precise ABI of slices is not guaranteed, so `&[T]` may not be passed by-value  | 
 | 59 | +or returned by-value from an `extern "C" fn`.  | 
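
Because only the in-memory layout is covered and not the call ABI, one common pattern is to pass the pointer and the length across the boundary as two separate arguments. A minimal sketch of that pattern follows; the function name `sum_bytes` is hypothetical, and a real export would also need a symbol-name attribute such as `no_mangle`, which is omitted here.

```rust
/// Hypothetical FFI entry point: takes the slice as two scalar arguments,
/// whose ABI is well defined, instead of taking `&[u8]` by value.
///
/// # Safety
/// `data` and `len` must satisfy the requirements documented on
/// `std::slice::from_raw_parts`.
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    // SAFETY: guaranteed by the caller, see the function-level contract.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    let payload = [1u8, 2, 3, 4];
    // SAFETY: pointer and length come from a live array.
    let total = unsafe { sum_bytes(payload.as_ptr(), payload.len()) };
    assert_eq!(total, 10);
}
```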
 | 60 | + | 
 | 61 | +The validity requirements for the in-memory slice representation are the same  | 
 | 62 | +as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).  | 
 | 63 | +Namely:  | 
 | 64 | + | 
 | 65 | +* `data` must be non-null, [valid] for reads for `len * mem::size_of::<T>()` many bytes,  | 
 | 66 | +  and it must be properly aligned. This means in particular:  | 
 | 67 | + | 
 | 68 | +    * The entire memory range of this slice must be contained within a single allocated object!  | 
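 | 69 | +      Slices can never span across multiple allocated objects. See the [incorrect usage](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#incorrect-usage)  | 
 | 70 | +      example in the `from_raw_parts` documentation, which fails to take this into account.  | 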
 | 71 | +    * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One  | 
 | 72 | +      reason for this is that enum layout optimizations may rely on references  | 
 | 73 | +      (including slices of any length) being aligned and non-null to distinguish  | 
 | 74 | +      them from other data. You can obtain a pointer that is usable as `data`  | 
 | 75 | +      for zero-length slices using [`NonNull::dangling()`].  | 
 | 76 | + | 
 | 77 | +* `data` must point to `len` consecutive properly initialized values of type `T`.  | 
 | 78 | + | 
 | 79 | +* The memory referenced by the slice must not be mutated for the duration  | 
 | 80 | +  of the reference's lifetime, except inside an `UnsafeCell`.  | 
 | 81 | + | 
 | 82 | +* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,  | 
 | 83 | +  and adding that size to `data` must not "wrap around" the address space.  | 
 | 84 | +  See the safety documentation of [`pointer::offset`].  | 
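
Some of the requirements above can be checked at run time by code that receives a pointer and length pair from a foreign caller. The sketch below is one possible checker; `check_parts` and `PartsError` are hypothetical names. It cannot verify that the memory is readable, initialized, unaliased, or inside a single allocation, so those remain the caller's responsibility.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

#[derive(Debug, PartialEq)]
enum PartsError {
    Null,
    Misaligned,
    TooLarge,
}

/// Checks only the requirements that are observable from the raw values.
fn check_parts<T>(data: *const T, len: usize) -> Result<(), PartsError> {
    if data.is_null() {
        return Err(PartsError::Null);
    }
    let addr = data as usize;
    if addr % align_of::<T>() != 0 {
        return Err(PartsError::Misaligned);
    }
    // The total size must fit in `isize::MAX` bytes.
    let bytes = len
        .checked_mul(size_of::<T>())
        .filter(|&b| b <= isize::MAX as usize)
        .ok_or(PartsError::TooLarge)?;
    // Adding the size to `data` must not wrap around the address space.
    if addr.checked_add(bytes).is_none() {
        return Err(PartsError::TooLarge);
    }
    Ok(())
}

fn main() {
    // A dangling but aligned, non-null pointer is usable for an empty slice.
    let empty = NonNull::<u32>::dangling().as_ptr() as *const u32;
    assert_eq!(check_parts(empty, 0), Ok(()));
    // SAFETY: zero-length slice with a non-null, aligned pointer.
    let slice: &[u32] = unsafe { std::slice::from_raw_parts(empty, 0) };
    assert!(slice.is_empty());

    assert_eq!(check_parts(std::ptr::null::<u32>(), 0), Err(PartsError::Null));
    assert_eq!(check_parts(1usize as *const u32, 0), Err(PartsError::Misaligned));
    assert_eq!(check_parts(empty, usize::MAX), Err(PartsError::TooLarge));
}
```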
 | 85 | + | 
 | 86 | +# Drawbacks  | 
 | 87 | +[drawbacks]: #drawbacks  | 
 | 88 | + | 
 | 89 | +## Zero-sized types  | 
 | 90 | + | 
 | 91 | +One could imagine representing `&[T]` as only `len` for zero-sized `T`.  | 
 | 92 | +This proposal would preclude that choice in favor of a standard representation  | 
 | 93 | +for slices regardless of the underlying type.  | 
 | 94 | + | 
 | 95 | +Alternatively, we could choose to guarantee that the data pointer is present if  | 
 | 96 | +and only if `size_of::<T>() != 0`. This has the possibility of breaking existing  | 
 | 97 | +code which smuggles pointers through the `data` value in `from_raw_parts` /  | 
 | 98 | +`into_raw_parts`.  | 
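
To make the pointer-smuggling concern concrete, the sketch below builds a slice of a zero-sized type from an arbitrary non-null pointer and reads the same pointer back. The `smuggle` helper is a hypothetical name, and the asserted results describe current compiler behavior rather than an existing guarantee.

```rust
use std::mem::size_of;

/// Round-trips a pointer through a `&[()]` and returns what comes back out.
fn smuggle(ptr: *const (), len: usize) -> (*const (), usize) {
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null pointer
    // is valid for a slice of any length (the total size is zero bytes).
    let units: &[()] = unsafe { std::slice::from_raw_parts(ptr, len) };
    (units.as_ptr(), units.len())
}

fn main() {
    // Assumption: on current compilers a slice of ZSTs still carries a pointer.
    assert_eq!(size_of::<&[()]>(), size_of::<&[u8]>());

    let marker = 42u64;
    let ptr = &marker as *const u64 as *const ();
    // The pointer survives the round trip, which is what such code relies on.
    assert_eq!(smuggle(ptr, 3), (ptr, 3));
}
```

Dropping the pointer for zero-sized `T` would change the first assertion and make the second impossible to satisfy.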
 | 99 | + | 
 | 100 | +## Uninhabited types  | 
 | 101 | + | 
 | 102 | +Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]`  | 
 | 103 | +types into a ZST since the slice can only ever be length zero. This may offer  | 
 | 104 | +modest performance benefits for highly generic code which happens to create  | 
 | 105 | +empty slices of uninhabited types, but this is unlikely to be worth the  | 
 | 106 | +cost of maintaining a special case.  | 
 | 107 | + | 
 | 108 | +## Compatibility with C++ `std::span`  | 
 | 109 | + | 
 | 110 | +The largest drawback of this layout and set of validity requirements is that it  | 
 | 111 | +may preclude `&[T]` from being representationally equivalent to C++'s  | 
 | 112 | +`std::span<T, std::dynamic_extent>`.  | 
 | 113 | + | 
 | 114 | +* `std::span` does not currently guarantee its layout. In practice, pointer + length  | 
 | 115 | +  is the common representation. This is even observable using `is_layout_compatible`  | 
 | 116 | +  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not  | 
 | 117 | +  [on GCC](https://godbolt.org/z/s4v4xehnG) nor  | 
 | 118 | +  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a  | 
 | 119 | +  different layout in the C++ standard (unlikely due to MSVC ABI stability  | 
 | 120 | +  requirements) could preclude matching the layout with `&[T]`.  | 
 | 121 | + | 
 | 122 | +* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One  | 
 | 123 | +  possible workaround for this would be to guarantee that `Option<&[T]>` uses  | 
 | 124 | +  `data: std::ptr::null()` to represent the `None` case, making `std::span<T>`  | 
 | 125 | +  equivalent to `Option<&[T]>` for non-zero-sized types.  | 
 | 126 | + | 
 | 127 | +* Rust uses a dangling pointer in the representation of zero-length slices.  | 
 | 128 | +  It's unclear whether C++ guarantees that a dangling pointer will remain  | 
 | 129 | +  unchanged when passed through `std::span`. However, it does support  | 
 | 130 | +  dangling pointers during regular construction via the use of  | 
 | 131 | +  [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span)  | 
 | 132 | +  in the iterator constructors.  | 
 | 133 | + | 
 | 134 | +Note that C++ also does not support zero-sized types, so there is no naive way  | 
 | 135 | +to represent types like `std::span<SomeZeroSizedRustType>`.  | 
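
Independently of whether `Option<&[T]>` is ever guaranteed to use a null `data` pointer for `None`, the Rust side can map a nullable pointer and length pair explicitly. The sketch below assumes a span is passed as a pointer + length struct; `SpanLike` and `as_option` are hypothetical names and do not correspond to any real binding or to a guaranteed `std::span` layout.

```rust
use std::mem::size_of;

/// Hypothetical Rust-side mirror of a pointer + length span.
#[repr(C)]
struct SpanLike<T> {
    data: *const T,
    len: usize,
}

impl<T> SpanLike<T> {
    /// Maps a null `data` pointer to `None` and anything else to `Some`.
    ///
    /// # Safety
    /// If `data` is non-null, `data` and `len` must satisfy the requirements
    /// of `std::slice::from_raw_parts` for the lifetime `'a`.
    unsafe fn as_option<'a>(&self) -> Option<&'a [T]> {
        if self.data.is_null() {
            None
        } else {
            // SAFETY: guaranteed by the caller, see the contract above.
            Some(unsafe { std::slice::from_raw_parts(self.data, self.len) })
        }
    }
}

fn main() {
    // Assumption: on current compilers `Option<&[T]>` adds no extra space.
    assert_eq!(size_of::<Option<&[u8]>>(), size_of::<&[u8]>());

    let null_span = SpanLike::<u8> { data: std::ptr::null(), len: 0 };
    assert!(unsafe { null_span.as_option() }.is_none());

    let bytes = [1u8, 2, 3];
    let span = SpanLike { data: bytes.as_ptr(), len: bytes.len() };
    assert_eq!(unsafe { span.as_option() }, Some(&bytes[..]));
}
```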
 | 136 | + | 
 | 137 | +## Flexibility  | 
 | 138 | + | 
 | 139 | +Additionally, guaranteeing layout of Rust-native types limits the compiler's and  | 
 | 140 | +standard library's ability to change and take advantage of new optimization  | 
 | 141 | +opportunities.  | 
 | 142 | + | 
 | 143 | +# Rationale and alternatives  | 
 | 144 | +[rationale-and-alternatives]: #rationale-and-alternatives  | 
 | 145 | + | 
 | 146 | +* We could avoid committing to a particular representation for slices.  | 
 | 147 | + | 
 | 148 | +* We could try to guarantee layout compatibility with a particular target's  | 
 | 149 | +  `std::span` representation, though without standardization this may be  | 
 | 150 | +  impossible. Multiple different C++ stdlib implementations may be used on  | 
 | 151 | +  the same platform and could potentially have different span representations.  | 
 | 152 | +  In practice, current span representations also use ptr+len pairs.  | 
 | 153 | + | 
 | 154 | +* We could avoid storing a data pointer for zero-sized types. This would result  | 
 | 155 | +  in a more compact representation but would mean that the representation of  | 
 | 156 | +  `&[T]` is dependent on the type of `T`.  | 
 | 157 | + | 
 | 158 | +# Prior art  | 
 | 159 | +[prior-art]: #prior-art  | 
 | 160 | + | 
 | 161 | +The layout in this RFC is already documented in  | 
 | 162 | +[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html).  | 
 | 163 | + | 
 | 164 | +# Unresolved questions  | 
 | 165 | +[unresolved-questions]: #unresolved-questions  | 
 | 166 | + | 
 | 167 | +* Should `&[T]` include a pointer when `T` is zero-sized?  | 
 | 168 | + | 
 | 169 | +# Future possibilities  | 
 | 170 | +[future-possibilities]: #future-possibilities  | 
 | 171 | + | 
 | 172 | +* Consider defining a separate Rust type which is repr-equivalent to the platform's  | 
 | 173 | +  native `std::span<T, std::dynamic_extent>` to allow for easier  | 
 | 174 | +  interoperability with C++ APIs. Unfortunately, the C++ standard does not  | 
 | 175 | +  guarantee the layout of `std::span` (though the representation may be known  | 
 | 176 | +  and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).  | 
 | 177 | +  Zero-sized types would also not be supported with a naive implementation of  | 
 | 178 | +  such a type.  | 