 | 1 | +- Feature Name: guaranteed_slice_repr  | 
 | 2 | +- Start Date: 2025-02-18  | 
 | 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)  | 
 | 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)  | 
 | 5 | + | 
 | 6 | +# Summary  | 
 | 7 | +[summary]: #summary  | 
 | 8 | + | 
 | 9 | +This RFC guarantees the in-memory representation of slice and str references.  | 
 | 10 | +Specifically, `&[T]` is guaranteed to have the same layout as:  | 
 | 11 | + | 
 | 12 | +```rust  | 
 | 13 | +#[repr(C)]  | 
 | 14 | +struct Slice<T> {  | 
 | 15 | +    data: *const T,  | 
 | 16 | +    len: usize,  | 
 | 17 | +}  | 
 | 18 | +```  | 
 | 19 | + | 
 | 20 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of  | 
 | 21 | +`&mut str` is the same as that of `&mut [u8]`.  | 
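
As a rough illustration of the claim above, the following sketch compares the size and alignment of `&[T]` with a mirror struct. The names `same_size_and_align` and `str_matches_bytes` are hypothetical helpers introduced only for this example. Matching size and alignment is a necessary condition for layout equality, not a proof of it: the field order is exactly what this RFC would guarantee.

```rust
use std::mem::{align_of, size_of};

/// Mirror of the layout described in the summary (illustrative only).
#[repr(C)]
#[allow(dead_code)]
pub struct Slice<T> {
    data: *const T,
    len: usize,
}

/// Returns true when `&[T]` and `Slice<T>` agree on size and alignment.
/// Hypothetical helper, not part of the RFC.
pub fn same_size_and_align<T>() -> bool {
    size_of::<&[T]>() == size_of::<Slice<T>>()
        && align_of::<&[T]>() == align_of::<Slice<T>>()
}

/// Same comparison for `&str` against `&[u8]`, and for the `&mut` forms.
pub fn str_matches_bytes() -> bool {
    size_of::<&str>() == size_of::<&[u8]>()
        && align_of::<&str>() == align_of::<&[u8]>()
        && size_of::<&mut str>() == size_of::<&mut [u8]>()
        && align_of::<&mut str>() == align_of::<&mut [u8]>()
}
```

Both helpers are expected to return `true` on current compilers for any `T`, including zero-sized ones.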
 | 22 | + | 
 | 23 | +# Motivation  | 
 | 24 | +[motivation]: #motivation  | 
 | 25 | + | 
 | 26 | +This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing  | 
 | 27 | +slices and to declare slice fields or locals.  | 
 | 28 | + | 
 | 29 | +For example, guaranteeing the representation of slices allows non-Rust code to  | 
 | 30 | +read from the `data` or `len` fields of `string` in the type below without  | 
 | 31 | +intermediate FFI calls into Rust:  | 
 | 32 | + | 
 | 33 | +```rust  | 
 | 34 | +#[repr(C)]  | 
 | 35 | +struct HasString {  | 
 | 36 | +    string: &'static str,  | 
 | 37 | +}  | 
 | 38 | +```  | 
 | 39 | + | 
 | 40 | +Note: prior to this RFC, the type above is not even properly `repr(C)` since the  | 
 | 41 | +size and alignment of slices were not guaranteed. However, the Rust compiler  | 
 | 42 | +accepts the `repr(C)` declaration above without warning.  | 
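
To make the motivating case concrete, here is a minimal sketch of what foreign code would effectively do, written in Rust so it can be checked. `RawHasString` and `foreign_view` are hypothetical names standing in for a declaration in a C or C++ header. The cast relies on the field order this RFC proposes to guarantee, so it should be read as an illustration of the proposal and not as something that is sound to depend on before it is accepted.

```rust
/// The type from the example above.
#[repr(C)]
pub struct HasString {
    pub string: &'static str,
}

/// Hypothetical foreign-side view of `HasString`, as a C or C++ header
/// might declare it: a data pointer followed by a length.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawHasString {
    pub data: *const u8,
    pub len: usize,
}

/// Reinterprets a `HasString` the way non-Rust code would read it.
pub fn foreign_view(value: &HasString) -> RawHasString {
    // SAFETY (assumption): this is only correct if `&str` is laid out as
    // `{ data, len }`, which is the guarantee this RFC proposes.
    unsafe { *(value as *const HasString as *const RawHasString) }
}
```

With the proposed guarantee in place, `foreign_view(&hs).len` would equal `hs.string.len()` and `foreign_view(&hs).data` would equal `hs.string.as_ptr()`, with no FFI call back into Rust.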
 | 43 | + | 
 | 44 | +# Guide-level explanation  | 
 | 45 | +[guide-level-explanation]: #guide-level-explanation  | 
 | 46 | + | 
 | 47 | +Slices are represented with a pointer and length pair. Their in-memory layout is  | 
 | 48 | +the same as a `#[repr(C)]` struct like the following:  | 
 | 49 | + | 
 | 50 | +```rust  | 
 | 51 | +#[repr(C)]  | 
 | 52 | +struct Slice<T> {  | 
 | 53 | +    data: *const T,  | 
 | 54 | +    len: usize,  | 
 | 55 | +}  | 
 | 56 | +```  | 
 | 57 | + | 
 | 58 | +The validity requirements for the in-memory type are the same as [those  | 
 | 59 | +documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html).  | 
 | 60 | +Namely:  | 
 | 61 | + | 
 | 62 | +* `data` must be non-null, [valid] for reads for `len * mem::size_of::<T>()` many bytes,  | 
 | 63 | +  and it must be properly aligned. This means in particular:  | 
 | 64 | + | 
 | 65 | +    * The entire memory range of this slice must be contained within a single allocated object!  | 
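 | 66 | +      Slices can never span across multiple allocated objects. See the ["Incorrect usage" section  | 
 | 67 | +      of the `from_raw_parts` documentation](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#incorrect-usage) for an example that fails to take this into account.  | 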
 | 68 | +    * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One  | 
 | 69 | +      reason for this is that enum layout optimizations may rely on references  | 
 | 70 | +      (including slices of any length) being aligned and non-null to distinguish  | 
 | 71 | +      them from other data. You can obtain a pointer that is usable as `data`  | 
 | 72 | +      for zero-length slices using [`NonNull::dangling()`].  | 
 | 73 | + | 
 | 74 | +* `data` must point to `len` consecutive properly initialized values of type `T`.  | 
 | 75 | + | 
 | 76 | +* The memory referenced by the slice must not be mutated for as long as the  | 
 | 77 | +  `&[T]` reference is live, except inside an `UnsafeCell`.  | 
 | 78 | + | 
 | 79 | +* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`,  | 
 | 80 | +  and adding that size to `data` must not "wrap around" the address space.  | 
 | 81 | +  See the safety documentation of [`pointer::offset`].  | 
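
The requirements above can be sketched in code. The helpers below (`total_size_ok`, `empty_slice`, `round_trip`) are hypothetical and exist only for illustration. They show the size rule as a checked computation, a zero-length slice built from `NonNull::dangling()`, and a slice rebuilt from a pointer and length that are already known to be valid.

```rust
use std::mem::size_of;
use std::ptr::NonNull;
use std::slice;

/// Checks the size rule: `len * size_of::<T>()` must not exceed `isize::MAX`.
/// Hypothetical helper; it does not check any of the other requirements.
pub fn total_size_ok<T>(len: usize) -> bool {
    match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        None => false, // the multiplication itself overflowed
    }
}

/// Builds an empty slice from a dangling but non-null, aligned pointer.
pub fn empty_slice<T: 'static>() -> &'static [T] {
    let data: *const T = NonNull::<T>::dangling().as_ptr();
    // SAFETY: `len` is zero, so no memory is read; `data` is non-null
    // and properly aligned for `T`.
    unsafe { slice::from_raw_parts(data, 0) }
}

/// Rebuilds a slice from the pointer and length of an existing one.
pub fn round_trip<T>(source: &[T]) -> &[T] {
    // SAFETY: the pointer and length come from a valid `&[T]`, and the
    // returned borrow has the same lifetime as `source`.
    unsafe { slice::from_raw_parts(source.as_ptr(), source.len()) }
}
```

Non-Rust code that fills in `data` and `len` directly would need to uphold the same conditions without the help of these checks.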
 | 82 | + | 
 | 83 | +# Drawbacks  | 
 | 84 | +[drawbacks]: #drawbacks  | 
 | 85 | + | 
 | 86 | +## Zero-sized types  | 
 | 87 | + | 
 | 88 | +One could imagine representing `&[T]` as only `len` for zero-sized `T`.  | 
 | 89 | +This proposal would preclude that choice in favor of a standard representation  | 
 | 90 | +for slices regardless of the underlying type.  | 
 | 91 | + | 
 | 92 | +Alternatively, we could choose to guarantee that the data pointer is present if  | 
 | 93 | +and only if `size_of::<T>() != 0`. This has the possibility of breaking existing  | 
 | 94 | +code which smuggles pointers through the `data` value in `from_raw_parts` /  | 
 | 95 | +`into_raw_parts`.  | 
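
For reference, the sketch below shows the behaviour that dropping the pointer for zero-sized `T` would change. The helpers `zst_slice_is_two_words` and `smuggle` are hypothetical names. On current compilers a slice of a zero-sized type still occupies two words, and an address passed as `data` can be read back from the slice.

```rust
use std::mem::size_of;
use std::slice;

/// True when `&[()]` occupies a pointer plus a length, like any other slice.
pub fn zst_slice_is_two_words() -> bool {
    size_of::<&[()]>() == 2 * size_of::<usize>()
}

/// Carries an arbitrary non-zero address through the `data` field of a
/// slice of `()` and reads it back. Returns `None` for a null address,
/// which would violate the validity requirements.
pub fn smuggle(addr: usize, len: usize) -> Option<(usize, usize)> {
    if addr == 0 {
        return None;
    }
    let data = addr as *const ();
    // SAFETY: `()` is zero-sized with alignment 1, so any non-null address
    // is aligned and the slice covers zero bytes regardless of `len`.
    let s: &[()] = unsafe { slice::from_raw_parts(data, len) };
    Some((s.as_ptr() as usize, s.len()))
}
```

Code written in this style is what the "data pointer present if and only if the type has non-zero size" alternative would risk breaking.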
 | 96 | + | 
 | 97 | +## Compatibility with C++ `std::span`  | 
 | 98 | + | 
 | 99 | +The largest drawback of this layout and set of validity requirements is that it  | 
 | 100 | +may preclude `&[T]` from being representationally equivalent to C++'s  | 
 | 101 | +`std::span<T, std::dynamic_extent>`.  | 
 | 102 | + | 
 | 103 | +* `std::span` does not currently guarantee its layout. In practice, pointer + length  | 
 | 104 | +  is the common representation. This is even observable using `is_layout_compatible`  | 
 | 105 | +  [on MSVC](https://godbolt.org/z/Y8ardrshY), though not  | 
 | 106 | +  [on GCC](https://godbolt.org/z/s4v4xehnG) nor  | 
 | 107 | +  [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a  | 
 | 108 | +  different layout in the C++ standard (unlikely due to MSVC ABI stability  | 
 | 109 | +  requirements) could preclude matching the layout with `&[T]`.  | 
 | 110 | + | 
 | 111 | +* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One  | 
 | 112 | +  possible workaround for this would be to guarantee that `Option<&[T]>` uses  | 
 | 113 | +  `data: std::ptr::null()` to represent the `None` case, making `std::span<T>`  | 
 | 114 | +  equivalent to `Option<&[T]>` for non-zero-sized types.  | 
 | 115 | + | 
 | 116 | +* Rust uses a dangling pointer in the representation of zero-length slices.  | 
 | 117 | +  It's unclear whether   | 
 | 118 | + | 
 | 119 | +Note that C++ also does not support zero-sized types, so there is no naive way  | 
 | 120 | +to represent types like `std::span<SomeZeroSizedRustType>`.  | 
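
The `Option<&[T]>` workaround mentioned in the list above can be explored with a small sketch. The helper names `option_adds_no_space` and `first_word` are hypothetical. The observation that `None` shows up as a null first word reflects current compiler behaviour and is not guaranteed by this RFC as written. The length word of a `None` value is deliberately not read, since nothing says it is initialized.

```rust
use std::mem::size_of;

/// True when wrapping `&[T]` in `Option` does not grow the type, which is
/// what one would expect if `None` is encoded in the data pointer.
pub fn option_adds_no_space<T>() -> bool {
    size_of::<Option<&[T]>>() == size_of::<&[T]>()
}

/// Reads the first pointer-sized word of an `Option<&[u8]>`.
pub fn first_word(value: &Option<&[u8]>) -> *const u8 {
    // SAFETY (assumption): relies on the data pointer being the first field
    // and on `None` being encoded there. Neither is guaranteed today.
    unsafe { *(value as *const Option<&[u8]> as *const *const u8) }
}
```

On current compilers `first_word(&None)` is null and `first_word(&Some(s))` equals `s.as_ptr()`, which is the shape a nullable `std::span` data pointer would need.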
 | 121 | + | 
 | 122 | +## Flexibility  | 
 | 123 | + | 
 | 124 | +Additionally, guaranteeing layout of Rust-native types limits the compiler's and  | 
 | 125 | +standard library's ability to change and take advantage of new optimization  | 
 | 126 | +opportunities.  | 
 | 127 | + | 
 | 128 | +# Rationale and alternatives  | 
 | 129 | +[rationale-and-alternatives]: #rationale-and-alternatives  | 
 | 130 | + | 
 | 131 | +* We could avoid committing to a particular representation for slices.  | 
 | 132 | + | 
 | 133 | +* We could try to guarantee layout compatibility with a particular target's  | 
 | 134 | +  `std::span` representation, though without standardization this may be  | 
 | 135 | +  impossible. Multiple different C++ stdlib implementations may be used on  | 
 | 136 | +  the same platform and could potentially have different span representations.  | 
 | 137 | +  In practice, current span representations also use ptr+len pairs.  | 
 | 138 | + | 
 | 139 | +* We could avoid storing a data pointer for zero-sized types. This would result  | 
 | 140 | +  in a more compact representation but would mean that the representation of  | 
 | 141 | +  `&[T]` is dependent on the type of `T`.  | 
 | 142 | + | 
 | 143 | +# Prior art  | 
 | 144 | +[prior-art]: #prior-art  | 
 | 145 | + | 
 | 146 | +The layout in this RFC is already documented in  | 
 | 147 | +[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html).  | 
 | 148 | + | 
 | 149 | +# Unresolved questions  | 
 | 150 | +[unresolved-questions]: #unresolved-questions  | 
 | 151 | + | 
 | 152 | +* Should `&[T]` include a pointer when `T` is zero-sized?  | 
 | 153 | + | 
 | 154 | +# Future possibilities  | 
 | 155 | +[future-possibilities]: #future-possibilities  | 
 | 156 | + | 
 | 157 | +* Consider defining a separate Rust type which is repr-equivalent to the platform's  | 
 | 158 | +  native `std::span<T, std::dynamic_extent>` to allow for easier  | 
 | 159 | +  interoperability with C++ APIs. Unfortunately, the C++ standard does not  | 
 | 160 | +  guarantee the layout of `std::span` (though the representation may be known  | 
 | 161 | +  and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC).  | 
 | 162 | +  Zero-sized types would also not be supported with a naive implementation of  | 
 | 163 | +  such a type.  | 