Description
In this thread and the comments that follow, the following new information was discovered.
repr(C) unions have padding bits of the following forms:
- trailing padding: if the union size is larger than the size of its largest field, the trailing union_size - largest_field_size bits are all padding bits.
- interior padding: if bit i of all union fields is a padding bit, then bit i of the union is a padding bit.
Note: because all repr(C) union fields are at offset 0, there is no padding before any field.
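The two forms of padding can be made concrete with a small sketch. The types `Trailing`, `Tagged`, and `Interior` and both helper functions are hypothetical names introduced only for illustration, and the byte counts in the comments assume a typical target where `u32` has size and alignment 4.

```rust
use std::mem::size_of;

/// Hypothetical union: the largest field is 5 bytes, but `u32` raises the
/// alignment to 4 on typical targets, so the union is rounded up to 8 bytes.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Trailing {
    pub bytes: [u8; 5],
    pub word: u32,
}

/// Hypothetical struct with padding between `tag` and `value`
/// (bytes 1..4 on typical targets).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Tagged {
    pub tag: u8,
    pub value: u32,
}

/// Every field has padding at the same positions, so under the
/// interior-padding rule those positions are padding of the union too.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Interior {
    pub a: Tagged,
    pub b: Tagged,
}

/// Trailing padding: union size minus the size of its largest field.
pub fn trailing_padding_bytes() -> usize {
    let largest_field = size_of::<[u8; 5]>().max(size_of::<u32>());
    size_of::<Trailing>() - largest_field
}

/// Bytes of `Tagged` that are covered by neither of its fields.
pub fn interior_padding_bytes() -> usize {
    size_of::<Tagged>() - size_of::<u8>() - size_of::<u32>()
}
```

On such a target both helpers return 3: `Trailing` has three trailing padding bytes, and `Interior` has three interior padding bytes because every one of its fields has padding there.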
The content of the padding bits of repr(C) unions is always uninitialized. That is, padding bits are not required to be preserved on copy, move, pass-by-value, etc., and the implementation of the call ABI can exploit this knowledge.
For example, Rust, clang, and GCC all implement the SysV64 ABI. When a #[repr(C)] union U { x: (u8, u32) } is passed around by value, @eddyb mentioned that Rust and GCC pass the bottom 32 bits (where the u8 is stored) while clang passes only the bottom 8 bits. Both implementations are allowed.
@comex also mentioned that some ABIs are stricter: the RISC-V ELF ABI "appears to require callers to zero- or sign-extend arguments in registers in a particular way. In other words, it requires the upper bits (which correspond to padding bytes) to have a specific value, and the callee can assume that they do have that value". That would be incompatible with allowing users to store data in the padding bits.
That is, repr(C) unions are not, and cannot be, just "bags of bits" in which one could write to any bit and have that bit's value preserved on copy, move, or pass-by-value.
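Read literally, the interior-padding rule says a bit is padding only if it is padding in every field. Under that reading, a union that includes a byte-array field spanning its whole size has no padding bits. The sketch below assumes that reading; it is not a settled guarantee. `Tagged`, `Bag`, `N`, and `roundtrip_raw` are hypothetical names introduced for illustration.

```rust
use std::mem::size_of;

/// Hypothetical struct with interior padding after `tag` on typical targets.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Tagged {
    pub tag: u8,
    pub value: u32,
}

/// Number of bytes the raw view must cover.
pub const N: usize = size_of::<Tagged>();

/// Hypothetical union: `raw` has no padding of its own and covers every byte
/// of `tagged`, so no bit is padding in *all* fields.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Bag {
    pub tagged: Tagged,
    pub raw: [u8; N],
}

/// Writes `input` through the `raw` field, copies the union by value,
/// and reads the bytes back from the copy.
pub fn roundtrip_raw(input: [u8; N]) -> [u8; N] {
    let bag = Bag { raw: input };
    let copy = bag; // by-value copy of the whole union
    // SAFETY: `raw` is the field that was last written, and every bit
    // pattern is a valid `[u8; N]`.
    unsafe { copy.raw }
}
```

Without the `raw` field, the bytes between `tag` and `value` would be padding in every field, and nothing would require a copy to carry them along.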
We should document this for repr(C) unions in the Unsafe Code Guidelines, so I'm re-opening this issue until that is resolved.