Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unions: call out field offset issues #627

Merged
merged 2 commits into from
Jun 29, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 15 additions & 5 deletions src/items/unions.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ The key property of unions is that all fields of a union share common storage.
As a result writes to one field of a union can overwrite its other fields, and
size of a union is determined by the size of its largest field.

## Initialization of a union

A value of a union type can be created using the same syntax that is used for
struct types, except that it must specify exactly one field:

Expand All @@ -37,13 +39,17 @@ struct fields:
let f = u.f1;
```

## Reading and writing union fields

Unions have no notion of an "active field". Instead, every union access just
interprets the storage at the type of the field used for the access. Reading a
union field reads the bits of the union at the field's type. It is the
programmer's responsibility to make sure that the data is valid at that
type. Failing to do so results in undefined behavior. For example, reading the
value `3` at type `bool` is undefined behavior. Effectively, writing to and then
reading from a union is analogous to a [`transmute`] from the type used for
union field reads the bits of the union at the field's type. Fields might have a
non-zero offset (except when `#[repr(C)]` is used); in that case the bits
starting at the offset of the fields are read. It is the programmer's
responsibility to make sure that the data is valid at the field's type. Failing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really the role of the reference to say "it is the programmer's responsibility" or "this is how you use X" but it's fine here imo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion for wording it differently? I think it is useful to think in terms of responsibilities/obligations when talking about unsafe code, and hence I think it is useful for the reference to explain it as such.

I also have 0 experience writing standard-esque text.^^ Given my experience reading it though, it seems often so useless that I am not sure I am sad about that...

Copy link
Contributor

@Centril Centril Jun 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really; the text itself is good but it's probably more aimed at users (e.g. the nomicon) whereas the reference is intended as the seeds of a spec with just definitions of the language but not the "why" and "this is how I use it". That is, should be more like the details of one of your academic papers with operational semantics and judgements... it isn't now, but it should be. :)

In any case, I think we can afford a smidgen of tips towards the programmer also here because some users will read this.

Copy link
Member Author

@RalfJung RalfJung Jun 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the parts of the paper (or more often, the appendix) that only 2 people on this planet read because they are incomprehensible? ;)

Copy link
Contributor

@Centril Centril Jun 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the papers I read :)

I think something like the WASM spec has a nice structure:

That is, a mix of formally written semantics and English text to make it a bit less dense and more informative for non-experts.

to do so results in undefined behavior. For example, reading the value `3` at
type `bool` is undefined behavior. Effectively, writing to and then reading from
a `#[repr(C)]` union is analogous to a [`transmute`] from the type used for
writing to the type used for reading.

Consequently, all reads of union fields have to be placed in `unsafe` blocks:
Expand All @@ -70,6 +76,8 @@ u.f1 = 2;
Commonly, code using unions will provide safe wrappers around unsafe union
field accesses.

## Pattern matching on unions

Another way to access union fields is to use pattern matching. Pattern matching
on union fields uses the same syntax as struct patterns, except that the pattern
must specify exactly one field. Since pattern matching is like reading the union
Expand Down Expand Up @@ -119,6 +127,8 @@ fn is_zero(v: Value) -> bool {
}
```

## References to union fields

Since union fields share common storage, gaining write access to one field of a
union can give write access to all its remaining fields. Borrow checking rules
have to be adjusted to account for this fact. As a result, if one field of a
Expand Down
9 changes: 5 additions & 4 deletions src/types/union.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,16 @@
# Union types

A *union type* is a nominal, heterogeneous C-like union, denoted by the name of
a [`union` item].
a [`union` item][item].

A union access transmutes the content of the union to the type of the accessed
Unions have no notion of an "active field". Instead, every union access
transmutes parts of the content of the union to the type of the accessed
field. Since transmutes can cause unexpected or undefined behaviour, `unsafe` is
required to read from a union field or to write to a field that doesn't
implement [`Copy`].
implement [`Copy`]. See the [item] documentation for further details.

The memory layout of a `union` is undefined by default, but the `#[repr(...)]`
attribute can be used to fix a layout.

[`Copy`]: special-types-and-traits.html#copy
[`union` item]: items/unions.html
[item]: items/unions.html