Hygiene opt-out (escaping) for declarative macros 2.0 #2498

Closed
wants to merge 4 commits into from
229 changes: 229 additions & 0 deletions text/0000-macro-hygiene-optout.md
@@ -0,0 +1,229 @@
- Feature Name: macro_hygiene_optout
- Start Date: 2018-07-05
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This feature introduces the ability to "opt out" of the usual macro hygiene rules within definitions of [declarative macros][rfc-decl-macro] (macros 2.0), for designated occurrences of identifiers. In other words, the feature will enable one to annotate occurrences of identifiers with macro call-site hygiene rather than the default definition-site hygiene.

# Motivation
[motivation]: #motivation

The use of [hygienic macros][rust-hygienic-macros] in Rust is justified by much prior research and experience <sup>([1][paper-hygienic-macro-expansion] [2][scheme-hygiene])</sup>, and solves several common issues that programmers would otherwise encounter with macros due to the nature of syntactic substitution. The principal drawback of this approach is that the names/identifiers of any items generated by a macro must be *explicitly passed to* the macro as arguments. This forces the logic for name selection to remain entirely external to the macro. Even where that is not a problem, passing every identifier to be exported into a macro quickly becomes unwieldy for macros that generate many identifiers.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Escaping of hygiene for identifiers within macros allows one to define identifiers with syntax contexts (**hygiene**) corresponding to the location in the source code from which the macro is invoked (the **call-site**) rather than the location it is defined (**definition-site**). It also enables one to use/reference existing identifiers from the call-site from within macro definitions, though this is not the true aim of the feature, but rather a side-effect, and will be discussed later.
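
For comparison, the difference between the two kinds of hygiene can already be observed on stable Rust, because local variables in `macro_rules!` macros are hygienic. The following is only a minimal sketch and not part of the proposed feature: it uses `macro_rules!` as a stand-in for macros 2.0, and all macro and function names are made up for illustration. A `let` binding written literally in the macro body gets definition-site hygiene and is invisible to the caller, while a binding whose name is passed in as an argument keeps the caller's (call-site) hygiene.

```rust
// Illustrative only: every name here is hypothetical, and `macro_rules!`
// stands in for the `macro` syntax discussed in this RFC.

// `value` is written in the macro body, so it has definition-site hygiene.
macro_rules! define_hidden {
    () => {
        let value = 1;
        let _ = value; // usable inside the expansion only
    };
}

// `$name` is supplied by the caller, so it has call-site hygiene.
macro_rules! define_named {
    ($name:ident) => {
        let $name = 2;
    };
}

fn shadowing_is_prevented() -> i32 {
    let value = 10;
    define_hidden!(); // the macro's `value` is a different binding from ours
    value
}

fn named_binding_is_visible() -> i32 {
    define_named!(answer);
    answer // resolves to the binding created by the macro
}

fn main() {
    assert_eq!(shadowing_is_prevented(), 10);
    assert_eq!(named_binding_is_visible(), 2);
}
```

The second macro shows the workaround that the motivation describes: the name has to travel through a parameter in order to be visible at the call-site.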

Note that for the purposes of this RFC, an **identifier** can roughly be considered to be a textual name (e.g. `foo_bar`) of any sort (for a variable, function, trait, etc.) or a lifetime (e.g. `'a`).

Member

Currently all lifetime parameters are unhygienic, not sure if we will fix that for macros 2.0 or not.

Author

Yeah. Hopefully we will!

Contributor

Lifetimes are already hygienic in `macro` macros and with `Span::def_site()` in proc macros.


To escape an identifier in code, one simply prefixes an identifier with the [sigil] `#`. This changes the syntax context (hygiene) of the identifier from the usual definition-site to the call-site.

## Guide: Example A
[guide-example-a]: #guide-example-a

```rust
#![feature(decl_macro)]
#![feature(macro_hygiene_optout)]

macro m() {
pub mod #foo {
pub const #BAR: u32 = 123;
}
}

fn main() {
m!(); // `foo` and `foo::BAR` both behave as if they were defined directly here.
assert_eq!(123, foo::BAR);
}
```

## Guide: Example B
[guide-example-b]: #guide-example-b

```rust
#![feature(decl_macro)]
#![feature(macro_hygiene_optout)]

macro m($mod_name:ident) {
pub mod $mod_name {
pub const #BAR: u32 = 123;
}
}

fn main() {
m!(foo); // `foo` and `foo::BAR` both behave as if they were defined directly here.
assert_eq!(123, foo::BAR);
}
```

## Guide: Example C
[guide-example-c]: #guide-example-c

```rust
#![feature(decl_macro)]
#![feature(macro_hygiene_optout)]

macro m($mod_name:ident) {
pub mod $mod_name {
pub const BAR: u32 = 123;
}
}

fn main() {
m!(foo);
let _ = foo::BAR;
//~^ ERROR cannot find value `BAR` in module `foo`
}
```

## Guide: Example D
[guide-example-d]: #guide-example-d

```rust
#![feature(decl_macro)]
#![feature(macro_hygiene_optout)]

macro m() {
pub mod #foo {
pub const BAR: u32 = 123;
}
}

fn main() {
m!();
let _ = foo::BAR;
//~^ ERROR cannot find value `BAR` in module `foo`
}
```

## Meta-variables
[meta-variables]: #meta-variables

Hygiene escaping of meta-variables (i.e. `#$foo` and `$#foo`) does not have immediately obvious semantics or usefulness, so is explicitly disallowed for the present, and yields error messages.

Member

The obvious semantics to me is that the resulting identifier takes the name from the metavariable and the hygiene context from the call site.

Author

Yes, I really meant the former isn't obviously useful, while the latter isn't obviously useful either, nor does it have obvious semantics.


## Usage Notes
[usage-notes]: #usage-notes

While the motivation of this feature stems from defining or "exporting" new identifiers from macros to their call-site, where it is appropriate for the macro itself to choose/compute the name, it is clear from the above semantics that this feature allows for other potential use cases. Most notably, one can use or "import" an identifier from the call-site. This, however, is *not* recommended, since this purpose is already fulfilled well by macro parameters. On the other hand, it is not explicitly disallowed, for two reasons:

- Defining an identifier with call-site hygiene within that macro and then using it is a perfectly reasonable scenario.
- Macro expansion is performed at the syntactical (token stream) level, before parsing, so definitions and uses cannot be easily distinguished.
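
The recommendation to "import" call-site identifiers through parameters can be illustrated on stable Rust. This is a hedged sketch using `macro_rules!` rather than the proposed syntax, and the names `add_to` and `bump_counter` are hypothetical. The macro updates a binding owned by the caller, which it receives as an `ident` argument, so the dependency is visible at every invocation.

```rust
// Illustrative only: `add_to` and `bump_counter` are made-up names.

// The caller's binding is named explicitly via `$target`, so it carries
// call-site hygiene without any escaping.
macro_rules! add_to {
    ($target:ident, $amount:expr) => {
        $target += $amount;
    };
}

fn bump_counter() -> u32 {
    let mut counter = 40;
    add_to!(counter, 2); // the "import" of `counter` is documented right here
    counter
}

fn main() {
    assert_eq!(bump_counter(), 42);
}
```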

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The macro parser routine first parses the macro definition into a token stream (as before), but now also tags tokens and meta-variables with an enum value representing the kind of hygiene (definition-site or call-site). This is only enabled for new-style `macro` macros (i.e. *decl_macro* or macros 2.0); for `macro_rules!` macros, the call-site sigil `#` is not handled specially, and gives rise to an error. The sigil is always treated as a separate token outside of macros, on the LHS of macro rules, and when not followed by an identifier on the RHS.
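
The tagging step described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `Token` and `Hygiene` types and the `tag_body` function are hypothetical and do not mirror the compiler's real data structures, and only identifiers and single-character punctuation are modelled. A `#` directly followed by an identifier is consumed and tags that identifier as call-site; any other `#` remains an ordinary token.

```rust
// Toy model only: these types are hypothetical and are not rustc internals.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Hygiene {
    DefSite,
    CallSite,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Ident(String),
    Punct(char),
}

/// Tags each token of a macro body (the RHS of a rule).
fn tag_body(tokens: &[Token]) -> Vec<(Token, Hygiene)> {
    let mut tagged = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        match (&tokens[i], tokens.get(i + 1)) {
            // `#ident`: drop the sigil and tag the identifier as call-site.
            (Token::Punct('#'), Some(Token::Ident(name))) => {
                tagged.push((Token::Ident(name.clone()), Hygiene::CallSite));
                i += 2;
            }
            // Everything else, including a lone `#`, keeps the default tag.
            (other, _) => {
                tagged.push((other.clone(), Hygiene::DefSite));
                i += 1;
            }
        }
    }
    tagged
}

/// Small helper for building identifier tokens.
fn ident(name: &str) -> Token {
    Token::Ident(name.to_string())
}

fn main() {
    // Models the token sequence `mod #foo { BAR }`.
    let body = [
        ident("mod"),
        Token::Punct('#'),
        ident("foo"),
        Token::Punct('{'),
        ident("BAR"),
        Token::Punct('}'),
    ];
    let tagged = tag_body(&body);
    assert_eq!(tagged.len(), 5);
    assert_eq!(tagged[1], (ident("foo"), Hygiene::CallSite));
    assert_eq!(tagged[3], (ident("BAR"), Hygiene::DefSite));
}
```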

It should be noted that the sigil `#` has nothing to do with the syntax for [raw identifiers][rfc-raw-identifiers], and can be disambiguated without problems. In fact, they can be used together without any issues, e.g. `#r#foo`.

When the macro is invoked (expanded), each token tree is transcribed according to the following rules, depending on its hygiene tag.

- *definition-site*: a normal mark is applied for the current expansion, which leaves the syntax context alone
- *call-site*: a transparent mark is applied for the current expansion, which changes the syntax context for every identifier in the token tree to that of the call site.
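
One way to picture these two rules is the toy model below. It is a simplification under explicit assumptions, not a description of the compiler: the types `Transparency`, `Mark` and `SyntaxContext` are hypothetical, a syntax context is modelled as a plain list of marks, name resolution is assumed to ignore transparent marks, and the macro is assumed to be defined and invoked in the same root context (as in the guide examples).

```rust
// Toy model only: hypothetical types, not rustc's actual data structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transparency {
    Opaque,      // stands for definition-site hygiene
    Transparent, // stands for call-site hygiene
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Mark {
    expansion_id: u32,
    transparency: Transparency,
}

/// A syntax context modelled as the list of marks applied so far.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct SyntaxContext(Vec<Mark>);

impl SyntaxContext {
    /// Returns a new context with one more mark for the given expansion.
    fn apply_mark(&self, expansion_id: u32, transparency: Transparency) -> SyntaxContext {
        let mut marks = self.0.clone();
        marks.push(Mark { expansion_id, transparency });
        SyntaxContext(marks)
    }

    /// The marks that matter for name resolution in this model:
    /// transparent marks are skipped.
    fn resolution_context(&self) -> Vec<Mark> {
        self.0
            .iter()
            .copied()
            .filter(|mark| mark.transparency == Transparency::Opaque)
            .collect()
    }
}

fn main() {
    let root = SyntaxContext::default(); // context of the body of `main`
    let escaped = root.apply_mark(1, Transparency::Transparent); // like `#foo`
    let hygienic = root.apply_mark(1, Transparency::Opaque); // like plain `BAR`

    // The escaped identifier resolves as if it had been written in `main`.
    assert_eq!(escaped.resolution_context(), root.resolution_context());
    // The hygienic identifier lives in a context that `main` cannot name.
    assert_ne!(hygienic.resolution_context(), root.resolution_context());
}
```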

Contributor

Syntax context of an identifier is a sequence of marks RootMark -> Mark1 -> Mark2.
Both "def-site" and "call-site" variants change it, the former to RootMark -> Mark1 -> Mark2 -> OpaqueMark, the latter to RootMark -> Mark1 -> Mark2 -> TransparentMark.
(All this is an implementation detail anyway.)

Member

What exactly are Marks? What does the sequence of marks in this example mean?

Contributor

> What exactly are Marks?

Right now a mark is a combination of expansion ID and transparency :)

> What does the sequence of marks in this example mean?

A syntactic context fully identifying what macros produced an identifier (or other token).

I'll write some docs after doing a number of refactorings in the compiler.

Author

Yeah. And an expansion ID is a particular expansion (instance of an expansion) of a macro, as I understand. Furthermore, I believe a RootMark is constructed from a span or set of spans, though I'm not 100% clear on this. Perhaps @petrochenkov can clarify.


## Reference: Example A
[reference-example-a]: #reference-example-a

In [example A][guide-example-a], the identifiers `foo` (the name of the module) and `BAR` (the name of the constant within the module) are hygiene-escaped, giving them the syntax context of the call site. Thus, `foo::BAR` resolves fine, since both `foo` and `BAR` have the same syntax context as the body of the `main` function.

## Reference: Example B
[reference-example-b]: #reference-example-b

In [example B][guide-example-b], the module is named using the identifier passed into the macro, which as a macro argument has the syntax context of the call site. Furthermore, the constant `BAR` within the module is hygiene-escaped, so likewise has the syntax context of the call site. Thus, `foo::BAR` resolves fine, since both `foo` and `BAR` have the same syntax context as the body of the `main` function.

## Reference: Example C
[reference-example-c]: #reference-example-c

In [example C][guide-example-c], the situation is similar to [example B][reference-example-b], except that the constant `BAR` is not hygiene-escaped, and thus retains the default definition-site syntax context. Thus, when one tries to access `foo::BAR` within the `main` function, `foo` resolves fine, but the constant `BAR` within it is not visible due to hygiene rules, since it does not have the syntax context of the `main` function (or any parent context).

## Reference: Example D
[reference-example-d]: #reference-example-d

In [example D][guide-example-d], the situation is almost identical to [example C][reference-example-c], except that the name of the module is defined within the macro as `foo`, and hygiene-escaped, so that it has the call-site syntax context.

```rust
#![feature(decl_macro)]
#![feature(macro_hygiene_optout)]

macro m() {
pub mod #foo {
pub const BAR: u32 = 123;
}
}

fn main() {
m!();
let _ = foo::BAR;
//~^ ERROR cannot find value `BAR` in module `foo`
}
```

# Drawbacks
[drawbacks]: #drawbacks

- Introducing a new sigil such as `#` can be seen as increasing the syntactical complexity of the language, and potentially obfuscating code slightly.
- The ability to mark some occurrences of an identifier with call-site hygiene and leave others with default definition-site hygiene is perhaps more fine-grained than necessary.
- It is not immediately obvious from a macro definition which (occurrences of) identifiers take their syntax context from the call site. One has to read through the whole definition to figure it out.
- The syntax permits marking identifiers with call-site hygiene purely for "use" or "import" scenarios (as opposed to "defining" or "exporting" scenarios). Parameters are intended for this purpose, and accomplish the task much better, since they self-document uses of identifiers. However, this ability may actually be desirable more than problematic, as mentioned in the [usage notes][usage-notes].

# Rationale and alternatives
[alternatives]: #alternatives

The design in this RFC was chosen because of its simple syntax and semantics, and the fact it offers a good way to get experience with hygiene opt-out in general, due to its fine-grainedness.

The main alternative considered was having an `escapes` attribute for macros and not using a sigil.

```rust
#[escapes(S, T)]
macro m() {
struct S; // Defines `S` at the call-site.
T // Resolves at the call-site.
}
```

The above would then be equivalent to the following, using the sigil syntax.

```rust
macro m() {
struct #S; // Defines `S` at the call-site.
#T // Resolves at the call-site.
}
```

The obvious benefit of this is that it is manifest which identifiers (`S` and `T` in the above example) are hygiene-escaped. A downside, which may or may not be significant, is that these identifiers are then *always* escaped within the macro definition, and thus can never be used with definition-site hygiene.

Going beyond a single `escapes` attribute, one can also imagine having two separate attributes: `defines`, for defining (exporting) identifiers, and `uses`, for using (importing) identifiers. The main issue here is the complexity of the semantics and implementation; indeed, it is not even clear whether one could clearly demarcate cases of definition and use at the syntactical level. As implied by the [usage notes][usage-notes], however, the `uses` attribute would largely overlap with the purpose of macro parameters.

In the end, the approach taken by this RFC was chosen due to the fact it has the most prior art, including an [existing working implementation][pr-47992]. It is also the most flexible in that it allows different hygiene to be applied to different *occurrences* of the same identifier. This will allow us to learn more about the use of hygiene opt-out in practice, while the feature is unstable.

# Prior art
[prior-art]: #prior-art

Extended discussion on this subject was carried out in a [pull request][pr-47992] for this feature, which was closed due to the decision that an RFC such as this one be accepted first. [Alternatives][pr-47992-alternatives] were originally evaluated there, with discussion initiated by @jseyfried, and [continued][pr-47992-alternatives-eval] by @petrochenkov.

Member

I'd expect some discussion of how this works in other languages here. In particular, Scheme has a rich system for doing this sort of thing.

Author

Hmm. I'd like to avoid learning Scheme properly for this... maybe I can dig up a decent explanation somewhere?


Further back, the initial sigil syntax was mentioned in [this comment][pr-40848-comment], and some discussion occurred in the [declarative macros 2.0 tracking issue][issue-decl-macro].

# Unresolved questions
[unresolved]: #unresolved-questions

- Is a sigil other than `#` more appropriate, perhaps?
- Do we want to somehow disallow pure importing of identifiers within macros aside from via parameters, as mentioned in the [drawbacks] section?
- Do we also want to implement the attribute-based approach as an alternative or in addition to the sigil-based approach?

[sigil]: https://en.wikipedia.org/wiki/Sigil_(computer_programming)
[paper-hygienic-macro-expansion]: https://www.cs.indiana.edu/pub/techreports/TR194.pdf
[scheme-hygiene]: http://community.schemewiki.org/?hygiene

[rust-hygienic-macros]: https://doc.rust-lang.org/1.7.0/book/macros.html#hygiene

[rfc-decl-macro]: https://github.com/rust-lang/rfcs/blob/master/text/1584-macros.md
[rfc-raw-identifiers]: https://github.com/rust-lang/rfcs/blob/master/text/2151-raw-identifiers.md
[issue-decl-macro]: https://github.com/rust-lang/rust/issues/39412
[pr-40848-comment]: https://github.com/rust-lang/rust/pull/40847#issuecomment-291186518
[pr-47992]: https://github.com/rust-lang/rust/pull/47992
[pr-47992-alternatives]: https://github.com/rust-lang/rust/pull/47992#issuecomment-364729651
[pr-47992-alternatives-eval]: https://github.com/rust-lang/rust/pull/47992#issuecomment-370268136