-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[librustdoc] Only split lang string on ,
,
, and \t
#78429
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
@rfcbot fcp merge This changes the parsing of language strings for code blocks so that only This is a breaking change, but in practice it seems unlikely anyone is using this. It's possible that we could find out with a crater run if crater allows denying specific warnings. Possible alternatives:
|
Team member @jyn514 has proposed to merge this. The next step is review by the rest of the tagged team members:
No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
@casey as-is, this breaks trailing text after spaces:
What do you think about changing this to split on either spaces or |
@jyn514 Sounds good to me! I think using spaces is common, particularly with GitHub flavored markdown, so splitting on spaces seems reasonable. I also split on tabs, since it seems weird to split on spaces but not tabs, and since users sometimes use tabs for alignment. |
,
,
and
,
and
,
,
, and \t
Hum also: we use such headers in rust tests:
So splitting on whistepaces and tabs might not be a good idea... |
Unrecognized tokens are ignored, so this shouldn't change the set of recognized tokens in the above example. Before the change, After the change, |
Yes but what's the point of splitting on whitespaces and tabs? |
If it didn't split on spaces, then something like |
Oh I see, good point! |
Nominating for @rust-lang/lang meeting to draw attention to the PR. |
I'd like to see crater results, but I think this is reasonable. It'd be nice to have a survey of how other markdown systems approach this though. |
Also would like to see crater results, but I'm fine with this otherwise @rfcbot approve |
I did a mini survey of other markdown systems.
All four treat characters other than It seems like there's little agreement 😅 If there are other systems, that I should take a look at, let me know. Do I need to do anything to trigger a crater run? |
I'm not sure if crater would tell us anything meaningful here, actually. Rustdoc doesn't warn or error when it runs into a language string it doesn't understand, so even if there were a change in behavior we wouldn't see it on crater. |
We could see test failures if attributes stop being recognized. For example, this would currently get the ```rust&should_panic
panic!();
``` It would be ideal to see if the number of tests that ran changed. If they stayed the same and nothing failed, we could be confident that there were no breakages. |
@jyn514 correct, but we can push a change that makes this a hard error and then crater it |
Got it, thanks. @casey can you change the PR to give an error/give an assertion failure if the new and old parsing don't match? You'll have to temporarily disable the tests. Then I can start a crater run to see how often this happens in practice. |
I think this is ready for a crater run. I modified it to parse lang strings twice, once with the new splitting scheme, and once with the old, and panic if the parsed |
Thanks, looks great! @ollie27 can you take another look at #78429 (comment)? I think we'll be ready to go after that :) @casey in the meantime do you mind squashing the commits to something fewer than 24? |
Only split doctest lang strings on `,`, ` `, and `\t`. Additionally, to preserve backwards compatibility with pandoc-style langstrings, strip a surrounding `{}`, and remove leading `.`s from each token. Prior to this change, doctest lang strings were split on all non-alphanumeric characters except `-` or `_`, which limited future extensions to doctest lang string tokens, for example using `=` for key-value tokens. This is a breaking change, although it is not expected to be disruptive, because lang strings using separators other than `,` and ` ` are not very common
d5d26bd
to
66f4883
Compare
Yeah, 24 is a bit excessive for this change 😅 I squashed to one commit, and improved the commit message. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
I've approved on behalf of withoutboats, as per rust-lang/team#526. |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
@bors r+ |
📌 Commit 66f4883 has been approved by |
☀️ Test successful - checks-actions |
Awesome! Thank you @jyn514 for helping me get this through!
I love how polite rfcbot is 😊 |
You're very welcome! Thanks for sticking with it so long :) |
Package changes: * bump bootstraps to 1.51.0. * adjust patches and cargo checksums as required * 1.51 failed to build natively on 32-bit armv7, there is hope that this is fixed with 1.52. (1.51 can be built with netbsd32 emulation on a aarch64 system). Upsteream changes: Version 1.52.0 (2021-05-06) ============================ Language -------- - [Added the `unsafe_op_in_unsafe_fn` lint, which checks whether the unsafe code in an `unsafe fn` is wrapped in a `unsafe` block.][79208] This lint is allowed by default, and may become a warning or hard error in a future edition. - [You can now cast mutable references to arrays to a pointer of the same type as the element.][81479] Compiler -------- - [Upgraded the default LLVM to LLVM 12.][81451] Added tier 3\* support for the following targets. - [`s390x-unknown-linux-musl`][82166] - [`riscv32gc-unknown-linux-musl` & `riscv64gc-unknown-linux-musl`][82202] - [`powerpc-unknown-openbsd`][82733] \* Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [`OsString` now implements `Extend` and `FromIterator`.][82121] - [`cmp::Reverse` now has `#[repr(transparent)]` representation.][81879] - [`Arc<impl Error>` now implements `error::Error`.][80553] - [All integer division and remainder operations are now `const`.][80962] Stabilised APIs ------------- - [`Arguments::as_str`] - [`char::MAX`] - [`char::REPLACEMENT_CHARACTER`] - [`char::UNICODE_VERSION`] - [`char::decode_utf16`] - [`char::from_digit`] - [`char::from_u32_unchecked`] - [`char::from_u32`] - [`slice::partition_point`] - [`str::rsplit_once`] - [`str::split_once`] The following previously stable APIs are now `const`. - [`char::len_utf8`] - [`char::len_utf16`] - [`char::to_ascii_uppercase`] - [`char::to_ascii_lowercase`] - [`char::eq_ignore_ascii_case`] - [`u8::to_ascii_uppercase`] - [`u8::to_ascii_lowercase`] - [`u8::eq_ignore_ascii_case`] Rustdoc ------- - [Rustdoc lints are now treated as a tool lint, meaning that lints are now prefixed with `rustdoc::` (e.g. `#[warn(rustdoc::non_autolinks)]`).][80527] Using the old style is still allowed, and will become a warning in a future release. - [Rustdoc now supports argument files.][82261] - [Rustdoc now generates smart punctuation for documentation.][79423] - [You can now use "task lists" in Rustdoc Markdown.][81766] E.g. ```markdown - [x] Complete - [ ] Todo ``` Misc ---- - [You can now pass multiple filters to tests.][81356] E.g. `cargo test -- foo bar` will run all tests that match `foo` and `bar`. - [Rustup now distributes PDB symbols for the `std` library on Windows, allowing you to see `std` symbols when debugging.][82218] Internal Only ------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of rustc and related tools. - [Check the result cache before the DepGraph when ensuring queries][81855] - [Try fast_reject::simplify_type in coherence before doing full check][81744] - [Only store a LocalDefId in some HIR nodes][81611] - [Store HIR attributes in a side table][79519] Compatibility Notes ------------------- - [Cargo build scripts are now forbidden from setting `RUSTC_BOOTSTRAP`.] [cargo/9181] - [Removed support for the `x86_64-rumprun-netbsd` target.][82594] - [Deprecated the `x86_64-sun-solaris` target in favor of `x86_64-pc-solaris`.] [82216] - [Rustdoc now only accepts `,`, ` `, and `\t` as delimiters for specifying languages in code blocks.][78429] - [Rustc now catches more cases of `pub_use_of_private_extern_crate`][80763] - [Changes in how proc macros handle whitespace may lead to panics when used with older `proc-macro-hack` versions. A `cargo update` should be sufficient to fix this in all cases.][84136] [84136]: rust-lang/rust#84136 [80763]: rust-lang/rust#80763 [82166]: rust-lang/rust#82166 [82121]: rust-lang/rust#82121 [81879]: rust-lang/rust#81879 [82261]: rust-lang/rust#82261 [82218]: rust-lang/rust#82218 [82216]: rust-lang/rust#82216 [82202]: rust-lang/rust#82202 [81855]: rust-lang/rust#81855 [81766]: rust-lang/rust#81766 [81744]: rust-lang/rust#81744 [81611]: rust-lang/rust#81611 [81479]: rust-lang/rust#81479 [81451]: rust-lang/rust#81451 [81356]: rust-lang/rust#81356 [80962]: rust-lang/rust#80962 [80553]: rust-lang/rust#80553 [80527]: rust-lang/rust#80527 [79519]: rust-lang/rust#79519 [79423]: rust-lang/rust#79423 [79208]: rust-lang/rust#79208 [78429]: rust-lang/rust#78429 [82733]: rust-lang/rust#82733 [82594]: rust-lang/rust#82594 [cargo/9181]: rust-lang/cargo#9181 [`char::MAX`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.MAX [`char::REPLACEMENT_CHARACTER`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.REPLACEMENT_CHARACTER [`char::UNICODE_VERSION`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.UNICODE_VERSION [`char::decode_utf16`]: https://doc.rust-lang.org/std/primitive.char.html#method.decode_utf16 [`char::from_u32`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32 [`char::from_u32_unchecked`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32_unchecked [`char::from_digit`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_digit [`Peekable::next_if`]: https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if [`Peekable::next_if_eq`]: https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_eq [`Arguments::as_str`]: https://doc.rust-lang.org/stable/std/fmt/struct.Arguments.html#method.as_str [`str::split_once`]: https://doc.rust-lang.org/stable/std/primitive.str.html#method.split_once [`str::rsplit_once`]: https://doc.rust-lang.org/stable/std/primitive.str.html#method.rsplit_once [`slice::partition_point`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.partition_point [`char::len_utf8`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.len_utf8 [`char::len_utf16`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.len_utf16 [`char::to_ascii_uppercase`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_ascii_uppercase [`char::to_ascii_lowercase`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_ascii_lowercase [`char::eq_ignore_ascii_case`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.eq_ignore_ascii_case [`u8::to_ascii_uppercase`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.to_ascii_uppercase [`u8::to_ascii_lowercase`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.to_ascii_lowercase [`u8::eq_ignore_ascii_case`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.eq_ignore_ascii_case
Pkgsrc changes: * Bump bootstrap kit version to 1.51.0. * Adjust patches as needed. * Update checksum adjustments. * Fix syntax error in commands adjusting libserde_derive for Darwin Upstream changes: Version 1.52.1 (2021-05-10) ============================ This release disables incremental compilation, unless the user has explicitly opted in via the newly added RUSTC_FORCE_INCREMENTAL=1 environment variable. This is due to the widespread, and frequently occuring, breakage encountered by Rust users due to newly enabled incremental verification in 1.52.0. Notably, Rust users **should** upgrade to 1.52.0 or 1.52.1: the bugs that are detected by newly added incremental verification are still present in past stable versions, and are not yet fixed on any channel. These bugs can lead to miscompilation of Rust binaries. These problems only affect incremental builds, so release builds with Cargo should not be affected unless the user has explicitly opted into incremental. Debug and check builds are affected. See [84970] for more details. [84970]: rust-lang/rust#84970 Version 1.52.0 (2021-05-06) ============================ Language -------- - [Added the `unsafe_op_in_unsafe_fn` lint, which checks whether the unsafe code in an `unsafe fn` is wrapped in a `unsafe` block.][79208] This lint is allowed by default, and may become a warning or hard error in a future edition. - [You can now cast mutable references to arrays to a pointer of the same type as the element.][81479] Compiler -------- - [Upgraded the default LLVM to LLVM 12.][81451] Added tier 3\* support for the following targets. - [`s390x-unknown-linux-musl`][82166] - [`riscv32gc-unknown-linux-musl` & `riscv64gc-unknown-linux-musl`][82202] - [`powerpc-unknown-openbsd`][82733] \* Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [`OsString` now implements `Extend` and `FromIterator`.][82121] - [`cmp::Reverse` now has `#[repr(transparent)]` representation.][81879] - [`Arc<impl Error>` now implements `error::Error`.][80553] - [All integer division and remainder operations are now `const`.][80962] Stabilised APIs ------------- - [`Arguments::as_str`] - [`char::MAX`] - [`char::REPLACEMENT_CHARACTER`] - [`char::UNICODE_VERSION`] - [`char::decode_utf16`] - [`char::from_digit`] - [`char::from_u32_unchecked`] - [`char::from_u32`] - [`slice::partition_point`] - [`str::rsplit_once`] - [`str::split_once`] The following previously stable APIs are now `const`. - [`char::len_utf8`] - [`char::len_utf16`] - [`char::to_ascii_uppercase`] - [`char::to_ascii_lowercase`] - [`char::eq_ignore_ascii_case`] - [`u8::to_ascii_uppercase`] - [`u8::to_ascii_lowercase`] - [`u8::eq_ignore_ascii_case`] Rustdoc ------- - [Rustdoc lints are now treated as a tool lint, meaning that lints are now prefixed with `rustdoc::` (e.g. `#[warn(rustdoc::non_autolinks)]`).][80527] Using the old style is still allowed, and will become a warning in a future release. - [Rustdoc now supports argument files.][82261] - [Rustdoc now generates smart punctuation for documentation.][79423] - [You can now use "task lists" in Rustdoc Markdown.][81766] E.g. ```markdown - [x] Complete - [ ] Todo ``` Misc ---- - [You can now pass multiple filters to tests.][81356] E.g. `cargo test -- foo bar` will run all tests that match `foo` and `bar`. - [Rustup now distributes PDB symbols for the `std` library on Windows, allowing you to see `std` symbols when debugging.][82218] Internal Only ------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of rustc and related tools. - [Check the result cache before the DepGraph when ensuring queries][81855] - [Try fast_reject::simplify_type in coherence before doing full check][81744] - [Only store a LocalDefId in some HIR nodes][81611] - [Store HIR attributes in a side table][79519] Compatibility Notes ------------------- - [Cargo build scripts are now forbidden from setting `RUSTC_BOOTSTRAP`.][cargo/9181] - [Removed support for the `x86_64-rumprun-netbsd` target.][82594] - [Deprecated the `x86_64-sun-solaris` target in favor of `x86_64-pc-solaris`.][82216] - [Rustdoc now only accepts `,`, ` `, and `\t` as delimiters for specifying languages in code blocks.][78429] - [Rustc now catches more cases of `pub_use_of_private_extern_crate`][80763] - [Changes in how proc macros handle whitespace may lead to panics when used with older `proc-macro-hack` versions. A `cargo update` should be sufficient to fix this in all cases.][84136] [84136]: rust-lang/rust#84136 [80763]: rust-lang/rust#80763 [82166]: rust-lang/rust#82166 [82121]: rust-lang/rust#82121 [81879]: rust-lang/rust#81879 [82261]: rust-lang/rust#82261 [82218]: rust-lang/rust#82218 [82216]: rust-lang/rust#82216 [82202]: rust-lang/rust#82202 [81855]: rust-lang/rust#81855 [81766]: rust-lang/rust#81766 [81744]: rust-lang/rust#81744 [81611]: rust-lang/rust#81611 [81479]: rust-lang/rust#81479 [81451]: rust-lang/rust#81451 [81356]: rust-lang/rust#81356 [80962]: rust-lang/rust#80962 [80553]: rust-lang/rust#80553 [80527]: rust-lang/rust#80527 [79519]: rust-lang/rust#79519 [79423]: rust-lang/rust#79423 [79208]: rust-lang/rust#79208 [78429]: rust-lang/rust#78429 [82733]: rust-lang/rust#82733 [82594]: rust-lang/rust#82594 [cargo/9181]: rust-lang/cargo#9181 [`char::MAX`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.MAX [`char::REPLACEMENT_CHARACTER`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.REPLACEMENT_CHARACTER [`char::UNICODE_VERSION`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.UNICODE_VERSION [`char::decode_utf16`]: https://doc.rust-lang.org/std/primitive.char.html#method.decode_utf16 [`char::from_u32`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32 [`char::from_u32_unchecked`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32_unchecked [`char::from_digit`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_digit [`Peekable::next_if`]: https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if [`Peekable::next_if_eq`]: https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_eq [`Arguments::as_str`]: https://doc.rust-lang.org/stable/std/fmt/struct.Arguments.html#method.as_str [`str::split_once`]: https://doc.rust-lang.org/stable/std/primitive.str.html#method.split_once [`str::rsplit_once`]: https://doc.rust-lang.org/stable/std/primitive.str.html#method.rsplit_once [`slice::partition_point`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.partition_point [`char::len_utf8`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.len_utf8 [`char::len_utf16`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.len_utf16 [`char::to_ascii_uppercase`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_ascii_uppercase [`char::to_ascii_lowercase`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_ascii_lowercase [`char::eq_ignore_ascii_case`]: https://doc.rust-lang.org/stable/std/primitive.char.html#method.eq_ignore_ascii_case [`u8::to_ascii_uppercase`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.to_ascii_uppercase [`u8::to_ascii_lowercase`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.to_ascii_lowercase [`u8::eq_ignore_ascii_case`]: https://doc.rust-lang.org/stable/std/primitive.u8.html#method.eq_ignore_ascii_case
Split markdown lang strings into tokens on
,
.The previous behavior was to split lang strings into tokens on any
character that wasn't a
_
,_
, or alphanumeric.This is a potentially breaking change, so please scrutinize! See discussion in #78344.
I noticed some test cases that made me wonder if there might have been some reason for the original behavior:
It seemed pretty peculiar to specifically test lang strings in braces, with all the tokens prefixed by
.
.I did some digging, and it looks like the test cases were added way back in this commit from 2014 by @skade.
It looks like they were added just to make sure that the splitting was permissive, and aren't testing that those strings in particular are accepted.
Closes #78344.