-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add complex case mapping and title case mapping. #26039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
This adds 120 mappings: Dž dž Dž DŽ Lj lj Lj LJ Nj nj Nj NJ Dz dz Dz DZ Ι ᾈ ᾀ ᾉ ᾁ ᾊ ᾂ ᾋ ᾃ ᾌ ᾄ ᾍ ᾅ ᾎ ᾆ ᾏ ᾇ ᾘ ᾐ ᾙ ᾑ ᾚ ᾒ ᾛ ᾓ ᾜ ᾔ ᾝ ᾕ ᾞ ᾖ ᾟ ᾗ ᾨ ᾠ ᾩ ᾡ ᾪ ᾢ ᾫ ᾣ ᾬ ᾤ ᾭ ᾥ ᾮ ᾦ ᾯ ᾧ ᾼ ᾳ ῌ ῃ ῼ ῳ Ⅰ ⅰ Ⅱ ⅱ Ⅲ ⅲ Ⅳ ⅳ Ⅴ ⅴ Ⅵ ⅵ Ⅶ ⅶ Ⅷ ⅷ Ⅸ ⅸ Ⅹ ⅹ Ⅺ ⅺ Ⅻ ⅻ Ⅼ ⅼ Ⅽ ⅽ Ⅾ ⅾ Ⅿ ⅿ ⅰ Ⅰ ⅱ Ⅱ ⅲ Ⅲ ⅳ Ⅳ ⅴ Ⅴ ⅵ Ⅵ ⅶ Ⅶ ⅷ Ⅷ ⅸ Ⅸ ⅹ Ⅹ ⅺ Ⅺ ⅻ Ⅻ ⅼ Ⅼ ⅽ Ⅽ ⅾ Ⅾ ⅿ Ⅿ Ⓐ ⓐ Ⓑ ⓑ Ⓒ ⓒ Ⓓ ⓓ Ⓔ ⓔ Ⓕ ⓕ Ⓖ ⓖ Ⓗ ⓗ Ⓘ ⓘ Ⓙ ⓙ Ⓚ ⓚ Ⓛ ⓛ Ⓜ ⓜ Ⓝ ⓝ Ⓞ ⓞ Ⓟ ⓟ Ⓠ ⓠ Ⓡ ⓡ Ⓢ ⓢ Ⓣ ⓣ Ⓤ ⓤ Ⓥ ⓥ Ⓦ ⓦ Ⓧ ⓧ Ⓨ ⓨ Ⓩ ⓩ ⓐ Ⓐ ⓑ Ⓑ ⓒ Ⓒ ⓓ Ⓓ ⓔ Ⓔ ⓕ Ⓕ ⓖ Ⓖ ⓗ Ⓗ ⓘ Ⓘ ⓙ Ⓙ ⓚ Ⓚ ⓛ Ⓛ ⓜ Ⓜ ⓝ Ⓝ ⓞ Ⓞ ⓟ Ⓟ ⓠ Ⓠ ⓡ Ⓡ ⓢ Ⓢ ⓣ Ⓣ ⓤ Ⓤ ⓥ Ⓥ ⓦ Ⓦ ⓧ Ⓧ ⓨ Ⓨ ⓩ Ⓩ
…5800 As a result, the iterator returned by `char::to_uppercase` sometimes yields two or three `char`s instead of just one.
But not str::to_titlecase which would require UAX#29 Unicode Text Segmentation which we decided not to include in of `std`: rust-lang/rfcs#1054
Regarding the table sizes: there used to be two It’s probably possible to trade off some CPU time in mapping execution to make the table smaller: most of the |
s.extend(self[..].chars().flat_map(|c| c.to_lowercase())); | ||
for (i, c) in self[..].char_indices() { | ||
if c == 'Σ' { | ||
map_uppercase_sigma(self, i, &mut s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment for why this is being specially cased here?
I'm ok adding the extra table sizes for now, with |
cc @rust-lang/libs |
@@ -37,6 +37,7 @@ extern crate rustc_unicode; | |||
mod binary_heap; | |||
mod bit; | |||
mod btree; | |||
mod char; // char isn't really a collection, but didn't find a better place for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mumble mumble collection of code points mumble mumble
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A char
is one code point.
Updated. |
Thanks @SimonSapin! |
⌛ Testing commit c57a412 with merge 1b36871... |
💔 Test failed - auto-win-gnu-64-nopt-t |
That’s not because of my patch, is it? |
@bors: retry On Mon, Jun 8, 2015 at 10:57 PM, Simon Sapin notifications@github.com
|
⌛ Testing commit c57a412 with merge 30f6796... |
💔 Test failed - auto-mac-64-opt |
Turns out This should fix the failure. |
I’m getting an LLVM assertion error in
|
⌛ Testing commit 6369dcb with merge 1fb2b19... |
💔 Test failed - auto-mac-64-nopt-t |
|
@bors: retry On Tue, Jun 9, 2015 at 11:26 AM, bors notifications@github.com wrote:
|
* Add “complex” mappings to `char::to_lowercase` and `char::to_uppercase`, making them yield sometimes more than on `char`: #25800. `str::to_lowercase` and `str::to_uppercase` are affected as well. * Add `char::to_titlecase`, since it’s the same algorithm (just different data). However this does **not** add `str::to_titlecase`, as that would require UAX#29 Unicode Text Segmentation which we decided not to include in of `std`: rust-lang/rfcs#1054 I made `char::to_titlecase` immediately `#[stable]`, since it’s so similar to `char::to_uppercase` that’s already stable. Let me know if it should be `#[unstable]` for a while. * Add a special case for upper-case Sigma in word-final position in `str::to_lowercase`: #26035. This is the only language-independent conditional mapping currently in `SpecialCasing.txt`. * Stabilize `str::to_lowercase` and `str::to_uppercase`. The `&self -> String` on `str` signature seems straightforward enough, and the only relevant issue I’ve found is #24536 about naming. But `char` already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it. r? @alexcrichton
This will need to be mentioned in the 1.2 release notes and it's a breaking change. |
Is it? I don’t really know how to interpret https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md#behavioral-changes here, but the new behaviour in this PR is the one that has been documented as intended since before 1.0. |
@SimonSapin It's allowed, but it changes behavior and so can break users' code (regardless of what the policy may consider 'breaking'). |
Ok. Agreed on mentioning in the release notes. |
I tagged this with stable-regression since it's a change in behavior and that's currently the best tag to track them, even though there is definitely a difference between intentional and unintentional regressions. |
char::to_lowercase
andchar::to_uppercase
, making them yield sometimes more than onchar
: Implement full to_{upper,lower}case algorithms #25800.str::to_lowercase
andstr::to_uppercase
are affected as well.char::to_titlecase
, since it’s the same algorithm (just different data). However this does not addstr::to_titlecase
, as that would require UAX#29 Unicode Text Segmentation which we decided not to include in ofstd
: Rename or replacestr::words
to side-step the ambiguity of “a word”. rfcs#1054 I madechar::to_titlecase
immediately#[stable]
, since it’s so similar tochar::to_uppercase
that’s already stable. Let me know if it should be#[unstable]
for a while.str::to_lowercase
: Word-final sigma instr::to_lowercase
#26035. This is the only language-independent conditional mapping currently inSpecialCasing.txt
.str::to_lowercase
andstr::to_uppercase
. The&self -> String
onstr
signature seems straightforward enough, and the only relevant issue I’ve found is Standardize on either 'upper_case' or 'uppercase' #24536 about naming. Butchar
already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it.r? @alexcrichton