-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve/modernize the handling of utf16 data in std::str #12317
Conversation
(This doesn't pass tests yet... will fix tomorrow) |
I reviewed 797396b791f4c2b78e2d3ef4e75aca5d6801e594 and preceding commits. Looks good to me. |
To align with how |
Ready for review: fixed tests & made |
/// use std::str; | ||
/// use std::str::{ScalarValue, LoneSurrogate}; | ||
/// | ||
/// // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't quite have an equivalent for a utf8 iterator like this, so perhaps this shouldn't be a public api just yet?
I may be missing some use cases though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but it was previously a public internal iterator (http://static.rust-lang.org/doc/master/std/str/fn.utf16_chars.html), so this change is only modernizing & adding the error-resistance, not adding something completely new. (Don't know if that's at all relevant to a decision...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was unaware of that!
Most of the tests are randomly generated with Python 3 and rely on it's UTF-16be encoder/decoder being correct.
This replaces the iterator with one that handles lone surrogates gracefully and uses that to implement `from_utf16_lossy` which replaces invalid `u16`s with U+FFFD.
The rest of the codebase is moving toward avoiding `fail!` so we do it here too!
Failed on windows because Windows actually uses UTF-16 for APIs (:cry:) and things generally want |
It looks like some places relied on the old behaviour of |
Ideally, NUL-termination should be orthogonal to decoding 16 bit units into code points. Could we have a separate |
Yeah, of course; I was just explaining the failure. :) |
I've added the truncation function. |
Many of the functions interacting with Windows APIs allocate a vector of 0's and do not retrieve a length directly from the API call, and so need to be sure to remove the unmodified junk at the end of the vector.
Iterators! Use them (in `is_utf16`), create them (in `utf16_items`). Handle errors gracefully (`from_utf16_lossy`) and `from_utf16` returning `Option<~str>` instead of failing. Add a pile of tests.
Add check for 'in_external_macro' and 'is_from_proc_macro' inside changelog: Fix rust-lang#12291 #[tracing::instrument()] triggers infinite_loop Added an in_external_macro and is_from_proc_macro check to the [infinite_loop] lint
Iterators! Use them (in
is_utf16
), create them (inutf16_items
).Handle errors gracefully (
from_utf16_lossy
) andfrom_utf16
returningOption<~str>
instead of failing.Add a pile of tests.