-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use iterators for std::str::is_utf8 & part of from_utf8_lossy. #12314
Conversation
Also:
|
Ack, I think makefiles have issues. The second make gives me "Nothing to be done for `all'" |
By adding a
but I'm not entirely convinced at the moment it's building correctly, because a subsequent |
That looks nicer! It doesn't explain what's going on with me... And I don't understand why Could someone else have a go too? |
@huonw What platform/architecture are you testing on? Maybe it's some sort of architecture-dependent issue? |
x86-64 linux with a i7-3517u. Another thing: some tiny tweaks (like changing the names of the functions) gives results more similar to yours. |
Names? Really? How strange. What if you simply |
Yeah, changing the name to That said, I'll just iterator-ise the whole |
So anyway with a rebase and a str::bench::from_utf8_lossy_100_ascii ... bench: 119 ns/iter (+/- 8)
str::bench::from_utf8_lossy_100_invalid ... bench: 454 ns/iter (+/- 16)
str::bench::from_utf8_lossy_100_multibyte ... bench: 116 ns/iter (+/- 9)
str::bench::from_utf8_lossy_invalid ... bench: 119 ns/iter (+/- 9) I think this is ready for merging. |
Code looks good to me. I'll run benchmarks myself shortly. |
Latest benchmark:
|
(I just realised I've left outdated comments in the source, I've removed the r+... don't r+ again for now, please.) |
This uses a vector iterator to avoid the necessity for unsafe indexing, and makes this function slightly faster. Unfortunately rust-lang#11751 means that the iterator comes with repeated `null` checks which means the pure-ASCII case still has room for significant improvement (and the other cases too, but it's most significant for just ASCII). Before: is_utf8_100_ascii ... bench: 143 ns/iter (+/- 6) is_utf8_100_multibyte ... bench: 134 ns/iter (+/- 4) After: is_utf8_100_ascii ... bench: 123 ns/iter (+/- 4) is_utf8_100_multibyte ... bench: 115 ns/iter (+/- 5)
This makes it very slightly faster, especially when the string is valid UTF-8, and completely removes the use of `unsafe` from the first half. Before: from_utf8_lossy_100_ascii ... bench: 151 ns/iter (+/- 17) from_utf8_lossy_100_invalid ... bench: 447 ns/iter (+/- 33) from_utf8_lossy_100_multibyte ... bench: 135 ns/iter (+/- 4) from_utf8_lossy_invalid ... bench: 124 ns/iter (+/- 10 After: from_utf8_lossy_100_ascii ... bench: 119 ns/iter (+/- 8) from_utf8_lossy_100_invalid ... bench: 454 ns/iter (+/- 16) from_utf8_lossy_100_multibyte ... bench: 116 ns/iter (+/- 9) from_utf8_lossy_invalid ... bench: 119 ns/iter (+/- 9)
See the commit messages for more details, but this makes `std::str::is_utf8` slightly faster and 100% non-`unsafe` and uses a similar thing to make the first scan of `from_utf8_lossy` 100% safe & faster.
…nas-schievink minor: simplify
…=flip1995 Default test output conflict handling to error oli-obk/ui_test#175 got rid of the `bool` that controlled the default handling so we need to specify it ourselves r? `@flip1995` changelog: none
See the commit messages for more details, but this makes
std::str::is_utf8
slightly faster and 100% non-unsafe
and uses a similar thing to make the first scan offrom_utf8_lossy
100% safe & faster.