
Experiment with pointer-based slice prefix computation in normalizer #2433

Open
hsivonen opened this issue Aug 22, 2022 · 5 comments
Labels
A-performance Area: Performance (CPU, Memory) C-collator Component: Collation, normalization S-small Size: One afternoon (small bug fix or enhancement)

Comments

@hsivonen
Member

#2378 uses a pattern where there is a full slice and another slice that is known to be its suffix. The prefix of the full slice is computed so that it excludes either the suffix alone or the suffix plus a number of code units immediately before it.

Currently, this is done by taking the length of the full slice and subtracting the length of the suffix slice. However, in practice, the suffix slice comes from as_slice()/as_str() on a by-char iterator. The iterator may not actually store the length internally. Whether or not the iterator actually stores the length, it does store its start pointer.
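A minimal sketch of the length-subtraction approach described above, assuming a &str input whose suffix comes from Chars::as_str(). The helper names prefix_excluding and prefix_excluding_more are hypothetical and are not the code in #2378.

```rust
/// Hypothetical helper: the prefix of `full` that ends right before `suffix`,
/// where `suffix` is known to be a suffix of `full` (e.g. from `Chars::as_str()`).
fn prefix_excluding<'a>(full: &'a str, suffix: &str) -> &'a str {
    debug_assert!(full.ends_with(suffix));
    // Current approach: full length minus suffix length.
    &full[..full.len() - suffix.len()]
}

/// Hypothetical helper: additionally excludes `extra` code units (bytes in UTF-8)
/// immediately before `suffix`, e.g. the UTF-8 length of the char just read.
fn prefix_excluding_more<'a>(full: &'a str, suffix: &str, extra: usize) -> &'a str {
    debug_assert!(full.ends_with(suffix));
    &full[..full.len() - suffix.len() - extra]
}
```

For example, after reading 'a' and '\u{e9}' from "a\u{e9}b", as_str() returns "b"; excluding the suffix yields "a\u{e9}", and also excluding the two UTF-8 bytes of '\u{e9}' yields "a".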

It could be a tiny bit more efficient to compute the prefix length as the distance between the start pointers of the two slices. Done in the pointer domain, this requires unsafe code; by casting the pointers to usize first, it can be done in safe code. It's unclear to me if real optimization opportunities are lost by casting away pointerness before subtracting.
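A sketch of the two pointer-distance variants, with hypothetical helper names. The safe variant casts both pointers to usize; note that a pointer-to-integer `as` cast exposes the pointer's provenance. The pointer-domain variant uses the standard offset_from method, which is unsafe because it is undefined behavior unless both pointers are derived from the same allocation.

```rust
/// Hypothetical safe variant: subtract the addresses after casting to `usize`.
/// The `as usize` casts expose the provenance of both pointers.
fn prefix_len_by_addr(full: &str, suffix: &str) -> usize {
    let start = full.as_ptr() as usize;
    let suffix_start = suffix.as_ptr() as usize;
    suffix_start - start
}

/// Hypothetical pointer-domain variant using `offset_from`.
///
/// # Safety
/// `suffix` must be a subslice of `full` (same allocation, `suffix` starting
/// at or after the start of `full`).
unsafe fn prefix_len_by_offset(full: &str, suffix: &str) -> usize {
    // SAFETY: guaranteed by the caller per the contract above.
    unsafe { suffix.as_ptr().offset_from(full.as_ptr()) as usize }
}
```

Neither variant reads the suffix length at all, which is the point when the iterator does not store that length directly.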

@hsivonen hsivonen added A-performance Area: Performance (CPU, Memory) S-small Size: One afternoon (small bug fix or enhancement) C-collator Component: Collation, normalization labels Aug 22, 2022
@hsivonen
Member Author

(If this is a win for the &str case, it then makes sense to explore making utf8_iter and utf16_iter store a pointer past the end instead of storing the remaining slice.)
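A rough sketch of what "store a pointer past the end" could look like, using a hypothetical byte iterator PtrIter rather than the actual utf8_iter or utf16_iter types (whose internals are not shown here). Remaining-slice access has to recompute the length from two pointers, while the consumed-prefix length is a single pointer distance from the start of the full slice.

```rust
use std::marker::PhantomData;

/// Hypothetical byte iterator (not the utf8_iter API) that stores the current
/// position and a one-past-the-end pointer instead of the remaining slice.
struct PtrIter<'a> {
    ptr: *const u8,
    end: *const u8,
    // Ties the raw pointers to the lifetime of the borrowed slice.
    _marker: PhantomData<&'a [u8]>,
}

impl<'a> PtrIter<'a> {
    fn new(s: &'a [u8]) -> Self {
        let range = s.as_ptr_range();
        PtrIter { ptr: range.start, end: range.end, _marker: PhantomData }
    }

    /// Remaining bytes; the length is recomputed from the two pointers.
    fn as_slice(&self) -> &'a [u8] {
        // SAFETY: `ptr` and `end` come from the same slice and `ptr <= end`.
        unsafe {
            let len = self.end.offset_from(self.ptr) as usize;
            std::slice::from_raw_parts(self.ptr, len)
        }
    }

    /// Bytes consumed so far, as a pointer distance from the start of `full`.
    ///
    /// # Safety
    /// `full` must be the slice this iterator was created from.
    unsafe fn consumed_len(&self, full: &[u8]) -> usize {
        // SAFETY: guaranteed by the caller per the contract above.
        unsafe { self.ptr.offset_from(full.as_ptr()) as usize }
    }
}

impl<'a> Iterator for PtrIter<'a> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        if self.ptr == self.end {
            return None;
        }
        // SAFETY: `ptr < end`, so `ptr` points at a valid byte, and advancing
        // by one stays within the slice or lands one past its end.
        unsafe {
            let b = *self.ptr;
            self.ptr = self.ptr.add(1);
            Some(b)
        }
    }
}
```

Whether this layout actually wins over storing the remaining slice is exactly what the experiment would need to measure.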

@sffc
Member

sffc commented Sep 8, 2022

Is this fixed by #2378?

@hsivonen
Member Author

hsivonen commented Sep 9, 2022

> Is this fixed by #2378?

No, this is a follow-up for that one.

> It's unclear to me if real optimization opportunities are lost by casting away pointerness before subtracting.

@Gankra's RustConf talk says that casting a pointer to an integer is, in principle, an operation that can pessimize other uses of the pointer, so going this way is probably a bad idea.

Sadly, it appears there isn't a middle ground analogous to integer overflow (which in Rust panics or wraps rather than being UB): a pointer-distance computation that 1) would not require unsafe to call, 2) wouldn't make the pointers "exposed", and 3) wouldn't make mismatched provenance open-ended UB, but would instead return the address distance upon provenance mismatch on architectures that can't efficiently abort on provenance mismatch.

In any case, as far as micro optimizations go, what's contemplated here is very micro.

@sffc
Member

sffc commented Oct 17, 2022

@hsivonen Can you set an assignee (or "help wanted") and a milestone (or "backlog")?

@hsivonen
Member Author

I marked this backlog, but I didn't suggest "help wanted" at this time, because this optimization is so micro that it makes more sense to focus on other perf issues first.

@sffc sffc added this to the Backlog milestone Dec 22, 2022
@sffc sffc removed the backlog label Dec 22, 2022