
Align rustc_metadata Table to word boundary #76620

Closed

Conversation

@ishitatsuyuki (Contributor) commented Sep 12, 2020

This can save a few cycles when converting the bytes to u32, though the benefit is probably very marginal. The motivation is more about borrowing ideas from FlatBuffers/Cap'n Proto.

I wonder if inserting arbitrary padding (without communicating it to the decoder) between lazy items is OK. From my understanding it is fine, since lazy items are only referenced by relative/absolute position and do not have to be in a packed layout.

Also, a note for perf: this will not reduce the instruction count, which is the default metric, since the decoder runs the same machine code either way.
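To make the idea concrete, here is a minimal, hypothetical sketch of the encoder side (not the actual rustc_metadata API; `Entry`, `encode_table`, and `WORD` are made-up names for this example): pad the output buffer to a 4-byte boundary before writing the fixed-size entries, so the decoder can turn each entry into a u32 with an aligned load. Because items are located purely by byte offset, the padding never needs to be communicated to the decoder.

```rust
// Hypothetical sketch, not the real rustc_metadata encoder: `Entry`,
// `encode_table`, and `WORD` are made up for illustration.
use std::convert::TryInto;

const WORD: usize = 4;

struct Entry {
    position: u32,
}

/// Pads `buf` to a word boundary, then appends the fixed-size entries.
/// Returns the byte offset at which the table starts.
fn encode_table(buf: &mut Vec<u8>, entries: &[Entry]) -> usize {
    // The padding bytes are invisible to the decoder: items are addressed
    // by byte position, not by a packed layout.
    while buf.len() % WORD != 0 {
        buf.push(0);
    }
    let start = buf.len();
    for entry in entries {
        buf.extend_from_slice(&entry.position.to_le_bytes());
    }
    start
}

fn main() {
    // Pretend earlier, arbitrarily sized data left the buffer misaligned.
    let mut buf = vec![0u8; 5];
    let start = encode_table(&mut buf, &[Entry { position: 42 }, Entry { position: 7 }]);
    assert_eq!(start % WORD, 0);

    // Decoder side: each 4-byte entry now sits at a word-aligned offset.
    let first = u32::from_le_bytes(buf[start..start + 4].try_into().unwrap());
    assert_eq!(first, 42);
    println!("table starts at aligned offset {}", start);
}
```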

@rust-highfive (Collaborator)

r? @ecstatic-morse

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Sep 12, 2020
@ecstatic-morse (Contributor) commented Sep 12, 2020

I'm not familiar with metadata encoding, so

r? @Mark-Simulacrum (or someone who can verify the following assumption)

> lazy items are only referenced by relative/absolute position and do not have to be in a packed layout.

I believe recent Intel processors have a hardware counter for unaligned loads, although you'll need to pass an additional flag to perf (it's not captured by default). It would be interesting to see if this change causes a noticeable improvement in that metric, although I'm not sure how noisy it is or how to weight it vs. instruction count/branch mispredictions.
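For reference, a minimal standalone sketch of the kind of load such a counter would measure (plain Rust, not tied to any rustc code path; the buffer and offsets are made up): a u32 read from a byte offset that is not a multiple of 4 is a misaligned load, while one at a word-aligned offset is not.

```rust
// Illustrative only: a misaligned-load counter would tick for the first read
// below but not the second.
use std::ptr;

fn main() {
    // Back the byte view with u32 storage so the buffer itself is 4-byte aligned.
    let words: [u32; 4] = [0x1111_1111, 0x2222_2222, 0x3333_3333, 0x4444_4444];
    let bytes = words.as_ptr() as *const u8;

    unsafe {
        // Offset 1 is not a multiple of 4: `read_unaligned` is required and the
        // CPU performs a misaligned load.
        let misaligned: u32 = ptr::read_unaligned(bytes.add(1) as *const u32);

        // Offset 4 is a multiple of 4: an ordinary aligned load suffices.
        let aligned: u32 = ptr::read(bytes.add(4) as *const u32);

        println!("misaligned: {:#010x}, aligned: {:#010x}", misaligned, aligned);
    }
}
```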

@Mark-Simulacrum (Member)

I guess we should at least gather perf stats, though the perf machine is a Ryzen 3600. (Instruction counts should basically not change, but wall times will still be interesting.)

I think probably @eddyb or maybe @petrochenkov know the metadata system best right now.

@bors try @rust-timer queue

@rust-timer (Collaborator)

Awaiting bors try build completion

@bors (Contributor) commented Sep 12, 2020

⌛ Trying commit e647c6a with merge bf8d56e992aa4e07bacb0236d07c67f1a921bbdb...

@bors (Contributor) commented Sep 12, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: bf8d56e992aa4e07bacb0236d07c67f1a921bbdb

@rust-timer (Collaborator)

Queued bf8d56e992aa4e07bacb0236d07c67f1a921bbdb with parent 7adeb2c, future comparison URL.

@rust-timer (Collaborator)

Finished benchmarking try commit (bf8d56e992aa4e07bacb0236d07c67f1a921bbdb): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@Mark-Simulacrum (Member)

Looking at wall times, the perf results look mixed, and CPU clock shows only regressions, so I feel this is probably not really an improvement (or at least not consistently one). Given that the change is fairly finicky code-wise, I'm inclined to close.

@ishitatsuyuki (Contributor, Author)

I agree that the change is hard to justify without a meaningful performance improvement, so I'm closing this.
