Use zero_mem instead of a zeroinitializer for the init intrinsic #21282

Merged: 3 commits merged on Jan 19, 2015

Aatch
Contributor

@Aatch Aatch commented Jan 17, 2015

LLVM gets overwhelmed when presented with a zeroinitializer for a large
type. In unoptimised builds, it generates a long sequence of stores to
memory. In optimised builds, it manages to generate a standard memset of
zero values, but takes a long time doing so.

Call out to the llvm.memset function to zero out the memory instead.

Fixes #21264

@rust-highfive
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@alexcrichton
Member

Nice! Can you add the test from #21264 as well to make sure we don't regress on the performance again? It should hopefully time out the test suite or set off some alarm if it goes back to the way it was.

It may also be worth leaving a comment explaining why we're using zero_mem for everything (including small values).

As a final bit, I'm not sure how much this matters, but non-optimized builds which use mem::zeroed for a small scalar currently get translated to an immediate constant 0, whereas this change causes them to allocate a stack slot, call memset, and then load the value. I don't think mem::zeroed usually sits close to a hot loop though, so this probably doesn't matter too much.

@Aatch
Contributor Author

Aatch commented Jan 17, 2015

@alexcrichton I'll add the test case.

However, non-optimised builds didn't translate this to an immediate constant 0 directly. Instead, they would create the stack slot, store a constant 0, and then load the value from that slot. This PR just changes that middle step to a call to memset. That might be slightly slower for small immediates, but it is much, much faster for large values (which are also more likely to be used with mem::zeroed).
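
For illustration, the slot / memset / load sequence described above can be mimicked in ordinary Rust. This is only a rough sketch, not the compiler change itself: `zeroed_via_memset` is a hypothetical helper name, and it uses the stable `std::ptr::write_bytes` and `MaybeUninit` APIs rather than the `init` intrinsic or the `zero_mem` function touched by this PR.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper (not part of this PR): produce an all-zero `T`
/// by filling its stack slot byte-wise, the way a memset call would.
///
/// Safety: the caller must ensure the all-zero bit pattern is valid for `T`.
unsafe fn zeroed_via_memset<T>() -> T {
    // Step 1: reserve a stack slot for the value.
    let mut slot = MaybeUninit::<T>::uninit();
    unsafe {
        // Step 2: fill the slot with zero bytes. `write_bytes` covers
        // `1 * size_of::<T>()` bytes here, regardless of how large `T` is.
        ptr::write_bytes(slot.as_mut_ptr(), 0u8, 1);
        // Step 3: load the value back out of the slot.
        slot.assume_init()
    }
}

fn main() {
    // Small scalar: the case where a plain store of 0 might be marginally cheaper.
    let small: u32 = unsafe { zeroed_via_memset() };
    // Larger aggregate: the case where a single bulk fill avoids many stores.
    let large: [u64; 4096] = unsafe { zeroed_via_memset() };

    assert_eq!(small, 0);
    assert!(large.iter().all(|&word| word == 0));
    println!("small = {}, large has {} zero words", small, large.len());
}
```

The same three steps apply to both the scalar and the array; only the byte count passed to the fill changes, which is why the approach scales to large types.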

pub fn init<T>() -> T;
}

const SIZE: usize = 512 * 1024 * 1024;
@nagisa
Member

1024 * 1024 is enough to reliably reproduce the issue. No need to make this test require 512MiB of RAM to run?
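
A regression check at the smaller size could look roughly like the sketch below. It is an assumption-laden stand-in for the actual test in this PR: it calls the stable `std::mem::zeroed` instead of the `init` intrinsic, the function names are hypothetical, and the 8 MiB thread stack is an arbitrary choice so that a 1 MiB temporary fits comfortably in unoptimised builds.

```rust
use std::mem;
use std::thread;

// 1 MiB, the size suggested in the review comment above.
const SIZE: usize = 1024 * 1024;

/// Hypothetical check: zero a SIZE-byte array and confirm every byte is 0.
fn zeroed_buffer_is_all_zero() -> bool {
    // An all-zero bit pattern is valid for `[u8; SIZE]`, so this is sound.
    let buf: Box<[u8; SIZE]> = Box::new(unsafe { mem::zeroed() });
    buf.iter().all(|&byte| byte == 0)
}

/// Run the check on a thread with a generous stack (8 MiB is an assumption).
fn run_on_big_stack() -> bool {
    thread::Builder::new()
        .stack_size(8 * 1024 * 1024)
        .spawn(zeroed_buffer_is_all_zero)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    assert!(run_on_big_stack());
    println!("zeroed {} bytes", SIZE);
}
```

If zeroing a large type regressed to a long sequence of individual stores, a test of this shape would be expected to show up as a very slow compile rather than a failed assertion.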

@alexcrichton
Member

r=me with the size decrease @nagisa mentioned

Aatch added 3 commits January 19, 2015 09:21
@alexcrichton
Member

@bors: r+ 25a4adc

@bors
Collaborator

bors commented Jan 19, 2015

⌛ Testing commit 25a4adc with merge dd8f887...

bors added a commit that referenced this pull request Jan 19, 2015
@bors
Collaborator

bors commented Jan 19, 2015

💔 Test failed - auto-mac-64-nopt-t

@alexcrichton
Member

@bors: retry

bors added a commit that referenced this pull request Jan 19, 2015
@bors
Collaborator

bors commented Jan 19, 2015

⌛ Testing commit 25a4adc with merge 43f2c19...

@bors
Collaborator

bors commented Jan 19, 2015

@bors bors merged commit 25a4adc into rust-lang:master Jan 19, 2015
Development

Successfully merging this pull request may close these issues.

mem::zeroed unrolls hilariously enormous loops
5 participants