-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[CodeGen] Adjust global-split remat heuristic to match LICM #160709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This heuristic was originally added in 40c4aa with the stated purpose of avoiding global split on live long ranges created by MachineLICM hoisting trivially rematerializable instructions. In the meantime, various backends have introduced non-trivial rematerialization cases, MachineLICM gained an explicitly triviality check, and we've reworked our APIs to match naming wise. Let's move this heuristic back to truely trivial remat only. This is a functional change, though somewhat hard to hit. This change will cause non-trivially rematerializable instructions to be globally split more often. This is likely a good thing since non-trivial remat may not be legal at all possible points in the live interval, but may cost slightly more compile time. I don't have a motivating example; I found it when reviewing the callers of isRemMaterializable(MI).
|
@lukel97 You appear to have good infrastructure for looking at spill/fill impacts, would you mind sanity checking this one just to make sure there's no unexpected negative interactions? Not expecting any, but well, RegAlloc is famous for the unexpected. |
lukel97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tested and there were no codegen/spills/fills changes on llvm test suite or SPEC CPU 2017 on rva23u64 -O3.
) This heuristic was originally added in 40c4aa with the stated purpose of avoiding global split on live long ranges created by MachineLICM hoisting trivially rematerializable instructions. In the meantime, various backends have introduced non-trivial rematerialization cases, MachineLICM gained an explicitly triviality check, and we've reworked our APIs to match naming wise. Let's move this heuristic back to truely trivial remat only. This is a functional change, though somewhat hard to hit. This change will cause non-trivially rematerializable instructions to be globally split more often. This is likely a good thing since non-trivial remat may not be legal at all possible points in the live interval, but may cost slightly more compile time. I don't have a motivating example; I found it when reviewing the callers of isRemMaterializable(MI).
#159211) In the register allocator we define non-trivial rematerialization as the rematerlization of an instruction with virtual register uses. We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overriden by the target in `TargetTransformInfo::isReMaterializableImpl`. The original reasoning for this given by the comment in the default implementation is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations. However #160377 recently split the API out into a separate non-trivial and trivial version and updated the call-sites accordingly, and #160709 and #159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial. With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default which reduces a significant amount of spills and reloads across various targets. For llvm-test-suite built with -O3 -flto, we get the following geomean reduction in reloads: - arm64-apple-darwin: 11.6% - riscv64-linux-gnu: 8.1% - x86_64-linux-gnu: 6.5%
…erialization (#159211) In the register allocator we define non-trivial rematerialization as the rematerlization of an instruction with virtual register uses. We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overriden by the target in `TargetTransformInfo::isReMaterializableImpl`. The original reasoning for this given by the comment in the default implementation is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations. However llvm/llvm-project#160377 recently split the API out into a separate non-trivial and trivial version and updated the call-sites accordingly, and llvm/llvm-project#160709 and #159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial. With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default which reduces a significant amount of spills and reloads across various targets. For llvm-test-suite built with -O3 -flto, we get the following geomean reduction in reloads: - arm64-apple-darwin: 11.6% - riscv64-linux-gnu: 8.1% - x86_64-linux-gnu: 6.5%
llvm#159211) In the register allocator we define non-trivial rematerialization as the rematerlization of an instruction with virtual register uses. We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overriden by the target in `TargetTransformInfo::isReMaterializableImpl`. The original reasoning for this given by the comment in the default implementation is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations. However llvm#160377 recently split the API out into a separate non-trivial and trivial version and updated the call-sites accordingly, and llvm#160709 and llvm#159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial. With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default which reduces a significant amount of spills and reloads across various targets. For llvm-test-suite built with -O3 -flto, we get the following geomean reduction in reloads: - arm64-apple-darwin: 11.6% - riscv64-linux-gnu: 8.1% - x86_64-linux-gnu: 6.5%
This heuristic was originally added in 40c4aa with the stated purpose of avoiding global split on live long ranges created by MachineLICM hoisting trivially rematerializable instructions. In the meantime, various backends have introduced non-trivial rematerialization cases, MachineLICM gained an explicitly triviality check, and we've reworked our APIs to match naming wise. Let's move this heuristic back to truely trivial remat only.
This is a functional change, though somewhat hard to hit. This change will cause non-trivially rematerializable instructions to be globally split more often. This is likely a good thing since non-trivial remat may not be legal at all possible points in the live interval, but may cost slightly more compile time.
I don't have a motivating example; I found it when reviewing the callers of isRemMaterializable(MI).