[opt] Update alloc_refs on the stack to have stack memory kind. #31423
Conversation
@swift-ci please benchmark
Does anyone know why the benchmark failed? It looks like it didn't actually run any tests and everything compiled OK. I did a clean build/test on two presets locally and couldn't reproduce the error. I'll keep investigating.
@swift-ci please benchmark
Force-pushed from 48097a3 to aa1dd01.
@@ -92,7 +92,6 @@ AccessedStorage::AccessedStorage(SILValue base, Kind kind) {
     value = base;
     break;
   case Stack:
-    assert(isa<AllocStackInst>(base));
I'd like to keep the assertion here, but it would require duplicating the code in visitClassAccess, and I don't think it's worth it just for the assertion. WDYT?
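(As a hedged sketch only, not part of the patch: one way to keep the assertion without duplicating visitClassAccess would be to widen it to accept either stack-allocating instruction, using the same isAllocatingStack() accessor the diff below relies on.)

// Hypothetical sketch: accept either an alloc_stack or a stack-allocated
// alloc_ref as the base of Stack storage, instead of dropping the assert.
case Stack:
  assert((isa<AllocStackInst>(base) ||
          (isa<AllocRefInst>(base) &&
           cast<AllocRefInst>(base)->isAllocatingStack())) &&
         "unexpected base for Stack storage");
  value = base;
  break;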
@swift-ci please benchmark
Benchmark results posted: Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs). [Result tables omitted.] How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%. Noise: the performance results (not code size!) sometimes contain false positives.
I think the benchmarks will be improved after #31134 lands.
Now that #31134 is committed, I'm going to run the benchmarks again.
@swift-ci please benchmark
Benchmark results posted: Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs). [Result tables omitted.] How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%. Noise: the performance results (not code size!) sometimes contain false positives.
@atrick friendly ping :)
I'm afraid this isn't nearly as simple as changing the fundamental abstraction, AccessedStorage. That abstraction models the language rules, so we can't change its meaning.
It's very easy to write incorrect access enforcement optimizations that give you great-looking performance wins. No tests will fail, and no programs will crash. We won't find out that exclusivity isn't being enforced until a whole bunch of code has shipped without enforcement.
It looks like you have several really nice test cases where we want to improve AccessEnforcementOpts. So for each one, we need to make an argument for why the optimizer should be able to recognize it, decide which of the AccessEnforcementOpts should handle it, and construct a logical proof for an algorithm that could handle that case while being generally correct. Finally, we can add information to the analyses that feed the optimization to provide the extra info.
For example, removeLocalNonNestedAccess should demote access markers to [static] for any locally allocated objects that don't escape. The pass does not currently use EscapeAnalysis, but it could.
Knowing that the object was allocated on the stack will probably not be useful for access enforcement, but knowing that a reference does not escape is very useful.
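(For illustration, a hedged sketch of what such a demotion could look like; the escape-checking helper below is hypothetical and stands in for real EscapeAnalysis queries, not the actual removeLocalNonNestedAccess code.)

// Hypothetical sketch: demote an access to static enforcement once the
// accessed object is proven to be locally allocated and non-escaping.
static bool tryDemoteToStatic(BeginAccessInst *access,
                              SILValue underlyingObject) {
  // Assumption: isLocallyAllocatedNonEscaping is a hypothetical helper
  // that would be answered by EscapeAnalysis in a real implementation.
  if (!isLocallyAllocatedNonEscaping(underlyingObject))
    return false;
  // No other thread or reentrant access can alias this storage, so the
  // dynamic exclusivity check is unnecessary.
  access->setEnforcement(SILAccessEnforcement::Static);
  return true;
}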
@@ -534,6 +534,10 @@ static void addLowLevelPassPipeline(SILPassPipelinePlan &P) {
 static void addLowLevelPassPipeline(SILPassPipelinePlan &P) {
   P.startPipeline("LowLevel");

+  P.addAccessEnforcementOpts();
+  P.addAccessEnforcementWMO();
+  P.addAccessMarkerElimination();
Generally, when adding very expensive passes to the pipeline, it's good to document why they need to be run at that point so we can improve the pipeline later. What other passes do they depend on that require them to be rerun?
With AccessMarkerElimination, ideally we would fix whatever optimization pass can't handle access markers rather than running the elimination pass, since a short-term goal is supporting access markers throughout the pipeline.
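(A hedged sketch of the kind of documentation being asked for; the stated dependency is a placeholder assumption, not a claim about the actual pipeline.)

static void addLowLevelPassPipeline(SILPassPipelinePlan &P) {
  P.startPipeline("LowLevel");
  // Rerun access enforcement optimization here because <pass X above>
  // exposes new opportunities (placeholder reason -- document the real
  // dependency so the pipeline can be restructured safely later).
  P.addAccessEnforcementOpts();
  P.addAccessEnforcementWMO();
}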
while (isa<StructElementAddrInst>(operand) ||
       isa<RefElementAddrInst>(operand) ||
       isa<TupleElementAddrInst>(operand) || isa<LoadInst>(operand))
  operand = operand.getDefiningInstruction()->getOperand(0);
I'm not sure what's happening here with peeking through operands. The UseDefChainVisitor needs to visit each instruction on the "access path" between the address of the memory access and the address of the access to formal storage.
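(A hedged sketch of walking the access path one projection at a time so each step can be visited rather than peeked through; the visitor hook name is an assumption.)

// Hypothetical sketch: visit every projection between the access and its
// base instead of silently skipping over them.
SILValue walkAccessPath(SILValue addr) {
  while (auto *inst = addr->getDefiningInstruction()) {
    if (isa<StructElementAddrInst>(inst) ||
        isa<TupleElementAddrInst>(inst)) {
      // visitPathComponent(inst);  // hypothetical visitor hook
      addr = inst->getOperand(0);
      continue;
    }
    break;
  }
  return addr; // the base of the formal access
}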
operand = operand.getDefiningInstruction()->getOperand(0);

if (isa<AllocStackInst>(operand))
  return asImpl().visitBase(field, AccessedStorage::Stack);
If an access is to a class property, it doesn't matter where the reference to the class is stored. Once you reach the base of the formal access, you've found the AccessStorage. This corresponds to language-level constructs: global variable, class property, local variable.
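(Hedged illustration of that rule: once the walk reaches a ref_element_addr, the storage is classified as Class no matter where the class reference itself is stored. The surrounding visitor shape is assumed from the diff above.)

// Hypothetical sketch: a class property access is always Class storage,
// whether the referenced object lives on the heap or on the stack.
if (auto *refElt = dyn_cast<RefElementAddrInst>(operand))
  return asImpl().visitBase(refElt, AccessedStorage::Class);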
  return asImpl().visitBase(field, AccessedStorage::Stack);
if (auto allocRef = dyn_cast<AllocRefInst>(operand)) {
  if (allocRef->isAllocatingStack())
    return asImpl().visitBase(field, AccessedStorage::Stack);
Class properties have specific semantics at the language level and specific structural properties in SIL. Whether the object is allocated on the heap or stack is irrelevant to any of this. If we have an optimization that can somehow benefit from knowing that an object is stack-allocated, then we can teach that optimization about stack-allocated objects. But we can never pretend that a class property is a local variable.
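(If an optimization really can benefit from stack-allocatedness, a hedged sketch of teaching it about that directly, without changing AccessedStorage; the helper name is hypothetical.)

// Hypothetical sketch: let a client pass consult the allocation site itself
// while AccessedStorage continues to report the access as Class.
static bool isStackAllocatedObject(SILValue ref) {
  if (auto *allocRef = dyn_cast<AllocRefInst>(ref))
    return allocRef->isAllocatingStack();
  return false;
}

A pass could call something like this on the object underlying a Class-kind AccessedStorage and still treat the access as a class property at the language level.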
@atrick thanks for the review. There's a lot of really great information here, thanks for sharing it :) I'm going to close this PR because, as you said, the entire premise relies on an incorrect idea: that stack allocations and local variables are synonymous. I've created #32844, which is the start of a more correct implementation of this same optimization.
Update MemAccessUtils to classify class allocs on the stack as "AccessedStorage::Kind::Stack" instead of "::Class". This allows class element access markers to be promoted to have static access enforcement.
Force-pushed from aa1dd01 to da23b35.
@swift-ci please benchmark
I'm re-testing after removing …
Benchmark results posted: Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs). [Result tables omitted.] How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%. Noise: the performance results (not code size!) sometimes contain false positives.
The interesting thing shown above is that putting …
@zoecarver yeah, that's pretty clearly a problem of some pass not handling access markers. That should just be fixed independently.
Update MemAccessUtils to classify class allocs on the stack as "AccessedStorage::Kind::Stack" instead of "::Class". This allows class element access markers to be promoted to have static access enforcement.
Right now we're really bad at optimizing classes. I've made a few patches recently to address some of the issues I've seen in the optimizer around class optimization. They're mostly similar patches (or at least they address similar problems), and over time they've gotten more and more general. I think this may finally be the right patch.
Fixing this edge case around access enforcement is (hopefully, we'll see what the benchmarks say) a big performance improvement for a relatively small patch. And, more importantly, this feels like the "correct" way to address the problem (rather than storing class members in tuples =P).