-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[ownership] But just for macOS #35958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ownership] But just for macOS #35958
Conversation
… ossa/non-ossa SIL. In SILCombine, we do not want to add or delete edges. We are ok with swapping edges or replacing edges when the CFG structure is preserved. This becomes an issue since by performing this optimization, we are going to get rid of the error parameter but leave a try_apply, breaking SIL invariants. So to do perform this optimization, we would need to convert to an apply and eliminate the error edge, breaking the aforementioned SILCombine invariant. So, just do not perform this for now and leave it to other passes like SimplifyCFG.
…ths where there is dynamically no value by inserting compensating destroys. This commit is fixing two things: 1. In certain cases, we are seeing cases where either SILGen or the optimizer are eliminating destroy_addr along paths where we know that an enum is dynamically trivial. This can not be expressed in OSSA, so I added code to pred-deadalloc-elim so that I check if any of our available values after we finish promoting away an allocation now need to have their consuming use set completed. 2. That led me to discover that in certain cases load [take] that we were promoting were available values of other load [take]. This means that we have a memory safety issue if we promote one load before the other. Consider the following SIL: ``` %mem = alloc_stack store %arg to [init] %mem %0 = load [take] %mem store %0 to [init] %mem %1 = load [take] %mem destroy_value %1 dealloc_stack %mem ``` In this case, if we eliminate %0 before we eliminate %1, we will have a stale pointer to %0. I also took this as an opportunity to turn off predictable mem access opt on SIL that was deserialized canonicalized and non-OSSA SIL. We evidently need to still do this for pred mem opts for perf reasons (not sure why). But I am pretty sure this isn't needed and allows me to avoid some nasty code.
…complete available value when cleaning up takes.
While looking at the performance of the verifier running with -sil-verify-all on the stdlib, I noticed that we are spending ~30% of the total time in the verifier performing this check. Introducing the cache mitigates this issue. I believe the reason is that we were walking for each operand the use list of its associated value which I think is quadratic.
…br that do not involve objects directly. Just to reduce the size of the CFG.
…dress through the phi using a RawPointer. In OSSA, we do not allow for address phis, but in certain cases the logic of LoopRotate really wants them. To work around this issue, I added some code in this PR to loop rotate that as a post-pass fixes up any address phis by inserting address <-> raw pointer adapters and changing the address phi to instead be of raw pointer type.
…that have at least one enum tuple sub-elt. Just until MemoryLifetime can handle enums completely.
…passes. This eliminates some regressions by eliminating phase ordering in between ARCSequenceOpts/inlining with read only functions whose read onlyness is lost after inlining.
LLVM seems to not support this today. With ownership SSA, we now produce these for accelerate for some reason, causing lldb/TestAccelerateSIMD.py to fail. I debugged this with Adrian and we got this fix. I filed the radar below against Adrian to fix. rdar://74287800
I am doing it this way so that I can: 1. There is some sort of ASAN issue that this exposes on Linux, so I am going to do this on Darwin and then debug the Linux issue using ASAN over the weekend/next week. 2. We are seeing I think large CFGs on 32-bit platforms since we are not simplifying some sort of pattern. I don't have time to figure that out right now, so I am disabling it on non-macOS platforms so we can get something in.
@swift-ci smoke test |
I did the same thing I did with Linux, namely that I made it so that macOS is the only platform that the stdlib gets the special treatment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was little fuzzy on some of the PredictableMemOps logic, but overall everything looks ready to check in, with minor follow-up comments
@@ -1709,10 +1709,6 @@ SILInstruction *SILCombiner::visitTryApplyInst(TryApplyInst *AI) { | |||
if (isa<PartialApplyInst>(AI->getCallee())) | |||
return nullptr; | |||
|
|||
if (auto *CFI = dyn_cast<ConvertFunctionInst>(AI->getCallee())) { | |||
return optimizeApplyOfConvertFunctionInst(AI, CFI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you keeping track of deleted optimizations that we expect to be reimplemented in a different pass later?
@@ -2665,6 +2665,8 @@ bool SimplifyCFG::simplifyBlocks() { | |||
for (auto &BB : Fn) | |||
addToWorklist(&BB); | |||
|
|||
bool hasOwnership = Fn.hasOwnership(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, I have a different version of enabling simplify-cfg for trivial cases, where most of the optimizations will kick in most of the time. Just working through the test cases.
@@ -2265,6 +2265,12 @@ bool AllocOptimize::canPromoteTake( | |||
if (!agg.canTake(loadTy, firstElt)) | |||
return false; | |||
|
|||
// As a final check, make sure that we have an available value for each value, | |||
// if not bail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: what happens if you don't do this check?
// block, before the terminator. | ||
auto loc = RegularLocation::getAutoGeneratedLocation(); | ||
SILBuilderWithScope localBuilder(termInst); | ||
localBuilder.createDestroyValue(loc, v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just SILBuilderWithScope(termInst).createDestroyValue(loc, v)
consumingUseBlocks.push_back(block); | ||
} | ||
} | ||
findJointPostDominatingSet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, the whole point of ValueLifetimeAnalysis is to do what you're doing manually here. That utility has been corrupted and is broken, so you shouldn't use it as-is, but I have a fixed, pure version on a branch that just does what you're doing manually and I'll put it up ASAP. Here you're putting destroys in successor blocks when they don't need to be, which seems like a bad idea but maybe was expedient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
I am actually doing the full thing here: |
I think we may be missing a pattern and thus are hitting large CFGs in i386. Just in case we run into a timeout issue, in this PR, I am just doing macOS.