Skip to content

[ownership] But just for macOS #35958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gottesmm
Copy link
Contributor

I think we may be missing a pattern and thus are hitting large CFGs in i386. Just in case we run into a timeout issue, in this PR, I am just doing macOS.

gottesmm and others added 11 commits February 12, 2021 12:49
… ossa/non-ossa SIL.

In SILCombine, we do not want to add or delete edges. We are ok with swapping
edges or replacing edges when the CFG structure is preserved. This becomes an
issue since by performing this optimization, we are going to get rid of the
error parameter but leave a try_apply, breaking SIL invariants. So to do perform
this optimization, we would need to convert to an apply and eliminate the error
edge, breaking the aforementioned SILCombine invariant. So, just do not perform
this for now and leave it to other passes like SimplifyCFG.
…ths where there is dynamically no value by inserting compensating destroys.

This commit is fixing two things:

1. In certain cases, we are seeing cases where either SILGen or the optimizer
are eliminating destroy_addr along paths where we know that an enum is
dynamically trivial. This can not be expressed in OSSA, so I added code to
pred-deadalloc-elim so that I check if any of our available values after we
finish promoting away an allocation now need to have their consuming use set
completed.

2. That led me to discover that in certain cases load [take] that we were
promoting were available values of other load [take]. This means that we have a
memory safety issue if we promote one load before the other. Consider the
following SIL:

```
%mem = alloc_stack
store %arg to [init] %mem
%0 = load [take] %mem
store %0 to [init] %mem
%1 = load [take] %mem
destroy_value %1
dealloc_stack %mem
```

In this case, if we eliminate %0 before we eliminate %1, we will have a stale
pointer to %0.

I also took this as an opportunity to turn off predictable mem access opt on SIL
that was deserialized canonicalized and non-OSSA SIL. We evidently need to still
do this for pred mem opts for perf reasons (not sure why). But I am pretty sure
this isn't needed and allows me to avoid some nasty code.
…complete available value when cleaning up takes.
While looking at the performance of the verifier running with -sil-verify-all on
the stdlib, I noticed that we are spending ~30% of the total time in the
verifier performing this check. Introducing the cache mitigates this issue.

I believe the reason is that we were walking for each operand the use list of
its associated value which I think is quadratic.
…br that do not involve objects directly.

Just to reduce the size of the CFG.
…dress through the phi using a RawPointer.

In OSSA, we do not allow for address phis, but in certain cases the logic of
LoopRotate really wants them. To work around this issue, I added some code in
this PR to loop rotate that as a post-pass fixes up any address phis by
inserting address <-> raw pointer adapters and changing the address phi to
instead be of raw pointer type.
…that have at least one enum tuple sub-elt.

Just until MemoryLifetime can handle enums completely.
…passes.

This eliminates some regressions by eliminating phase ordering in between
ARCSequenceOpts/inlining with read only functions whose read onlyness is lost
after inlining.
LLVM seems to not support this today. With ownership SSA, we now produce these
for accelerate for some reason, causing lldb/TestAccelerateSIMD.py to fail. I
debugged this with Adrian and we got this fix. I filed the radar below against
Adrian to fix.

rdar://74287800
I am doing it this way so that I can:

1. There is some sort of ASAN issue that this exposes on Linux, so I am going to do
this on Darwin and then debug the Linux issue using ASAN over the weekend/next
week.

2. We are seeing I think large CFGs on 32-bit platforms since we are not
simplifying some sort of pattern. I don't have time to figure that out right
now, so I am disabling it on non-macOS platforms so we can get something in.
@gottesmm gottesmm requested a review from atrick February 13, 2021 00:23
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

I did the same thing I did with Linux, namely that I made it so that macOS is the only platform that the stdlib gets the special treatment.

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was little fuzzy on some of the PredictableMemOps logic, but overall everything looks ready to check in, with minor follow-up comments

@@ -1709,10 +1709,6 @@ SILInstruction *SILCombiner::visitTryApplyInst(TryApplyInst *AI) {
if (isa<PartialApplyInst>(AI->getCallee()))
return nullptr;

if (auto *CFI = dyn_cast<ConvertFunctionInst>(AI->getCallee())) {
return optimizeApplyOfConvertFunctionInst(AI, CFI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you keeping track of deleted optimizations that we expect to be reimplemented in a different pass later?

@@ -2665,6 +2665,8 @@ bool SimplifyCFG::simplifyBlocks() {
for (auto &BB : Fn)
addToWorklist(&BB);

bool hasOwnership = Fn.hasOwnership();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I have a different version of enabling simplify-cfg for trivial cases, where most of the optimizations will kick in most of the time. Just working through the test cases.

@@ -2265,6 +2265,12 @@ bool AllocOptimize::canPromoteTake(
if (!agg.canTake(loadTy, firstElt))
return false;

// As a final check, make sure that we have an available value for each value,
// if not bail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: what happens if you don't do this check?

// block, before the terminator.
auto loc = RegularLocation::getAutoGeneratedLocation();
SILBuilderWithScope localBuilder(termInst);
localBuilder.createDestroyValue(loc, v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just SILBuilderWithScope(termInst).createDestroyValue(loc, v)

consumingUseBlocks.push_back(block);
}
}
findJointPostDominatingSet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, the whole point of ValueLifetimeAnalysis is to do what you're doing manually here. That utility has been corrupted and is broken, so you shouldn't use it as-is, but I have a fixed, pure version on a branch that just does what you're doing manually and I'll put it up ASAP. Here you're putting destroys in successor blocks when they don't need to be, which seems like a bad idea but maybe was expedient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@gottesmm
Copy link
Contributor Author

I am actually doing the full thing here:

#35872.

@gottesmm gottesmm closed this Feb 13, 2021
@gottesmm gottesmm deleted the ossa-serialize-then-lower-ownership-macos-only branch February 13, 2021 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants