[opt] Update alloc_refs on the stack to have stack memory kind. #31423

zoecarver · 2020-04-29T22:48:55Z

Update MemAccessUtils to classify class allocs on the stack as "AccessedStorage::Kind::Stack" instead of "::Class". This allows class element access markers to be promoted to have static access enforcement.

Right now we're really bad at optimizing classes. I've made a few patches recently to address some of the issues I've seen in the optimizer around class optimization. They're mostly similar patches (or at least addressing similar problems) and over time they seem to have gotten more and more general. I think this may finally be the right patch.

Fixing this edge case around access enforcement is (hopefully, we'll see what the benchmarks say) a big performance improvement for a relatively small patch. And, more importantly, this feels like the "correct" way to address the problem (rather than storing class members in tuples =P).

zoecarver · 2020-04-29T22:49:17Z

@swift-ci please benchmark

zoecarver · 2020-04-29T23:54:26Z

@swift-ci please benchmark

zoecarver · 2020-04-30T04:19:23Z

Does anyone know why the benchmark failed? It looks like it didn't actually run any tests and everything compiled OK. I did a clean build/test on two presets locally and couldn't reproduce the error. I'll keep investigating.

zoecarver · 2020-04-30T16:32:14Z

@swift-ci please benchmark

zoecarver · 2020-05-01T04:42:35Z

lib/SIL/Utils/MemAccessUtils.cpp

@@ -92,7 +92,6 @@ AccessedStorage::AccessedStorage(SILValue base, Kind kind) {
    value = base;
    break;
  case Stack:
-    assert(isa<AllocStackInst>(base));


I'd like to keep the assertion here but, it would require duplicating the code in visitClassAccess and I don't think it's worth it just for the assertion. WDYT?

zoecarver · 2020-05-01T04:46:02Z

@swift-ci please benchmark

zoecarver · 2020-05-01T04:46:19Z

@swift-ci please benchmark

swift-ci · 2020-05-01T05:17:17Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	4183	6276	+50.0%	0.67x (?)
RecursiveOwnedParameter	55	65	+18.2%	0.85x (?)
FlattenListLoop	2865	3322	+16.0%	0.86x (?)
ProtocolDispatch	196	217	+10.7%	0.90x (?)
PointerArithmetics	21700	23900	+10.1%	0.91x (?)
DataAppendBytesMedium	2980	3220	+8.1%	0.93x (?)
DropFirstArrayLazy	13	14	+7.7%	0.93x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
RecursiveOwnedParameter.o	2270	2318	+2.1%	0.98x
ObjectAllocation.o	4571	4667	+2.1%	0.98x
NopDeinit.o	4007	4071	+1.6%	0.98x

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
ClassArrayGetter2	1260	80	-93.7%	15.75x
MapReduceClass2	148	29	-80.4%	5.10x
MapReduceLazyCollectionShort	67	36	-46.3%	1.86x (?)
MapReduceClassShort2	268	145	-45.9%	1.85x (?)
MapReduceString	48	42	-12.5%	1.14x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
DictTest4.o	14040	14184	+1.0%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
DictionaryBridge.o	3595	3491	-2.9%	1.03x
LinkedList.o	2545	2473	-2.8%	1.03x
ArrayOfGenericRef.o	9246	9038	-2.2%	1.02x
ArrayOfRef.o	9587	9379	-2.2%	1.02x
OpaqueConsumingUsers.o	2630	2582	-1.8%	1.02x
RGBHistogram.o	20112	19808	-1.5%	1.02x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubNSDateRefAccess	3822	4498	+17.7%	0.85x (?)
DataAppendDataSmallToSmall	2860	3080	+7.7%	0.93x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

zoecarver · 2020-05-01T19:28:50Z

I think the benchmarks will be improved after #31134 lands.

zoecarver · 2020-05-08T18:09:00Z

Now that #31134 is committed, I'm going to run the benchmarks again.

zoecarver · 2020-05-08T18:09:09Z

@swift-ci please benchmark

swift-ci · 2020-05-08T19:07:08Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	7942	9875	+24.3%	0.80x (?)
FlattenListLoop	4570	5291	+15.8%	0.86x (?)
ObjectiveCBridgeStubToNSDate2	550	620	+12.7%	0.89x (?)
DataAppendDataSmallToSmall	4180	4660	+11.5%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObserverForwarderStruct	1030	905	-12.1%	1.14x (?)
String.data.LargeUnicode	136	121	-11.0%	1.12x (?)
DictionarySubscriptDefaultMutationOfObjects	1680	1500	-10.7%	1.12x (?)
ObserverUnappliedMethod	1300	1180	-9.2%	1.10x (?)
String.data.Medium	121	112	-7.4%	1.08x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
RecursiveOwnedParameter.o	2270	2318	+2.1%	0.98x
ObjectAllocation.o	4571	4667	+2.1%	0.98x
NopDeinit.o	4007	4071	+1.6%	0.98x

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
ClassArrayGetter2	1820	130	-92.9%	14.00x
MapReduceClass2	211	52	-75.4%	4.06x
MapReduceClassShort2	374	223	-40.4%	1.68x
ObjectiveCBridgeStubToNSDate2	630	550	-12.7%	1.15x (?)
DictionarySubscriptDefaultMutationOfObjects	1720	1540	-10.5%	1.12x (?)
FlattenListFlatMap	7611	6927	-9.0%	1.10x (?)
SortAdjacentIntPyramids	1695	1575	-7.1%	1.08x

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
DictTest4.o	14040	14184	+1.0%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
DictionaryBridge.o	3595	3491	-2.9%	1.03x
LinkedList.o	2545	2473	-2.8%	1.03x
ArrayOfGenericRef.o	9246	9038	-2.2%	1.02x
ArrayOfRef.o	9587	9379	-2.2%	1.02x
OpaqueConsumingUsers.o	2630	2582	-1.8%	1.02x
RGBHistogram.o	20112	19808	-1.5%	1.02x

Performance: -Onone

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

zoecarver · 2020-07-08T18:41:32Z

@atrick friendly ping :)

atrick

I'm afraid this isn't nearly as simple as changing the fundamental abstraction AccessedStorage. That abstraction models the language rules so can't change its meaning.

It's very easy to write incorrect access enforcement optimizations that give you great-looking performance wins. No tests will fail, and no programs will crash. We won't find out that exclusivity isn't being enforced until a whole bunch of code has shipped without enforcement.

It looks like you have several really nice test cases where we want to improve AccessEnforcementOpts. So for each one, we need to make an argument for why the optimizer should be able to recognize it, decide which of the AccessEnforcementOpts should handle it, construct a logical proof for an algorithm that could handle that case while being generally correct. Finally, we can add information to the analyses that feed the optimization to provide extra info.

For example, removeLocalNonNestedAccess should demote access markers to [static] for any locally allocated objects that don't escape. The pass does not currently use EscapeAnalysis, but it could.

Knowing that the object was allocated on the stack will probably not be useful for access enforcement, but knowing that a reference does not escape is very useful.

atrick · 2020-07-08T19:19:23Z

lib/SILOptimizer/PassManager/PassPipeline.cpp

@@ -534,6 +534,10 @@ static void addClosureSpecializePassPipeline(SILPassPipelinePlan &P) {
 static void addLowLevelPassPipeline(SILPassPipelinePlan &P) {
  P.startPipeline("LowLevel");

+  P.addAccessEnforcementOpts();
+  P.addAccessEnforcementWMO();
+  P.addAccessMarkerElimination();


Generally, when adding very expensive passes to the pipeline, it's good to document why they need to be run at that point so we can improve the pipeline later. What other passes are they dependent on that requires them to be rerun?

With AccessMarkerElimination, ideally we would fix whatever optimization pass can't handle access markers rather than running the elimination pass since a short term goal is supporting access markers throughout the pipeline.

atrick · 2020-07-08T19:27:57Z

lib/SIL/Utils/MemAccessUtils.cpp

+  while (isa<StructElementAddrInst>(operand) ||
+         isa<RefElementAddrInst>(operand) ||
+         isa<TupleElementAddrInst>(operand) || isa<LoadInst>(operand))
+    operand = operand.getDefiningInstruction()->getOperand(0);


I'm not sure what's happening here with peeking through operands. The UseDefChainVisitor needs to visit each instruction on the "access path" between the address of the memory access and the address of the access to formal storage.

atrick · 2020-07-08T19:31:03Z

lib/SIL/Utils/MemAccessUtils.cpp

+    operand = operand.getDefiningInstruction()->getOperand(0);
+
+  if (isa<AllocStackInst>(operand))
+    return asImpl().visitBase(field, AccessedStorage::Stack);


If an access is to a class property, it doesn't matter where the reference to the class is stored. Once you reach the base of the formal access, you've found the AccessStorage. This corresponds to language-level constructs: global variable, class property, local variable.

atrick · 2020-07-08T19:33:25Z

lib/SIL/Utils/MemAccessUtils.cpp

+    return asImpl().visitBase(field, AccessedStorage::Stack);
+  if (auto allocRef = dyn_cast<AllocRefInst>(operand)) {
+    if (allocRef->isAllocatingStack())
+      return asImpl().visitBase(field, AccessedStorage::Stack);


Class properties have specific semantics at the language level and specific structural properties in SIL. Whether the object is allocated on the heap or stack is irrelevant to any of this. If we have an optimization that can somehow benefit from knowing that an object is stack-allocated, then we can teach that optimization about stack-allocated objects. But we can never pretend that a class property is a local variable.

zoecarver · 2020-07-12T17:27:23Z

@atrick thanks for the review. There's a lot of really great information here, thanks for sharing it :)

I'm going to close this PR because, as you said, the entire premise relies on an incorrect idea: stack allocations and local variables are synonymous. I've created #32844 which is the start of a more correct implementation of this same optimization.

Update MemAccessUtils to clasify class allocs on the stack as "AccessedStorage::Kind::Stack" instead of "::Class". This allows class element access markers to be promoted to have static access enforcement.

zoecarver · 2020-07-15T03:54:03Z

@swift-ci please benchmark

zoecarver · 2020-07-15T03:54:14Z

@swift-ci please benchmark

zoecarver · 2020-07-15T03:54:59Z

I'm re-testing after removing AccessMarkerElimination to see if that was the reason for the performance improvements.

swift-ci · 2020-07-15T04:24:36Z

Performance: -O

Improvement	OLD	NEW	DELTA	RATIO
RecursiveOwnedParameter	66	54	-18.2%	1.22x
ObjectiveCBridgeStubToNSDate2	360	330	-8.3%	1.09x (?)
RemoveWhereMoveInts	12	11	-8.3%	1.09x (?)
ObserverUnappliedMethod	700	650	-7.1%	1.08x (?)
ObjectiveCBridgeStubToNSDateRef	2320	2160	-6.9%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
RecursiveOwnedParameter.o	2150	2198	+2.2%	0.98x
ObjectAllocation.o	4451	4547	+2.2%	0.98x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	928	1019	+9.8%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	4354	3678	-15.5%	1.18x (?)
Dictionary4	225	207	-8.0%	1.09x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
ObjectAllocation.o	4298	4378	+1.9%	0.98x
DictTest4.o	13600	13760	+1.2%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
LinkedList.o	2257	2200	-2.5%	1.03x
ArrayOfGenericRef.o	8304	8096	-2.5%	1.03x
ArrayOfRef.o	8581	8389	-2.2%	1.02x
OpaqueConsumingUsers.o	2351	2303	-2.0%	1.02x
RGBHistogram.o	19552	19232	-1.6%	1.02x
DictionaryBridge.o	3309	3269	-1.2%	1.01x

Performance: -Onone

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

zoecarver · 2020-07-15T15:09:17Z

The interesting thing shown above is that putting AccessMarkerElimination in LowLevel,Function seems to be the cause of the performance improvements.

atrick · 2020-07-15T17:52:23Z

@zoecarver yeah, that's pretty clearly a problem of some pass not handling access markers. That should just be fixed independently.

zoecarver force-pushed the opt/static-access-marker branch from 48097a3 to aa1dd01 Compare May 1, 2020 04:41

zoecarver commented May 1, 2020

View reviewed changes

zoecarver mentioned this pull request May 7, 2020

[LVA] Update ignore instructions RLE and DSE. #31134

Merged

gottesmm requested a review from atrick May 8, 2020 23:32

atrick requested changes Jul 8, 2020

View reviewed changes

zoecarver mentioned this pull request Jul 12, 2020

[SIL] Add a "unique" flag to alloc_ref instructions. #32844

Closed

zoecarver closed this Jul 12, 2020

zoecarver mentioned this pull request Jul 14, 2020

[NO-MERGE] Add access enforcement opts and access marker elimination to addHighLevelModulePipeline. #32879

Closed

zoecarver added 2 commits July 14, 2020 20:52

[opt] Update alloc_refs on the stack to have stack memory kind.

599c821

Update MemAccessUtils to clasify class allocs on the stack as "AccessedStorage::Kind::Stack" instead of "::Class". This allows class element access markers to be promoted to have static access enforcement.

Remove AccessMarkerElimination from LowLevel,Function pipeline

da23b35

zoecarver reopened this Jul 15, 2020

zoecarver force-pushed the opt/static-access-marker branch from aa1dd01 to da23b35 Compare July 15, 2020 03:53

zoecarver closed this Jul 15, 2020

[opt] Update alloc_refs on the stack to have stack memory kind. #31423

[opt] Update alloc_refs on the stack to have stack memory kind. #31423

Uh oh!

Conversation

zoecarver commented Apr 29, 2020

Uh oh!

zoecarver commented Apr 29, 2020

Uh oh!

zoecarver commented Apr 29, 2020

Uh oh!

zoecarver commented Apr 30, 2020

Uh oh!

zoecarver commented Apr 30, 2020

Uh oh!

zoecarver May 1, 2020

Choose a reason for hiding this comment

Uh oh!

zoecarver commented May 1, 2020

Uh oh!

zoecarver commented May 1, 2020

Uh oh!

swift-ci commented May 1, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

zoecarver commented May 1, 2020

Uh oh!

zoecarver commented May 8, 2020

Uh oh!

zoecarver commented May 8, 2020

Uh oh!

swift-ci commented May 8, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

zoecarver commented Jul 8, 2020

Uh oh!

atrick left a comment

Choose a reason for hiding this comment

Uh oh!

atrick Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

atrick Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

atrick Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

atrick Jul 8, 2020

Choose a reason for hiding this comment

Uh oh!

zoecarver commented Jul 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoecarver commented Jul 15, 2020

Uh oh!

zoecarver commented Jul 15, 2020

Uh oh!

zoecarver commented Jul 15, 2020

Uh oh!

swift-ci commented Jul 15, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

zoecarver commented Jul 15, 2020

Uh oh!

atrick commented Jul 15, 2020

zoecarver commented Jul 12, 2020 •

edited

Loading