[SYCL][NFCI] Unify large-grf splitting with per-aspects split #7512

AlexeySachkov · 2022-11-23T16:11:24Z

The patch removes the standalone splitter we had for large-grf and moves large-grf handling into the per-aspect splitter.

The change is intended to be non-functional: at most it may affect the order and names of modules produced by sycl-post-link, but not their content.

The change in order and names is a side effect of updating the per-aspects splitter: hashes are now different, so modules might be re-ordered compared to the previous implementation. On top of that, due to the different "topology" of device code splits, GRF modules no longer share the same ID with ESIMD modules.

AlexeySachkov · 2022-11-23T16:13:02Z

llvm/tools/sycl-post-link/sycl-post-link.cpp

-        DUMP_ENTRY_POINTS(MMs.back().entries(), MMs.back().Name.c_str(), 3);
-        Modified = true;
-      }
+    MDesc.fixupLinkageOfDirectInvokeSimdTargets();


Note for reviewers: this huge chunk of changes is essentially the removal of a while loop, with its body shifted left by one indentation level.

llvm/tools/sycl-post-link/ModuleSplitter.cpp

sarnex

Nice, thanks for doing this

sarnex · 2022-11-28T14:08:48Z

llvm/test/tools/sycl-post-link/device-code-split/per-aspect-split-2.ll

@@ -5,11 +5,11 @@
 ; RUN: FileCheck %s -input-file=%t.table --check-prefix CHECK-TABLE


Should we add a test with aspects and large GRF to lock down the behavior?

The thing is that as we add more features, they will affect existing tests anyway, so I would prefer to leave this as-is to have an isolated test for a piece of the implementation

#### Intro

This is a refactoring of how we perform device code split in `sycl-post-link`, which is intended to solve several existing issues with the current implementation:

1. increased peak RAM consumption by `sycl-post-link`
2. bad scaling as more and more split "dimensions" are added
3. increased test maintenance cost due to the non-deterministic order (between commits) of output files produced by `sycl-post-link`

#### A bit more context about the issues above:

(1) The increased peak RAM consumption is caused by the fact that we currently preserve **all** splits in memory, even though we can process them one by one and discard each as soon as it has been stored to disk. This was implemented as a memory consumption optimization in #5021, but it got accidentally reverted in #7302 as an attempt to work around (2).

(2) is pretty much summarized in our source code: https://github.com/intel/llvm/blob/afebb2543ccecb89f83c84b68fba7616bbab89ac/llvm/tools/sycl-post-link/sycl-post-link.cpp#L806-L811

(3) is caused by a bad implementation decision made in #7302: because every split is now identified by a hash, every time you take a new split "dimension" or new feature into account, it results in different hashes for existing tests. Just look at how many unrelated tests had to be updated in #7512, #8056 and #8167.

#### Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel functions: instead of using hashes, we now build a string description for each kernel function and then group kernels with the same description string together.

The string description is built by a new entity: it accepts a set of rules, where each rule is a simple function which returns a string for the passed `llvm::Function`. The results of all rules are concatenated together, and rules are invoked in the stable order of their registration.

There is a simple API for building those rules. It provides some predefined rules for the most popular use cases, like turning a function attribute or a piece of metadata into a string descriptor for the function. It is also possible to pass a custom callback there to implement more complicated logic.

#### How does this PR help with issues above?

(1) and (2) are fixed in conjunction: `sycl-post-link` was refactored to avoid storing more than one split module at a time, and that is possible because the PR unifies the per-scope and optional-kernel-features splitters into a single generic splitter. The new API for kernel categorization seems to be flexible enough to provide that infrastructure, so the merged splitters still look OK code-wise.

(3) is fixed by using string identifiers instead of hashes, as well as by using a data structure which sorts identifiers.

#### Any other benefits from this PR?

About 50 lines of code less to support :)

Extending device code split for more optional features would be even easier than it is now: instead of making several changes in various places around the `UsedOptionalFeatures` structure, it will be just a matter of adding 1-3 lines of code.

Please also note that `UsedOptionalFeatures` contains tons of inconsistencies in its implementation, which will all be gone with this PR: in `operator==` we don't use the hash and instead compare certain fields directly (and we do miss some of them); the `generateModuleName` method skips some of the optional features and ignores them.

Cross-module `device_global` usage checks should now work in all split dimensions (except for ESIMD).

#### Any potential downsides?

With the current `UsedOptionalFeatures` there is a possibility to embed various information (used aspects, `large-grf` flag, etc.) directly during device code split to avoid re-gathering that information later when we generate properties. With the suggested approach, that would be harder to do, because it doesn't seem to fit naturally into the proposed infrastructure: see the changes I made around `large-grf` in this PR. However, we have never actually implemented this, and re-querying some metadata from a function doesn't seem like a bottleneck, so it should really be a very minor and only theoretical downside.
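To make the rule-based categorization described above more concrete, here is a minimal, self-contained sketch. It is not the code from `sycl-post-link`: `KernelInfo` is a hypothetical stand-in for `llvm::Function`, and `Categorizer`, `addRule`, `addAttributeRule`, `describe` and `groupKernels` are illustrative names made up for this example. It only shows the idea of concatenating per-rule strings in registration order and grouping kernels by the resulting description in a sorted container.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name plus string attributes.
struct KernelInfo {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// Hypothetical categorizer: each rule maps a kernel to a string, and the
// results are concatenated in the order in which rules were registered.
class Categorizer {
public:
  using Rule = std::function<std::string(const KernelInfo &)>;

  // Custom callback rule for arbitrary logic.
  void addRule(Rule R) { Rules.push_back(std::move(R)); }

  // Predefined rule: yields "<Key>=<value>" if the attribute is present,
  // and an empty string otherwise.
  void addAttributeRule(const std::string &Key) {
    addRule([Key](const KernelInfo &K) {
      auto It = K.Attrs.find(Key);
      return It == K.Attrs.end() ? std::string() : Key + "=" + It->second;
    });
  }

  // Builds the description; ';' separates the outputs of individual rules.
  std::string describe(const KernelInfo &K) const {
    std::string Result;
    for (const Rule &R : Rules) {
      Result += R(K);
      Result += ';';
    }
    return Result;
  }

private:
  std::vector<Rule> Rules;
};

// Groups kernel names by description. std::map keeps its keys sorted, so
// the iteration order over groups does not depend on insertion order.
std::map<std::string, std::vector<std::string>>
groupKernels(const Categorizer &C, const std::vector<KernelInfo> &Kernels) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const KernelInfo &K : Kernels)
    Groups[C.describe(K)].push_back(K.Name);
  return Groups;
}
```

In this sketch, supporting one more split "dimension" is a single extra `addAttributeRule` or `addRule` call. Because a rule that does not apply contributes an empty string, kernels it does not affect still end up grouped together, and the group keys remain human-readable strings rather than opaque hashes.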

AlexeySachkov added 6 commits November 23, 2022 10:41

Add large-grf attribute into per-aspects splitter

4e1a7a3

Stop using largeGRF splitter in sycl-post-link

870398f

Propagate info about large-grf usage from per-aspect splitter

5699462

Remove LargeGRF splitter, because it is not used anymore

9722e9f

Remove groupEntryPointsByAttribute because it is not used anymore

15c7398

AlexeySachkov commented Nov 23, 2022


Fix test

4539278

AlexeySachkov commented Nov 24, 2022


AlexeySachkov marked this pull request as ready for review November 24, 2022 10:06

AlexeySachkov requested a review from a team as a code owner November 24, 2022 10:06

AlexeySachkov requested review from kbobrovs and a team November 24, 2022 10:07

[NFC] Use less auto, be more verbose with types

722fdd8

sarnex approved these changes Nov 28, 2022


kbobrovs approved these changes Nov 28, 2022


AlexeySachkov merged commit 675148c into intel:sycl Nov 29, 2022

AlexeySachkov mentioned this pull request Mar 28, 2023

[SYCL][NFCI] Refactor device code split implementation once again #8833

Merged

AlexeySachkov deleted the private/asachkov/unify-large-grf-splitter-with-aspects-split branch March 29, 2023 12:26
