Skip to content

[SYCL] Enhance device code split call graph analysis #8589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexeySachkov
Copy link
Contributor

@AlexeySachkov AlexeySachkov commented Mar 9, 2023

This patch introduces significant changes to how device code split detects functions and global variables which should be included into a cloned module.

There are two main changes done to that:

  1. analysis algorithm now traces uses of global variables to allow adding all globals into every split module
  2. analysis algorithm now traces indirect calls, trying to define a list of all functions which are potentially called indirectly to avoid the need to disabled device code split completely in presence of indirect calls

Both things are implemented through new DependencyGraph entity, which replaces CallGraph entity we used.
Instead of calls, that new graph is built over uses of functions and variables to understand which functions and global variables are used by which functions and global variables.

The most tricky part here is indirect calls: we can't understand which exact function is being called by an indirect call. However, we can compile a list of potentially-called function by comparing function signatures with signature of an indirect call.

On top of that, ESIMD handling is refactored by this patch:

  • outlined ESIMD-specific handling into a separate function
  • created new ESIMD-specific device code split helper

New ESIMD-specific device code split helper is needed, because we should use different rules for ESIMD and non-ESIMD parts of a module when splitting it to two. For ESIMD part we want to grab all ESIMD-functions even if they were not considered as entry points in the original module. For non-ESIMD part we don't want to grab any ESIMD-functions, even if they are referenced/used by non-ESIMD functions.

Both of those special rules come from invoke_simd feature support: non-ESIMD kernel can indirectly reference a ESIMD function. Since those different kind of functions require different processing, we have to completely separate them before processing step. Non-ESIMD module could be incomplete as a result of such split, but it will be merged back with ESIMD module after ESIMD lowering. That merge step is required for invoke_simd functionality.

Co-Authored-By: Cai, Justin justin.cai@intel.com

@AlexeySachkov AlexeySachkov temporarily deployed to aws March 9, 2023 14:29 — with GitHub Actions Inactive
@AlexeySachkov AlexeySachkov temporarily deployed to aws March 22, 2023 17:43 — with GitHub Actions Inactive
@asudarsa
Copy link
Contributor

Hi @AlexeySachkov

Do you want me to review this now or wait for any of your other PRs to be merged?

Thanks

@AlexeySachkov
Copy link
Contributor Author

Hi @AlexeySachkov

Do you want me to review this now or wait for any of your other PRs to be merged?

Thanks

@asudarsa, this is still a work in progress and therefore I would say there is no need to review it right now. #8833 however, is marked as ready for review and it would be great to see your feedback on it.

// 1. Global object A is used within by a global object B in any way (store,
// bitcast, phi node, call, etc.): "A" -> "B" edge will be added to the
// graph;
// 2. function A performs an indirect call of a function with signature S and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller function had the same signature S, will it also be included here?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The proposed version contains no filter of potential callees (except them being kernels) and therefore would include the caller itself as well, if it has signature matching indirect call.

This shouldn't cause any issues, because the caller will anyway be included if we reached it through call graph; and it won't be included otherwise (i.e. if we look at a graph, which includes some potentially indirectly callable callee, but not indirect calls)

// 1. Global object A is used within by a global object B in any way (store,
// bitcast, phi node, call, etc.): "A" -> "B" edge will be added to the
// graph;
// 2. function A performs an indirect call of a function with signature S and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we ignore including these functions with the same signature? Will it cause any code generation issues?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There won't be issues with code generation, but since we generally don't know which exact function is being called, it could lead to UB during execution of a kernel. For example:

parallel_for(...) {
  if (arg) // runtime variable, kernel argument
    ptrs[0] = foo;
  else
    ptrs[0] = bar;
}

parallel_for(...) {
  ptrs[0](42); // it could be either foo or bar;
  //we have no compile-time knowledge which one exactly it is and therefore have to include both
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do today? Do we add 'all' function if there are indirect calls to be handled? Or do we just NOT handle indirect calls?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If device code split is set to auto, we fallback to not doing any device code split if indirect calls are present:

case SPLIT_AUTO: {
if (hasIndirectFunctionsOrCalls(M) || AutoSplitIsGlobalScope)
return Scope_Global;

If split still happened by some reason (user explicitly provided an option, or second-level per-aspect device code split happened), then we add all indirectly callable functions to every module:

// It is conservatively assumed that any address-taken function can be invoked
// or otherwise used by any function in any module split from the initial one.
// So such functions along with the call graphs they start are always
// extracted (and duplicated in each split module). They are not treated as
// entry points, as SYCL runtime requires that intersection of entry point
// sets of different device binaries (for the same target) must be empty.
// TODO: try to determine which split modules really use address-taken
// functions and only duplicate the functions in such modules. Note that usage
// may include e.g. function address comparison w/o actual invocation.
for (const auto *F : Deps.addrTakenFunctions()) {
if (!isKernel(*F) && (isESIMDFunction(*F) == ModuleEntryPoints.isEsimd()))
GVs.insert(F);
}

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. Please address comments. Please feel free to ignore comments that are marked 'nit'. I also need to take a closer look at the tests. Will do so in a bit.

Thanks again for this nice improvement.

@AlexeySachkov AlexeySachkov temporarily deployed to aws June 6, 2023 17:13 — with GitHub Actions Inactive
@AlexeySachkov AlexeySachkov temporarily deployed to aws June 6, 2023 17:52 — with GitHub Actions Inactive
Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR and also the clarifications.

Copy link
Contributor

@maksimsab maksimsab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AlexeySachkov AlexeySachkov merged commit 2910add into intel:sycl Jun 12, 2023
AlexeySachkov added a commit that referenced this pull request Jun 22, 2023
There was a concern expressed during code review of #8589,
that we may be adding too many functions into a module when trying to
find all potential callees of an indirect call, thus bloating module.
This PR attempts to fix that by limiting our signature matching to
functions which are marked with `referenced-indirectly` attribute.

This change is not expected to affect any E2E tests, because it returns
the behavior, where we only considered `referenced-indirectly` marked
functions as potential indirect call targets, which we had before
#8589.

---------

Co-authored-by: asudarsa <arvind.sudarsanam@intel.com>
@AlexeySachkov AlexeySachkov deleted the private/asachkov/device-code-split-and-function-pointers branch May 22, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants