-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Enhance device code split call graph analysis #8589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Enhance device code split call graph analysis #8589
Conversation
…ce-code-split-and-function-pointers
…ce-code-split-and-function-pointers
…ce-code-split-and-function-pointers
Do you want me to review this now or wait for any of your other PRs to be merged? Thanks |
…ce-code-split-and-function-pointers
…ce-code-split-and-function-pointers
…ce-code-split-and-function-pointers
// 1. Global object A is used within by a global object B in any way (store, | ||
// bitcast, phi node, call, etc.): "A" -> "B" edge will be added to the | ||
// graph; | ||
// 2. function A performs an indirect call of a function with signature S and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the caller function had the same signature S, will it also be included here?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The proposed version contains no filter of potential callees (except them being kernels) and therefore would include the caller itself as well, if it has signature matching indirect call.
This shouldn't cause any issues, because the caller will anyway be included if we reached it through call graph; and it won't be included otherwise (i.e. if we look at a graph, which includes some potentially indirectly callable callee, but not indirect calls)
// 1. Global object A is used within by a global object B in any way (store, | ||
// bitcast, phi node, call, etc.): "A" -> "B" edge will be added to the | ||
// graph; | ||
// 2. function A performs an indirect call of a function with signature S and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we ignore including these functions with the same signature? Will it cause any code generation issues?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won't be issues with code generation, but since we generally don't know which exact function is being called, it could lead to UB during execution of a kernel. For example:
parallel_for(...) {
if (arg) // runtime variable, kernel argument
ptrs[0] = foo;
else
ptrs[0] = bar;
}
parallel_for(...) {
ptrs[0](42); // it could be either foo or bar;
//we have no compile-time knowledge which one exactly it is and therefore have to include both
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we do today? Do we add 'all' function if there are indirect calls to be handled? Or do we just NOT handle indirect calls?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If device code split is set to auto
, we fallback to not doing any device code split if indirect calls are present:
llvm/llvm/tools/sycl-post-link/ModuleSplitter.cpp
Lines 83 to 85 in 6d2a46a
case SPLIT_AUTO: { | |
if (hasIndirectFunctionsOrCalls(M) || AutoSplitIsGlobalScope) | |
return Scope_Global; |
If split still happened by some reason (user explicitly provided an option, or second-level per-aspect device code split happened), then we add all indirectly callable functions to every module:
llvm/llvm/tools/sycl-post-link/ModuleSplitter.cpp
Lines 248 to 260 in 6d2a46a
// It is conservatively assumed that any address-taken function can be invoked | |
// or otherwise used by any function in any module split from the initial one. | |
// So such functions along with the call graphs they start are always | |
// extracted (and duplicated in each split module). They are not treated as | |
// entry points, as SYCL runtime requires that intersection of entry point | |
// sets of different device binaries (for the same target) must be empty. | |
// TODO: try to determine which split modules really use address-taken | |
// functions and only duplicate the functions in such modules. Note that usage | |
// may include e.g. function address comparison w/o actual invocation. | |
for (const auto *F : Deps.addrTakenFunctions()) { | |
if (!isKernel(*F) && (isESIMDFunction(*F) == ModuleEntryPoints.isEsimd())) | |
GVs.insert(F); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM. Please address comments. Please feel free to ignore comments that are marked 'nit'. I also need to take a closer look at the tests. Will do so in a bit.
Thanks again for this nice improvement.
…ce-code-split-and-function-pointers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the PR and also the clarifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a concern expressed during code review of #8589, that we may be adding too many functions into a module when trying to find all potential callees of an indirect call, thus bloating module. This PR attempts to fix that by limiting our signature matching to functions which are marked with `referenced-indirectly` attribute. This change is not expected to affect any E2E tests, because it returns the behavior, where we only considered `referenced-indirectly` marked functions as potential indirect call targets, which we had before #8589. --------- Co-authored-by: asudarsa <arvind.sudarsanam@intel.com>
This patch introduces significant changes to how device code split detects functions and global variables which should be included into a cloned module.
There are two main changes done to that:
Both things are implemented through new
DependencyGraph
entity, which replacesCallGraph
entity we used.Instead of calls, that new graph is built over uses of functions and variables to understand which functions and global variables are used by which functions and global variables.
The most tricky part here is indirect calls: we can't understand which exact function is being called by an indirect call. However, we can compile a list of potentially-called function by comparing function signatures with signature of an indirect call.
On top of that, ESIMD handling is refactored by this patch:
New ESIMD-specific device code split helper is needed, because we should use different rules for ESIMD and non-ESIMD parts of a module when splitting it to two. For ESIMD part we want to grab all ESIMD-functions even if they were not considered as entry points in the original module. For non-ESIMD part we don't want to grab any ESIMD-functions, even if they are referenced/used by non-ESIMD functions.
Both of those special rules come from
invoke_simd
feature support: non-ESIMD kernel can indirectly reference a ESIMD function. Since those different kind of functions require different processing, we have to completely separate them before processing step. Non-ESIMD module could be incomplete as a result of such split, but it will be merged back with ESIMD module after ESIMD lowering. That merge step is required forinvoke_simd
functionality.Co-Authored-By: Cai, Justin justin.cai@intel.com