-
Notifications
You must be signed in to change notification settings - Fork 790
[Driver][SYCL] Improve FPGA archive device unbundling with AOCO #9572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
68d0cce
to
18b7c46
Compare
Hi @maksimsab Can you please add some description to this PR? Thanks |
18b7c46
to
7a9bd3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bundler changes LGTM. Thanks
7a9bd3b
to
992ed07
Compare
AOCO specific objects can be added to an archive. When this occurs, the archive was being completely ignored, causing issues with device compilation. Take advantage of the additional -excluded-targets option from the clang-offload-bundler so we can process these archives for the needed device information, excluding potential unbundling of the target device in the presense of aoco binaries in that same object. Co-authored-by: Michael Toguchi <michael.d.toguchi@intel.com>
992ed07
to
9aa6c0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Driver changes LGTM, thanks
@@ -1215,6 +1241,66 @@ class ArchiveFileHandler final : public FileHandler { | |||
Error WriteBundle(raw_fd_ostream &OS, MemoryBuffer &Input) override { | |||
llvm_unreachable("unsupported for the ArchiveFileHandler"); | |||
} | |||
|
|||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are opening and parsing the file here an additional time for every child. This can prove expensive. There must be a way to do this 'exclusion' on the fly using a single parse of the file. But we can do that as a subsequent work. Please add a TODO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this optimizations.
There is a cost we should always consider is that this tool is from community. And we should resolve conflicts in sycl-web every time something comes from community in this functionality. So if we change the archive processing significantly in terms of a code then conflict resolving could turn into hard work for one who will maintain this stuff (it is me or somebody from our team).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is. It removes useless work in general case.
3707aa4#diff-c29010597bfa221cf7e7cfc018e0b0a7bdae17bcf8d17b27be96f4cbd20ec28aR1293-R1295
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surely not a blocker for this PR. But we need to address this and atleast verify that this does not increase compile time by a lot. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Few opportunities for optimization can be addressed later. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, there are no tests for clang-offload-bundler
changes in this PR.
// NOTE: "-sycldevice" Triple component has been deprecated. | ||
// However, it still can be met in libraries that have been compiled before | ||
// deprecation. For example, here Triple might be the following: | ||
// sycl-fpga_aoco-intel-unknown-sycldevice | ||
// | ||
// The workaround is to strip this Triple component if it is present. | ||
Triple.consume_back("-sycldevice"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have dropped support for such binaries, no need to have that workaround anymore, see #9408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it looks like FPGA libraries still depend on it. We asked them how did it happen. But it doesn't look like we can drop support of "sycldevice" env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have effectively done that already. We do aim to preserve backwards compatibility generally, but right now we are in ABI-breaking window and we are allowed to cleanup legacy code, thus breaking backwards compatibility.
clang/lib/Driver/OffloadBundler.cpp
Outdated
for (const auto &TargetName : TargetNames) | ||
if (CheckTargetIsExcluded(TargetName)) | ||
return true; | ||
|
||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return std::count_if() > 0
here as well? cppreference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CheckTargetIsExcluded
is a method. Thus, it can't be passed as UnaryPredicate. Also it can't be made static because it uses BundlerConfig. Trying to make it static and pass BundlerConfig through would make the code look ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be converted into a lamda, since it is fairly small method. But feel free to disregard this comment or apply in a separate follow-up patch
// RUN: llc -filetype=obj -o %t-device2.o %t-device2.bc | ||
// RUN: llc -filetype=obj -o %t-aoco.o %t-aoco.bc | ||
// RUN: llc -filetype=obj -o %t-host.o %t-host.bc | ||
// RUN: llc -filetype=obj -o %t-host2.o %t-host2.bc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to use a pre-built binary/archive and put it in Inputs/SYCL
and just test the behavior of clang-offload-bundler
? It reduces external dependencies on other tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can put it into Inputs folder. However, I still have dependency on cat
, xargs
, strings
and grep
tools in the final line of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
I see
locally. |
I believe the issue has been introduced in #9572.
AOCO specific objects can be added to an archive. When this occurs,
the archive was being completely ignored, causing issues with device
compilation. Take advantage of the additional -excluded-targets option
from the clang-offload-bundler so we can process these archives for
the needed device information, excluding potential unbundling of the
target device in the presense of aoco binaries in that same object.