Integrated Distributed ThinLTO (DTLTO): Design Overview #126654

Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
I believe this failure is OK, as I have followed the (non-standard) formatting in the flagged file that the code for the other ThinLTO backends uses.
llvm/lib/LTO/LTO.cpp (Outdated)

```cpp
sys::path::append(ObjFilePath, sys::path::stem(ModulePath) + "." +
                  itostr(Task) + "." + UID + ".native.o");

Job &J = Jobs[Task - 1]; /*Task 0 is reserved*/
```
IIUC, the `- 1` here should be `- RegularLTO.ParallelCodeGenParallelismLevel`.
Yes - I broke this for more than one LTO partition whilst hastily simplifying things. Thanks! Fixed now.
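For illustration, here is a minimal sketch of the corrected indexing (my assumption about the fix, using the names from the snippet above, not the exact patch):

```cpp
// The first RegularLTO.ParallelCodeGenParallelismLevel task IDs are
// reserved for regular-LTO partitions, so ThinLTO backend jobs must be
// offset by that amount rather than by a hard-coded 1.
Job &J = Jobs[Task - RegularLTO.ParallelCodeGenParallelismLevel];
```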
llvm/lib/LTO/LTO.cpp (Outdated)

```cpp
// Common command line template.
JOS.attributeArray("args", [&]() {
  JOS.value(RemoteOptTool);
```
I was confused by the name `--thinlto-remote-opt-tool=`. This is actually clang, not opt. Perhaps `thinlto-remote-compiler` or `thinlto-remote-compile-tool` (long) or `thinlto-remote-cc` (shortest)?
```diff
+ Args.MakeArgString("--thinlto-remote-opt-tool=" +
+                    Twine(ToolChain.getDriver().getClangProgramPath())));
```
Would you accept `thinlto-remote-codegen-tool`? This is so that the option name doesn't imply that the invoked tool is a compiler. One of the possibilities that we might want to explore is invoking LLD to do the backend compilations on the remote machines. Please see: #126654 (comment).
After discussion on the linked comment thread I have adopted your suggestion of `--thinlto-remote-compiler`.
clang/lib/Driver/ToolChains/Gnu.cpp (Outdated)

```cpp
Twine(ToolChain.getDriver().getClangProgramPath())));

for (auto A : Args.getAllArgValues(options::OPT_Xthinlto_distributor_EQ))
  CmdArgs.push_back(
      Args.MakeArgString("-mllvm=-thinlto-distributor-arg=" + A));
```
I am curious why we don't use a regular linker option. In lld, you could use `args::getStrings` to read a list option, e.g. `ctx.arg.passPlugins = args::getStrings(args, OPT_load_pass_plugins)`. However, introducing a new linker option requires changes to all lld ports and llvm-lto. Therefore, perhaps make `--thinlto-remote-opt-tool` a `cl::opt` option as well?
As you already worked out, I used `cl::opt` options to minimize the changes to LLD ports and llvm-lto. The syntax is verbose, but LLD is usually invoked via the compiler driver, so the verbose syntax is not exposed. I'm happy to use a `cl::opt` for this. However, I would like to retain the LLD option for the COFF port, where LLD is often invoked directly. Does that sound OK?
Having looked at this again, I think your suggestion of making `--thinlto-remote-opt-tool` a `cl::opt` option makes sense. I have updated the code to match. Please take a look.
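For reference, a minimal sketch of what the `cl::opt`-based option could look like (the option name follows the discussion above; the variable name and description text are assumptions, not the patch):

```cpp
#include "llvm/Support/CommandLine.h"

// Declaring the remote compiler path as a cl::opt means every LLD port and
// llvm-lto can receive it via -mllvm, without port-specific option plumbing.
static llvm::cl::opt<std::string> ThinLTORemoteCompiler(
    "thinlto-remote-compiler",
    llvm::cl::desc("Compiler to invoke for ThinLTO backend compilations"));
```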
lld/ELF/InputFiles.cpp (Outdated)

```cpp
// Compute a thin archive member full file path.
static std::string
computeThinArchiveMemberFullPath(const StringRef modulePath,
```
Perhaps the two new functions can be combined, and the new one named `dtltoGetMemberPathIfThinArchive`. What if it is a non-thin archive? Shall we create a temporary file?

```cpp
if (!ctx.arg.dtltoDistributor.empty() && !archiveName.empty())
  path = dtltoGetMemberPathIfThinArchive(...)
```
> Perhaps the two new functions can be combined, and the new one named `dtltoGetMemberPathIfThinArchive`.
> `if (!ctx.arg.dtltoDistributor.empty() && !archiveName.empty()) path = dtltoGetMemberPathIfThinArchive(...)`
I believe the simplification suggestions are viable. I also think that I can remove the `this->mb = mbref;` line (and its accompanying extensive comment), since that was only necessary for emitting empty summary index files, which we don't need for DTLTO.

I'm also not comfortable with `isThinArchive` relying on checking the magic number at the start of the archive file to determine whether it is a thin archive. I'm considering recording the archive type somewhere (perhaps in `InputFile`) so that this check becomes unnecessary. I'll test these improvements and put them up tomorrow.
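As a sketch of the path computation under discussion (my assumption of the general approach, with hypothetical names, not the patch itself): thin archive members are stored as paths relative to the archive, so resolving a member means joining it with the archive's directory unless the recorded path is already absolute.

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Path.h"

// Resolve a thin archive member to a full path that a distributed backend
// compilation could open directly.
static std::string getThinArchiveMemberFullPath(llvm::StringRef archivePath,
                                                llvm::StringRef memberPath) {
  // Thin archives may record absolute member paths; use those as-is.
  if (llvm::sys::path::is_absolute(memberPath))
    return memberPath.str();
  // Otherwise the member path is relative to the archive's directory.
  llvm::SmallString<128> fullPath(llvm::sys::path::parent_path(archivePath));
  llvm::sys::path::append(fullPath, memberPath);
  return fullPath.str().str();
}
```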
> What if it is a non-thin archive? Shall we create a temporary file?
Non-thin archives with bitcode members are important for PlayStation. A reasonable implementation would be to emit a temporary file into a cache (since libraries are often external to a project and rarely modified) and use that temporary file for DTLTO. Currently, the buffer identifier assigned here is used eventually as the ModulePath stored in the summary index. However, at this point we don't know which lazy bitcode files will actually be used by the link. Therefore, we would need to either:

- write out temporary files for all bitcode members of non-thin archives now, using the temporary file path as the buffer identifier, or
- write out temporary files later, when a lazy bitcode file is used and added to the link. In that case, we would need to swap the identifier for LTO or add some remapping information to the LTO link so that the temporary file path is used for loading the bitcode file instead of the identifier assigned here.
At the LLVM conference there were also suggestions that we could teach Clang and the distribution systems to handle archives natively, perhaps even understanding identifiers such as `foo.a(bar.o at 40)`.
It would be beneficial to implement a mechanism that could also be used by the existing Bazel-style distribution in case, in the future, there is a need to support bitcode members in archives for that.
For the initial commit, however, I would prefer to support thin archives only. Non-thin archive support can be added in a later commit.
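To make the temporary-file idea concrete, here is a rough sketch (a hypothetical helper, not part of this PR) of materializing a bitcode member of a non-thin archive as a file that a remote backend could read by path:

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBufferRef.h"
#include "llvm/Support/raw_ostream.h"

// Write an in-memory archive member to a unique temporary file and return
// the path, which could then serve as the module path for DTLTO.
static llvm::Expected<std::string>
writeArchiveMemberToTemp(llvm::MemoryBufferRef member) {
  llvm::SmallString<128> path;
  int fd;
  if (std::error_code ec =
          llvm::sys::fs::createTemporaryFile("dtlto-member", "bc", fd, path))
    return llvm::errorCodeToError(ec);
  llvm::raw_fd_ostream os(fd, /*shouldClose=*/true);
  os << member.getBuffer();
  return path.str().str();
}
```

A cached variant, as described above, would key the file on the archive path and member so that rarely modified libraries are only extracted once.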
I have simplified the code as we discussed. I also record which archives are thin when files are added to the link in `LinkerDriver::addFile` and then pass this information into the `BitcodeFile` constructor. This removes the hack where I was reopening the archive files and checking the magic bytes to determine whether they were thin. The implementation is much improved now. Thanks.
Now there are quite a few changes to Driver and InputFiles, which actually made me more nervous. Non-DTLTO use cases now incur the overhead (not that they can't, but it's not necessary to introduce these changes for this linker option feature). These changes might turn out to be unneeded when non-thin archives are supported. It would be nice to restore the previous version but just improve the local code and check the thin magic only when `--thinlto-distributor=` is specified.
Done. Thanks.
lld/docs/DTLTO.rst (Outdated)

```rst
  Specifies the file to execute as a distributor process.
  If specified, ThinLTO backend compilations will be distributed.

- ``--thinlto-remote-opt-tool=<path>``
```
Perhaps `--thinlto-remote-compiler` or `--thinlto-remote-compile-tool`. It's unlikely we will use the internal tool named `opt`.
See: #126654 (comment).
Updated code to use `thinlto-remote-compiler`.
By "DTLTO backend" do you mean the new CGThinBackend class? Since that's a big part of the LLVM piece, I've kind of held off on reviewing for now since I'd like to compare the 2 approaches first and I'm not sure how much is changing. |
- Remove linker version.
- Make AddBuffer a default parameter for LTO::run() to minimize changes at call sites.
- Format code with clang-format, even if it deviates slightly from the surrounding style. I decided that CI passing for the PRs was more important than matching the existing style.
- Order distributor command-line options correctly.
- Remove dead code in llvm-lto2.cpp.
- Add docstrings to Python distributors.
- Update the Python mock.py distributor to validate JSON.
- Organize LLVM DTLTO tests by moving them into a dedicated "dtlto" directory, tidying existing tests and adding new ones.
- Implement fix for more than one LTO partition (spotted by MaskRay).
- Combine new functions as suggested by MaskRay in review.
- Remove code that worked around a crash that can no longer occur now that the implementation is structured as a new ThinLTO backend.
- Record whether an archive was thin or not and pass this information into the BitcodeFile constructor. This replaces code that reopened the archive file and checked the archive magic.
Actually, please review whatever you would like to at this point, Theresa. I don't want to get in the way of hearing what you think - we're keen to hear your input. I just wanted to point out that, since another branch is coming, you may wish to wait until it arrives if you think a side-by-side comparison would be a good way of doing things. To clarify: that other branch won't be put up as a pull request, but we can decide how to proceed here if the general design shown in that other branch is preferred. I also mentioned that it will appear in a few days, but that's really dependent on the results of some more internal review. We're working hard on it!
Actually, we don't have any data to determine whether or not these ideas for performance enhancements translate into real-world gains. So, let's perhaps revisit these in subsequent PRs, assuming we get something landed here for starters.
- Remove mention of the now-removed "Linker Version" stuff.
- Remote optimization tool -> remote compiler.
Hi Reviewers! Thanks for the feedback here. I wanted to draw attention to something that I mentioned briefly in the description - the possibility of a plugin interface as opposed to invoking an external process that consumes JSON. There are some theoretical advantages with plugins. For example, if distribution systems exist/arise that allow data to be passed back and forth between LLVM and a distribution system using memory buffers instead of files, a plugin could perhaps do that more efficiently. But we haven't done anything yet to quantify how much better this would be vs implicitly leaning on e.g. memory-mapped files and the OS file cache. The distribution systems we're motivated to support from customer demand don't have such capabilities at this time. Does anyone have any opinions on this?
Reviewers have suggested that smaller PRs be created to facilitate detailed reviews, while keeping this one around to provide an overall picture and facilitate discussion of high-level design (see this ongoing discussion). I have created PR #127749 to collect detailed review comments for the parts related to LLVM and have added those who commented here as reviewers. Note that I have changed the name of this PR to indicate that it provides an overview of DTLTO, to avoid confusion. Thanks again for all your input!
The “No backend DTLTO” branch is ready. Please have a look and let us know what you think: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend. This comment #127749 (comment) has more details about the differences between the “Out of process (DTLTO) backend” branch and the “No backend” DTLTO branch.
This patch introduces initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM. DTLTO enables distribution of backend ThinLTO compilation jobs via external distribution systems, such as Incredibuild.

The existing support for distributing ThinLTO compilations uses separate thin-link (--thinlto-index-only), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system to manage dynamically discovered dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure. In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach allows DTLTO to work with any build process that supports in-process ThinLTO.

This patch provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. It adds support for delegating the backend ThinLTO jobs to an external process (the distributor), which in turn coordinates distribution through a system such as Incredibuild. A JSON interface is used for communication with the distributor. The JSON interface allows new distribution systems to be supported without modifying LLVM. Please see the documentation added in this patch (llvm/docs/dtlto.rst) for more details.

Subsequent LLVM commits will add:
- Support for the ThinLTO cache.
- Support for more LTO configuration states, e.g., basic block sections.
- Performance improvements, such as more efficient removal of temporary files.

RFC: Integrated Distributed ThinLTO RFC
For the design of the DTLTO feature, see: llvm#126654.
This patch adds initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM, which manages distribution internally during the traditional link step. This enables compatibility with any build system that supports in-process ThinLTO. In contrast, existing approaches to distributed ThinLTO, which split the thin-link (--thinlto-index-only), backend compilation, and final link into separate steps, require build system support, e.g. Bazel.

This patch implements the core DTLTO mechanism, which enables delegation of ThinLTO backend jobs to an external process (the distributor). The distributor can then manage job distribution through systems like Incredibuild. A generic JSON interface is used to communicate with the distributor, allowing for the creation of new distributors (and thus integration with different distribution systems) without modifying LLVM.

Please see llvm/docs/dtlto.rst for more details.

RFC: https://discourse.llvm.org/t/rfc-integrated-distributed-thinlto/69641
Design Review: #126654
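To give a flavor of the JSON interface, here is a hypothetical job description of the shape suggested by the snippets reviewed above (a common "args" command-line template beginning with the remote compiler, plus per-task entries whose outputs follow the `<stem>.<task>.<uid>.native.o` naming). The field names and file names are illustrative assumptions, not the documented schema; see llvm/docs/dtlto.rst for the authoritative format:

```json
{
  "common": {
    "args": ["/usr/bin/clang", "-O2", "-c"]
  },
  "jobs": [
    {
      "inputs": ["main.o", "main.1.f40a28ce.thinlto.bc"],
      "outputs": ["main.1.f40a28ce.native.o"]
    }
  ]
}
```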
Thanks to everyone for the review effort on DTLTO — I’ve now merged the LLVM portion of the changes (#127749). I’m currently on PTO this week, but once I’m back, I’ll post the Clang and LLD patches for review. In the meantime, if anyone wants to try out DTLTO on a stable branch, I’ve backported the changes to LLVM 19 here: https://github.com/bd1976bris/llvm-project/commits/dtlto_llvm19.
This patch introduces support for Integrated Distributed ThinLTO (DTLTO) in ELF LLD. DTLTO enables the distribution of ThinLTO backend compilations via external distribution systems, such as Incredibuild, during the traditional link step: https://llvm.org/docs/DTLTO.html.

It is expected that users will invoke DTLTO through the compiler driver (e.g., Clang) rather than calling LLD directly. A Clang-side interface for DTLTO will be added in a follow-up patch.

Note: Bitcode members of non-thin archives are not currently supported. This will be addressed in a future change.

Testing:
- ELF LLD `lit` test coverage has been added, using a mock distributor to avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.

For the design discussion of the DTLTO feature, see: llvm#126654
I have put up a PR introducing support for Integrated Distributed ThinLTO (DTLTO) in ELF LLD: #142757. Once that's merged, I will follow up with PRs adding DTLTO support to Clang and to COFF LLD.
Initial DTLTO Support
This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO). DTLTO was previously discussed in an RFC and during the LTO roundtable discussion at the October US 2024 LLVM conference. PlayStation has offered this feature as a proprietary technology for some time, and we would like to see support in LLVM.
Overview of DTLTO
DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (`--thinlto-index-only`), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system, as it must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure.

In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.
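For a concrete picture, here is a sketch of how a DTLTO link could be driven through the compiler driver using the LLD options described in this PR (the distributor script path and input files are illustrative; `--thinlto-remote-compiler` is the option name settled on during review):

```sh
# Link ThinLTO bitcode objects via lld; the distributor script fans the
# ThinLTO backend compilations out to the distribution system.
clang -flto=thin -fuse-ld=lld \
  -Wl,--thinlto-distributor=/path/to/distributor.py \
  -Wl,--thinlto-remote-compiler=/usr/bin/clang \
  main.o greet.o -o app
```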
Documentation and Resources
- `llvm/docs/dtlto.rst`
- `clang/docs/dtlto.rst` and `lld/docs/dtlto.rst`
- Example distributors: `llvm/utils/dtlto`
.Features of This Initial Commit
This commit provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. The current implementation includes:
The goal of this initial commit is to demonstrate what will be required to support DTLTO while providing useful minimal functionality. Hopefully, having a concrete PR will facilitate discussion and review of the feature.
Performance
We have access to a large farm of computers on Windows. For a link of `clang.exe` on a modest Windows development machine (AMD64, 16 cores, 64 GB RAM), DTLTO (via `sn-dbs.py`) was approximately 4 times as fast as multi-threaded in-process ThinLTO.

To estimate the overhead of DTLTO versus in-process ThinLTO, we measured the difference in the time taken to link Clang with in-process ThinLTO using one thread per core, and with DTLTO using one local process per core. On both Windows and Linux the overhead was approximately 6%.

Note that, to facilitate review, this PR elides performance optimizations where possible.
Planned Future Enhancements
The following features will be addressed in future commits:
Discussion Points
- The DTLTO name could potentially cause confusion with the existing Bazel-style distributed ThinLTO. At the LLVM roundtable discussion no one objected to the name, but we remain open to suggestions.
- Currently, Clang is invoked to do the backend compilations, and a minimal number of options are added to the Clang command line to ensure that the codegen is reasonable (for the testing we have done so far). However, it would be good to find a scalable solution for matching the code-generation state in the invoked external tool to the code-generation state if an in-process ThinLTO backend were in use.
We have experimented with other approaches for implementing DTLTO. In particular, we have explored:
We have prepared another branch to demonstrate some of these ideas: integrated-DTLTO-no-backend
List of Child PRs:
(I intend to update this section as new PRs are filed.)