[lld] check cache before real_path in loadDylib #140791
In llvm#137649 symlink resolution was added when loading dylibs. This introduced a performance regression when linking with a large number of inputs with LC_LOAD_DYLIB commands, due to the syscall overhead of realpath.

Refactor the change to be closer to the original:
- first check if the given path is in the cache
- if not, resolve it and check again
- update cache entries of both paths to point to the same dylib

This mitigates the regression as we do not incur the realpath cost for every loadDylib call, only once per unique path.
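The three steps in the description can be sketched outside of lld as follows. This is a minimal illustration, not the patch itself: `Dylib`, `DylibCache`, and the injected `Resolver` callback (standing in for `fs::real_path`) are hypothetical names, and `std::unordered_map` is used in place of lld's `DenseMap`.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for lld's DylibFile; only what the sketch needs.
struct Dylib {
  std::string loadedFrom;
  bool explicitlyLinked = false;
};

// Hypothetical cache wrapper; lld keeps a static DenseMap instead.
class DylibCache {
public:
  // Plays the role of fs::real_path: returns true and fills `out` on success.
  using Resolver = std::function<bool(const std::string &, std::string &)>;

  explicit DylibCache(Resolver r) : resolver(std::move(r)) {}

  Dylib *load(const std::string &path, bool explicitlyLinked) {
    // 1. Cheap check: has this exact spelling of the path been seen before?
    auto it = cache.find(path);
    if (it != cache.end())
      return mark(it->second, explicitlyLinked);

    // 2. Cache miss: pay for symlink resolution once for this path.
    std::string realPath;
    ++resolveCalls;
    bool resolved = resolver(path, realPath);
    if (resolved) {
      auto realIt = cache.find(realPath);
      if (realIt != cache.end()) {
        Dylib *existing = realIt->second; // copy before inserting a new key
        cache[path] = existing;           // remember the alias for next time
        return mark(existing, explicitlyLinked);
      }
    }

    // 3. Genuinely new dylib: create it and register it under both keys.
    owned.push_back(std::make_unique<Dylib>(Dylib{path, explicitlyLinked}));
    Dylib *file = owned.back().get();
    cache[path] = file;
    if (resolved)
      cache[realPath] = file;
    return file;
  }

  int resolveCalls = 0; // how many times the expensive resolver ran

private:
  static Dylib *mark(Dylib *file, bool explicitlyLinked) {
    if (explicitlyLinked)
      file->explicitlyLinked = true;
    return file;
  }

  Resolver resolver;
  std::unordered_map<std::string, Dylib *> cache;
  std::vector<std::unique_ptr<Dylib>> owned;
};
```

With this shape, the resolver runs only on a miss for the given spelling of the path; repeated loads of the same path, or of its resolved target, are served from the map.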
@llvm/pr-subscribers-lld-macho

Author: Richard Howell (rmaz)

Full diff: https://github.com/llvm/llvm-project/pull/140791.diff

1 Files Affected:
diff --git a/lld/MachO/DriverUtils.cpp b/lld/MachO/DriverUtils.cpp
index cf874018fa34b..14d60eb4cfa81 100644
--- a/lld/MachO/DriverUtils.cpp
+++ b/lld/MachO/DriverUtils.cpp
@@ -229,12 +229,7 @@ static DenseMap<CachedHashStringRef, DylibFile *> loadedDylibs;
 DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
                             bool isBundleLoader, bool explicitlyLinked) {
-  // Frameworks can be found from different symlink paths, so resolve
-  // symlinks before looking up in the dylib cache.
-  SmallString<128> realPath;
-  std::error_code err = fs::real_path(mbref.getBufferIdentifier(), realPath);
-  CachedHashStringRef path(!err ? uniqueSaver().save(StringRef(realPath))
-                                : mbref.getBufferIdentifier());
+  CachedHashStringRef path(mbref.getBufferIdentifier());
   DylibFile *&file = loadedDylibs[path];
   if (file) {
     if (explicitlyLinked)
@@ -242,6 +237,23 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
     return file;
   }
+  // Frameworks can be found from different symlink paths, so resolve
+  // symlinks and look up in the dylib cache.
+  DylibFile *&realfile = file;
+  SmallString<128> realPath;
+  std::error_code err = fs::real_path(mbref.getBufferIdentifier(), realPath);
+  if (!err) {
+    CachedHashStringRef resolvedPath(uniqueSaver().save(StringRef(realPath)));
+    realfile = loadedDylibs[resolvedPath];
+    if (realfile) {
+      if (explicitlyLinked)
+        realfile->setExplicitlyLinked();
+
+      file = realfile;
+      return realfile;
+    }
+  }
+
   DylibFile *newFile;
   file_magic magic = identify_magic(mbref.getBuffer());
   if (magic == file_magic::tapi_file) {
@@ -253,6 +265,7 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
     }
     file =
         make<DylibFile>(**result, umbrella, isBundleLoader, explicitlyLinked);
+    realfile = file;
     // parseReexports() can recursively call loadDylib(). That's fine since
     // we wrote the DylibFile we just loaded to the loadDylib cache via the
@@ -268,6 +281,7 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
            magic == file_magic::macho_executable ||
            magic == file_magic::macho_bundle);
     file = make<DylibFile>(mbref, umbrella, isBundleLoader, explicitlyLinked);
+    realfile = file;
     // parseLoadCommands() can also recursively call loadDylib(). See comment
     // in previous block for why this means we must copy `file` here.
  SmallString<128> realPath;
  std::error_code err = fs::real_path(mbref.getBufferIdentifier(), realPath);
  if (!err) {
    CachedHashStringRef resolvedPath(uniqueSaver().save(StringRef(realPath)));
Suggested change:
-    CachedHashStringRef resolvedPath(uniqueSaver().save(StringRef(realPath)));
+    CachedHashStringRef resolvedPath(uniqueSaver().save(realPath.str()));
@@ -253,6 +265,7 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
     }
     file =
         make<DylibFile>(**result, umbrella, isBundleLoader, explicitlyLinked);
+    realfile = file;
In the common case we have

  DylibFile *&file = loadedDylibs[path];
  DylibFile *&realfile = file;

Then what happens on this line? Does it do a load and a store to the same address? I'm sure it does the right thing; it just looks funny to me.

Also, if we set `file` later in the future, how can we make sure we don't forget to also set `realfile`? Is there a better way of doing this?
> Then what happens on this line? It does a load and a store to the same address?

That's what I thought would happen, yes.

> Also, if we set file later in the future, how can we make sure to not forget to also set realfile? Is there a better way of doing this?

What would you suggest? Ultimately we have 2 cache pointers that may or may not point to the same thing, and we need to update them.
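For readers following the aliasing question in this thread, here is a small standalone sketch of the general C++ rule involved: a reference cannot be reseated, so assigning to `realfile` after `DylibFile *&realfile = file;` writes through to whatever `file` refers to. The names `slotA`, `slotB`, `AliasResult`, and `demoAlias` are hypothetical, and plain pointer variables stand in for the two cache entries; this does not claim to reproduce the patch's behavior inside lld.

```cpp
// Hypothetical stand-in for DylibFile.
struct Dylib {
  int id;
};

// What the demo observed, so a caller can inspect it.
struct AliasResult {
  bool sameAddress;          // realfile and slotA are the same pointer object
  bool firstSlotOverwritten; // assigning to realfile changed slotA
  bool secondSlotUntouched;  // a later write via realfile never reached slotB
};

AliasResult demoAlias() {
  Dylib a{1}, b{2};
  Dylib *slotA = nullptr; // stands in for the entry keyed by the given path
  Dylib *slotB = &b;      // stands in for the entry keyed by the resolved path

  Dylib *&file = slotA;
  Dylib *&realfile = file; // another name for slotA, not a separate slot

  bool sameAddress = (&realfile == &slotA);

  // Not a rebind: this copies slotB's value into slotA.
  realfile = slotB;
  bool firstSlotOverwritten = (slotA == &b);

  // Still writes slotA only; slotB keeps pointing at b.
  realfile = &a;
  bool secondSlotUntouched = (slotB == &b) && (slotA == &a);

  return {sameAddress, firstSlotOverwritten, secondSlotUntouched};
}
```

In the common case described above, where both names refer to the same entry, `realfile = file;` is indeed a load and a store to the same address.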