-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable TLS on linux/arm64 only for static resolver #104408
Conversation
Tagging subscribers to this area: @mangod9 |
@oldzhu confirmed that this solution works. |
src/coreclr/vm/threadstatics.cpp
Outdated
uint32_t* resolverAddress = reinterpret_cast<uint32_t*>(GetTLSResolverAddress()); | ||
if ( | ||
// nop or hint 32 | ||
((resolverAddress[0] == 0xd503201f) || (resolverAddress[0] == 0xd503241f)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work correctly for musl glibc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we do not do this optimization for musl as you can see few lines above - https://github.com/dotnet/runtime/pull/104408/files/6ecf11fec6c07aefc61f3647a3b3e358e35b6008#diff-838c5a095d88dfabb928a305f3d4ec888cbb0241f0c0d5cf1bb802c706bd1edbR800-R801
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that because we end up with dynamic resolver on musl? It would be good to find out if this fix covers the musl too (and that ifdef above can be deleted) or if there is something else going on with musl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that because we end up with dynamic resolver on musl?
We originally disabled it, probably because of same reason, but do you think that enabling for musl/arm64 can be done as a separate change?
EDIT: I am planning to backport this to 8.0, so might be good to just fix it for targets we have this optimization enabled and do a separate change to enable it for musl/arm64 in .net9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with enabling it for musl in a separate change. We should understand what's going on with musl before backporting to make sure we are not missing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should understand what's going on with musl before backporting to make sure we are not missing anything.
Just to make sure I understand, if it turns out to be same issue for musl as well, we still do not want to enable musl/arm64 for 8.0, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the servicing fix should be the minimal change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything else needed for this PR then?
{ | ||
// nop might not be present in older resolver, so skip it. | ||
|
||
// nop or hint 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// nop or hint 32 | |
// nop or bti |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since "hint 32" is an instruction and I preferred having instruction in the comment, should I keep it as-is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bti
is a more self-describing name for the same instruction.
hint
is reserved instruction encoding space. They are assigning concrete meaning and names to the instructions in this space as the architecture evolves. BTW: nop
is same as hint 0
.
} | ||
else | ||
{ | ||
_ASSERTE(false && "Unexpected code sequence."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that validates this fallback? I expect this question is going to be asked during servicing review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. What is the right way to add a test? Force dynamic resolver to kick in or something? Need to find out how to do that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rigth:
-
Add a native test library with a lot of thread statics (look for CMakeLists.txt under src\tests for how to add one of those). The thread statics need to be individual thread static variables so that they are allocated individually. Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.
-
Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using
LD_PRELOAD
that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a native test library with a lot of thread statics
Interesting, my impression was using some kind of compiler flag or an attribute.
Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.
But even if I get it to use the dynamic resolver, the question would be how to check that the TLS optimization is disabled I suppose. I believe I will have to still write managed code containing ThreadStatic
variable and I will have to verify that its access was not optimized. That will need somehow getting hold of the jitted method's address and reading the instruction stream of it to confirm that we did not do TLS access? Alternatively, just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this assert, you can add a debug-only config switch that says whether we expect dynamic or static resolver. If we come here and we do not the expected resolver, assert. I think it is good-enough for the test to be effective on checked build only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.
The test should do some thread static accesses from multiple threads to make sure that they work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But even if I get it to use the dynamic resolver, the question would be how to check that the TLS optimization is disabled I suppose. I believe I will have to still write managed code containing
ThreadStatic
variable and I will have to verify that its access was not optimized. That will need somehow getting hold of the jitted method's address and reading the instruction stream of it to confirm that we did not do TLS access? Alternatively, just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.
You can use disasm checks to do this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using LD_PRELOAD that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.
Seems like a reasonable feature. I would create a new environment variables that loads an arbitrary DLL early on. Once tht is in, I will update the host policy mechanism to use that.
@@ -799,6 +803,34 @@ bool CanJITOptimizeTLSAccess() | |||
// Optimization is disabled for FreeBSD/arm64 | |||
#elif defined(FEATURE_INTERPRETER) | |||
// Optimization is disabled when interpreter may be used | |||
#elif !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_ARM64) | |||
// Optimization is enabled for linux/arm64 only for static resolver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We may have a problem on native AOT too: The codegen assumes that the OS loader is going to be able to give us a constant offset (we effectively hardcode that static resolver):
runtime/src/coreclr/jit/helperexpansion.cpp
Lines 970 to 974 in a7efcd9
// Code sequence to access thread local variable on linux/arm64: // // mrs xt, tpidr_elf0 // mov xd, [xt+cns] //
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought about NativeAOT too, but realized that it might be always static resolver.
It is a fine assumption to make when building executables, but it may not work for libraries.
We can detect static/dynamic resolver for JITting on the fly, but for nativeaot, it might be tricky to know that information ahead of time. Possibly, we might have to embed that check in the generated code itself? But that might add an extra check for code for which we mostly get static resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For native AOT, we should generate the call to the resolver via indirection just like C/C++ compiler does when building dynamic libraries.
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32 | ||
adrp x0, :tlsdesc:t_ThreadStatics | ||
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics] | ||
mov x0, x1 | ||
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32 | |
adrp x0, :tlsdesc:t_ThreadStatics | |
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics] | |
mov x0, x1 | |
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32 | |
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -16 | |
adrp x0, :tlsdesc:t_ThreadStatics | |
ldr x0 [x0, #:tlsdesc_lo12:t_ThreadStatics] | |
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 16 |
Is this ok ?
Superseded by #106052 |
If the TLS is resolved by the dynamic resolver, the TP offset returned if for calling (JITting) thread and it is wrong to embed that information in the JIT code which will get executed by an execution thread. Doing that gets SEGFAULT. So detect what type of resolver it is and depending on that disable TLS if it is dynamic resolver.