Enable TLS on linux/arm64 only for static resolver #104408

Closed
wants to merge 3 commits into from
Changes from 1 commit
add check to skip nop for older resolver
kunalspathak committed Jul 5, 2024
commit 67cf8a88ea972556b4a73d3df27673b9027cbac6
23 changes: 17 additions & 6 deletions src/coreclr/vm/threadstatics.cpp
@@ -806,20 +806,31 @@ bool CanJITOptimizeTLSAccess()
#elif !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_ARM64)
// Optimization is enabled for linux/arm64 only for static resolver.
Member

  • We may have a problem on native AOT too: the codegen assumes that the OS loader is going to be able to give us a constant offset (we effectively hardcode the static resolver):
    // Code sequence to access thread local variable on linux/arm64:
    //
    // mrs xt, tpidr_elf0
    // mov xd, [xt+cns]
    //
    It is a fine assumption to make when building executables, but it may not work for libraries. (Note that we have 1P workloads that use native AOT libraries.)

Member Author

Yes, I thought about NativeAOT too, but assumed that it would always get the static resolver.

> It is a fine assumption to make when building executables, but it may not work for libraries.

For JIT, we can detect the static/dynamic resolver on the fly, but for NativeAOT it might be tricky to know that information ahead of time. Possibly we would have to embed the check in the generated code itself? But that would add an extra check to code that, in most cases, gets the static resolver anyway.
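For the JIT, detecting the resolver "on the fly" amounts to pattern-matching the first instructions at the resolver address, which is what the change in this PR does. Restated as a standalone function, it looks roughly like the sketch below. `IsStaticTlsResolver` and its `count` bound are hypothetical (the runtime indexes `resolverAddress` directly); the instruction encodings are the ones used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A64 encodings matched by the resolver check in the diff.
constexpr std::uint32_t kNop    = 0xd503201f;  // nop (hint #0)
constexpr std::uint32_t kBti    = 0xd503241f;  // bti (hint #32)
constexpr std::uint32_t kLdrArg = 0xf9400400;  // ldr x0, [x0, #8]
constexpr std::uint32_t kRet    = 0xd65f03c0;  // ret

// Returns true if 'code' looks like the static TLS descriptor resolver:
// an optional leading nop/bti (older resolvers do not have it), followed by
// "ldr x0, [x0, #8]" and "ret". 'count' bounds how many words may be read.
bool IsStaticTlsResolver(const std::uint32_t* code, std::size_t count)
{
    std::size_t ip = 0;
    if (count > 0 && (code[0] == kNop || code[0] == kBti))
        ip++;  // skip the optional hint instruction

    return count >= ip + 2 &&
           code[ip] == kLdrArg &&
           code[ip + 1] == kRet;
}
```

Skipping the hint with a running index, rather than requiring it at slot 0, is what lets the same check accept both the newer and the older resolver layouts.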

Member

For native AOT, we should generate the call to the resolver via an indirection, just like the C/C++ compiler does when building dynamic libraries.
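The indirection described here is the TLS descriptor (TLSDESC) scheme: instead of hardcoding an offset, generated code calls a resolver through a descriptor and adds the returned offset to the thread pointer. The static resolver matched in this PR (`ldr x0, [x0, #8]; ret`) simply returns the descriptor's second word. Below is a simplified, portable C++ model of that idea, not glibc's implementation; `TlsDescriptor`, `StaticResolver`, `DynamicResolver`, `ThreadPointer`, and the heap-allocated per-thread block are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical model of a TLS descriptor: word 0 is the resolver, word 1 its
// argument. The arm64 static resolver ("ldr x0, [x0, #8]; ret") returns word 1.
struct TlsDescriptor
{
    std::intptr_t (*resolver)(TlsDescriptor*);
    std::intptr_t arg;
};

// Stand-in for the thread pointer (tpidr_el0): the base of this thread's
// static TLS block.
thread_local char g_staticTlsBlock[64];

std::intptr_t ThreadPointer()
{
    return reinterpret_cast<std::intptr_t>(&g_staticTlsBlock[0]);
}

// Static resolver: the variable sits at the same offset from the thread
// pointer in every thread, so the offset can be computed once and reused.
std::intptr_t StaticResolver(TlsDescriptor* desc)
{
    return desc->arg;
}

// Dynamic resolver: the variable lives in a block allocated lazily for each
// thread, so the returned offset is only valid for the calling thread.
thread_local std::unique_ptr<char[]> g_dynamicBlock;

std::intptr_t DynamicResolver(TlsDescriptor* desc)
{
    if (!g_dynamicBlock)
        g_dynamicBlock.reset(new char[256]());
    char* variable = g_dynamicBlock.get() + desc->arg;  // arg: offset inside the block
    return reinterpret_cast<std::intptr_t>(variable) - ThreadPointer();
}

// What code compiled for a shared library does: call the resolver through the
// descriptor, then add the result to the thread pointer.
void* TlsAddress(TlsDescriptor* desc)
{
    return reinterpret_cast<void*>(ThreadPointer() + desc->resolver(desc));
}
```

Only with `StaticResolver` can an offset obtained on one thread (for example, the JIT thread) be baked into code that runs on other threads, which is the property the linux/arm64 fast path relies on.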

// For static resolver, the TP offset is same for all threads.
-// For dynamic resolver, TP offset returned is that of a JIT thread and
-// will be different for the executing thread.
+// For dynamic resolver, TP offset returned is for the current thread and
+// will be different for the other threads.
uint32_t* resolverAddress = reinterpret_cast<uint32_t*>(GetTLSResolverAddress());
-if (
+int ip = 0;
+if ((resolverAddress[ip] == 0xd503201f) || (resolverAddress[ip] == 0xd503241f))
+{
+// nop might not be present in older resolver, so skip it.
jkotas marked this conversation as resolved.

// nop or hint 32
Member

Suggested change
-// nop or hint 32
+// nop or bti

Member Author

Since "hint 32" is the actual instruction and I prefer having the instruction in the comment, should I keep it as-is?

Member

jkotas, Jul 7, 2024

bti is a more self-describing name for the same instruction.

hint is a reserved instruction encoding space. Arm assigns concrete meanings and names to instructions in this space as the architecture evolves. BTW: nop is the same as hint 0.
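The point about the HINT space can be checked from the encodings: `nop` and `bti` differ only in the 7-bit HINT immediate, which is why `0xd503201f` and `0xd503241f` in the diff are `hint #0` and `hint #32`. A minimal sketch; `EncodeHint` and the constant names are hypothetical helpers, not runtime code.

```cpp
#include <cassert>
#include <cstdint>

// A64 HINT #imm: base encoding 0xd503201f with the 7-bit immediate (CRm:op2)
// placed in bits [11:5].
constexpr std::uint32_t EncodeHint(std::uint32_t imm)
{
    return 0xd503201fu | ((imm & 0x7fu) << 5);
}

constexpr std::uint32_t kNop = EncodeHint(0);   // nop is hint #0
constexpr std::uint32_t kBti = EncodeHint(32);  // bti (no operand) is hint #32

static_assert(kNop == 0xd503201fu, "nop encoding");
static_assert(kBti == 0xd503241fu, "bti encoding");
```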

-((resolverAddress[0] == 0xd503201f) || (resolverAddress[0] == 0xd503241f)) &&
+ip++;
+}
+
+if (
// ldr x0, [x0, #8]
-(resolverAddress[1] == 0xf9400400) &&
+(resolverAddress[ip] == 0xf9400400) &&
// ret
-(resolverAddress[2] == 0xd65f03c0)
+(resolverAddress[ip + 1] == 0xd65f03c0)
)
{
optimizeThreadStaticAccess = true;
kunalspathak marked this conversation as resolved.
}
else
{
_ASSERTE(false && "Unexpected code sequence.");
Member

Can we add a test that validates this fallback? I expect this question is going to be asked during servicing review.

Member Author

Makes sense. What is the right way to add a test? Force the dynamic resolver to kick in somehow? I need to find out how to do that, though.

Member

Right:

  • Add a native test library with a lot of thread statics (look for CMakeLists.txt under src\tests for how to add one of those). The thread statics need to be individual thread static variables so that they are allocated individually. Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.

  • Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using LD_PRELOAD that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.

Member Author

> Add a native test library with a lot of thread statics

Interesting; my impression was that this would take some kind of compiler flag or attribute.

> Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.

But even if I get it to use the dynamic resolver, the question is how to check that the TLS optimization is disabled. I believe I would still have to write managed code containing a ThreadStatic variable and verify that its access was not optimized. That would require getting hold of the jitted method's address and reading its instruction stream to confirm that we did not do the optimized TLS access. Alternatively, we could just add a check here that the instruction stream we get is that of the dynamic resolver, but I am not sure we would test anything interesting with that.

Member

Instead of this assert, you can add a debug-only config switch that says whether we expect the dynamic or the static resolver. If we come here and do not see the expected resolver, assert. I think it is good enough for the test to be effective on checked builds only.
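The debug-only expectation switch could look roughly like the following sketch. The runtime would read such a switch through its own configuration system; the environment variable `EXPECTED_TLS_RESOLVER`, its `static`/`dynamic` values, and the helper names are invented here for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

enum class ResolverKind { Static, Dynamic };

// Returns false only when an expectation is configured and the detected
// resolver does not match it. An unset or empty value means "no expectation".
bool ResolverMatchesExpectation(ResolverKind detected, const char* expected)
{
    if (expected == nullptr || expected[0] == '\0')
        return true;
    if (std::strcmp(expected, "static") == 0)
        return detected == ResolverKind::Static;
    if (std::strcmp(expected, "dynamic") == 0)
        return detected == ResolverKind::Dynamic;
    return false;  // unrecognized value: report it as a mismatch
}

// Debug-only check, compiled out of release builds.
void CheckResolverExpectation(ResolverKind detected)
{
#ifndef NDEBUG
    // Hypothetical variable name, for illustration only.
    assert(ResolverMatchesExpectation(detected, std::getenv("EXPECTED_TLS_RESOLVER")));
#else
    (void)detected;
#endif
}
```

A test run that loads the TLS-heavy native library early would set the switch to `dynamic`, and any other run could set it to `static`, so a misdetection in either direction trips the assert on checked builds.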

Member

> just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.

The test should do some thread static accesses from multiple threads to make sure that they work.
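The real test would be managed code touching `[ThreadStatic]` fields, but the shape of the check is the same in any language. As an illustration only, here is a C++ analog using `thread_local`: each worker verifies it starts from the default value, then repeatedly writes and reads back its own values while yielding to the others. `ThreadLocalsAreIsolated` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// C++ analog of a [ThreadStatic] field; -1 is the per-thread initial value.
thread_local int t_value = -1;

// Each worker checks that it starts from the initial value, then writes
// values unique to itself, yields so other workers interleave, and reads
// them back. If threads shared one storage location (e.g. an offset that is
// only valid for another thread), some read-back would see a foreign value.
bool ThreadLocalsAreIsolated(int threadCount, int iterations)
{
    std::vector<int> failures(threadCount, 0);  // one slot per worker, no sharing
    std::vector<std::thread> workers;
    for (int id = 0; id < threadCount; id++)
    {
        workers.emplace_back([id, iterations, &failures] {
            if (t_value != -1)
                failures[id]++;
            for (int i = 0; i < iterations; i++)
            {
                const int mine = id * iterations + i;
                t_value = mine;
                std::this_thread::yield();
                if (t_value != mine)
                    failures[id]++;
            }
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    for (int count : failures)
        if (count != 0)
            return false;
    return true;
}
```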

Member

> But even if I get it to use the dynamic resolver, the question would be how to check that the TLS optimization is disabled I suppose. I believe I will have to still write managed code containing ThreadStatic variable and I will have to verify that its access was not optimized. That will need somehow getting hold of the jitted method's address and reading the instruction stream of it to confirm that we did not do TLS access? Alternatively, just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.

You can use disasm checks to do this too.

Member

> Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using LD_PRELOAD that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.

Seems like a reasonable feature. I would create a new environment variable that loads an arbitrary DLL early on. Once that is in, I will update the host policy mechanism to use it.
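A host-side hook along those lines might look like the sketch below. The environment variable name `EARLY_LOAD_LIBRARY` and the function name are hypothetical, and the mechanism eventually added to corerun may differ; the idea is only that the library is `dlopen`ed before libcoreclr.so, giving it a chance to use up the non-dynamic TLS space first.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

// Hypothetical hook: if EARLY_LOAD_LIBRARY names a shared library, load it
// before the runtime. Returns false only if a library was requested and
// could not be loaded.
bool LoadEarlyLibraryFromEnvironment()
{
    const char* path = std::getenv("EARLY_LOAD_LIBRARY");
    if (path == nullptr || path[0] == '\0')
        return true;  // nothing requested

    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr)
    {
        std::fprintf(stderr, "failed to load %s: %s\n", path, dlerror());
        return false;
    }
    // Deliberately never dlclose'd: the library must stay loaded (and keep
    // its TLS) for the lifetime of the process.
    return true;
}
```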

}
#else
optimizeThreadStaticAccess = true;
kunalspathak marked this conversation as resolved.
#if !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_AMD64)