Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TLS on linux/arm64 only for static resolver #104408

Closed
wants to merge 3 commits into from

Conversation

kunalspathak
Copy link
Member

If the TLS is resolved by the dynamic resolver, the TP offset returned if for calling (JITting) thread and it is wrong to embed that information in the JIT code which will get executed by an execution thread. Doing that gets SEGFAULT. So detect what type of resolver it is and depending on that disable TLS if it is dynamic resolver.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@kunalspathak
Copy link
Member Author

@oldzhu confirmed that this solution works.

uint32_t* resolverAddress = reinterpret_cast<uint32_t*>(GetTLSResolverAddress());
if (
// nop or hint 32
((resolverAddress[0] == 0xd503201f) || (resolverAddress[0] == 0xd503241f)) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work correctly for musl glibc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because we end up with dynamic resolver on musl? It would be good to find out if this fix covers the musl too (and that ifdef above can be deleted) or if there is something else going on with musl.

Copy link
Member Author

@kunalspathak kunalspathak Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because we end up with dynamic resolver on musl?

We originally disabled it, probably because of same reason, but do you think that enabling for musl/arm64 can be done as a separate change?

EDIT: I am planning to backport this to 8.0, so might be good to just fix it for targets we have this optimization enabled and do a separate change to enable it for musl/arm64 in .net9.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with enabling it for musl in a separate change. We should understand what's going on with musl before backporting to make sure we are not missing anything.

Copy link
Member Author

@kunalspathak kunalspathak Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should understand what's going on with musl before backporting to make sure we are not missing anything.

Just to make sure I understand, if it turns out to be same issue for musl as well, we still do not want to enable musl/arm64 for 8.0, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the servicing fix should be the minimal change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything else needed for this PR then?

{
// nop might not be present in older resolver, so skip it.

// nop or hint 32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// nop or hint 32
// nop or bti

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since "hint 32" is an instruction and I preferred having instruction in the comment, should I keep it as-is?

Copy link
Member

@jkotas jkotas Jul 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bti is a more self-describing name for the same instruction.

hint is reserved instruction encoding space. They are assigning concrete meaning and names to the instructions in this space as the architecture evolves. BTW: nop is same as hint 0.

}
else
{
_ASSERTE(false && "Unexpected code sequence.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that validates this fallback? I expect this question is going to be asked during servicing review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. What is the right way to add a test? Force dynamic resolver to kick in or something? Need to find out how to do that though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rigth:

  • Add a native test library with a lot of thread statics (look for CMakeLists.txt under src\tests for how to add one of those). The thread statics need to be individual thread static variables so that they are allocated individually. Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.

  • Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using LD_PRELOAD that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a native test library with a lot of thread statics

Interesting, my impression was using some kind of compiler flag or an attribute.

Loading this library should eat all non-dynamic statics space and force the dynamic resolver to kick in.

But even if I get it to use the dynamic resolver, the question would be how to check that the TLS optimization is disabled I suppose. I believe I will have to still write managed code containing ThreadStatic variable and I will have to verify that its access was not optimized. That will need somehow getting hold of the jitted method's address and reading the instruction stream of it to confirm that we did not do TLS access? Alternatively, just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this assert, you can add a debug-only config switch that says whether we expect dynamic or static resolver. If we come here and we do not the expected resolver, assert. I think it is good-enough for the test to be effective on checked build only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.

The test should do some thread static accesses from multiple threads to make sure that they work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But even if I get it to use the dynamic resolver, the question would be how to check that the TLS optimization is disabled I suppose. I believe I will have to still write managed code containing ThreadStatic variable and I will have to verify that its access was not optimized. That will need somehow getting hold of the jitted method's address and reading the instruction stream of it to confirm that we did not do TLS access? Alternatively, just add a check here that the instruction stream we get is that of the dynamic resolver, but not sure if we will test anything interesting with that.

You can use disasm checks to do this too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a test that loads this native test library before loading libcoreclr.so. I am not sure what's the best way to do that. You can try using LD_PRELOAD that points to the native test library; or you can try to load it as a mock host policy using this hook in corerun. cc @jkoritzinsky and @AaronRobinsonMSFT for more thoughts.

Seems like a reasonable feature. I would create a new environment variables that loads an arbitrary DLL early on. Once tht is in, I will update the host policy mechanism to use that.

@@ -799,6 +803,34 @@ bool CanJITOptimizeTLSAccess()
// Optimization is disabled for FreeBSD/arm64
#elif defined(FEATURE_INTERPRETER)
// Optimization is disabled when interpreter may be used
#elif !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_ARM64)
// Optimization is enabled for linux/arm64 only for static resolver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We may have a problem on native AOT too: The codegen assumes that the OS loader is going to be able to give us a constant offset (we effectively hardcode that static resolver):
    // Code sequence to access thread local variable on linux/arm64:
    //
    // mrs xt, tpidr_elf0
    // mov xd, [xt+cns]
    //
    . It is a fine assumption to make when building executables, but it may not work for libraries. (Note that we have 1P workloads that use naot libraries.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about NativeAOT too, but realized that it might be always static resolver.

It is a fine assumption to make when building executables, but it may not work for libraries.

We can detect static/dynamic resolver for JITting on the fly, but for nativeaot, it might be tricky to know that information ahead of time. Possibly, we might have to embed that check in the generated code itself? But that might add an extra check for code for which we mostly get static resolver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For native AOT, we should generate the call to the resolver via indirection just like C/C++ compiler does when building dynamic libraries.

Comment on lines +803 to +807
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32
adrp x0, :tlsdesc:t_ThreadStatics
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics]
mov x0, x1
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32
Copy link
Contributor

@shushanhf shushanhf Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32
adrp x0, :tlsdesc:t_ThreadStatics
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics]
mov x0, x1
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -16
adrp x0, :tlsdesc:t_ThreadStatics
ldr x0 [x0, #:tlsdesc_lo12:t_ThreadStatics]
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 16

Is this ok ?

@jkotas
Copy link
Member

jkotas commented Aug 7, 2024

Superseded by #106052

@jkotas jkotas closed this Aug 7, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants