Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TLS on linux/arm64 only for static resolver #104408

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions src/coreclr/vm/arm64/asmhelpers.S
Original file line number Diff line number Diff line change
Expand Up @@ -794,4 +794,18 @@ LEAF_ENTRY GetThreadStaticsVariableOffset, _TEXT
EPILOG_RETURN
LEAF_END GetThreadStaticsVariableOffset, _TEXT
// ------------------------------------------------------------------

// ------------------------------------------------------------------
// size_t GetTLSResolverAddress()

// Helper to get the TLS resolver address. This will be then used to determine if we have a static or dynamic resolver.
LEAF_ENTRY GetTLSResolverAddress, _TEXT
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32
adrp x0, :tlsdesc:t_ThreadStatics
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics]
mov x0, x1
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32
Comment on lines +803 to +807
Copy link
Contributor

@shushanhf shushanhf Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -32
adrp x0, :tlsdesc:t_ThreadStatics
ldr x1, [x0, #:tlsdesc_lo12:t_ThreadStatics]
mov x0, x1
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 32
PROLOG_SAVE_REG_PAIR_INDEXED fp, lr, -16
adrp x0, :tlsdesc:t_ThreadStatics
ldr x0 [x0, #:tlsdesc_lo12:t_ThreadStatics]
EPILOG_RESTORE_REG_PAIR_INDEXED fp, lr, 16

Is this ok ?

EPILOG_RETURN
LEAF_END GetTLSResolverAddress, _TEXT
// ------------------------------------------------------------------
#endif // !TARGET_OSX
21 changes: 21 additions & 0 deletions src/coreclr/vm/threadstatics.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -784,6 +784,10 @@ void FreeTLSIndicesForLoaderAllocator(LoaderAllocator *pLoaderAllocator)

static void* GetTlsIndexObjectAddress();

#if !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_ARM64)
extern "C" size_t GetTLSResolverAddress();
#endif // !TARGET_OSX && TARGET_UNIX && TARGET_ARM64

bool CanJITOptimizeTLSAccess()
{
LIMITED_METHOD_CONTRACT;
Expand All @@ -799,6 +803,23 @@ bool CanJITOptimizeTLSAccess()
// Optimization is disabled for FreeBSD/arm64
#elif defined(FEATURE_INTERPRETER)
// Optimization is disabled when interpreter may be used
#elif !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_ARM64)
// Optimization is enabled for linux/arm64 only for static resolver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We may have a problem on native AOT too: The codegen assumes that the OS loader is going to be able to give us a constant offset (we effectively hardcode that static resolver):
    // Code sequence to access thread local variable on linux/arm64:
    //
    // mrs xt, tpidr_elf0
    // mov xd, [xt+cns]
    //
    . It is a fine assumption to make when building executables, but it may not work for libraries. (Note that we have 1P workloads that use naot libraries.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about NativeAOT too, but realized that it might be always static resolver.

It is a fine assumption to make when building executables, but it may not work for libraries.

We can detect static/dynamic resolver for JITting on the fly, but for nativeaot, it might be tricky to know that information ahead of time. Possibly, we might have to embed that check in the generated code itself? But that might add an extra check for code for which we mostly get static resolver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For native AOT, we should generate the call to the resolver via indirection just like C/C++ compiler does when building dynamic libraries.

// For static resolver, the TP offset is same for all threads.
// For dynamic resolver, TP offset returned is that of a JIT thread and
// will be different for the executing thread.
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
uint32_t* resolverAddress = reinterpret_cast<uint32_t*>(GetTLSResolverAddress());
if (
// nop or hint 32
((resolverAddress[0] == 0xd503201f) || (resolverAddress[0] == 0xd503241f)) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work correctly for musl glibc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because we end up with dynamic resolver on musl? It would be good to find out if this fix covers the musl too (and that ifdef above can be deleted) or if there is something else going on with musl.

Copy link
Member Author

@kunalspathak kunalspathak Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because we end up with dynamic resolver on musl?

We originally disabled it, probably because of same reason, but do you think that enabling for musl/arm64 can be done as a separate change?

EDIT: I am planning to backport this to 8.0, so might be good to just fix it for targets we have this optimization enabled and do a separate change to enable it for musl/arm64 in .net9.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with enabling it for musl in a separate change. We should understand what's going on with musl before backporting to make sure we are not missing anything.

Copy link
Member Author

@kunalspathak kunalspathak Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should understand what's going on with musl before backporting to make sure we are not missing anything.

Just to make sure I understand, if it turns out to be same issue for musl as well, we still do not want to enable musl/arm64 for 8.0, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the servicing fix should be the minimal change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything else needed for this PR then?

// ldr x0, [x0, #8]
(resolverAddress[1] == 0xf9400400) &&
// ret
(resolverAddress[2] == 0xd65f03c0)
)
{
optimizeThreadStaticAccess = true;
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
}
#else
optimizeThreadStaticAccess = true;
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
#if !defined(TARGET_OSX) && defined(TARGET_UNIX) && defined(TARGET_AMD64)
Expand Down
Loading