Use rr-safe nopl; rdtsc sequence
When a process runs under `rr`, `rr` needs to patch out `rdtsc` so that it can record the values returned.
If patching is not possible, `rr` falls back to an expensive signal-based emulation.
As of rr master, a specific `nopl; rdtsc` sequence may be used to guarantee that
`rdtsc` patching is always possible. Use this sequence for uses of `rdtsc` in our
runtime.
Keno committed Aug 19, 2023
1 parent ec07825 commit c2290f7
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion src/julia_internal.h
@@ -209,8 +209,17 @@ JL_DLLEXPORT void jl_unlock_profile_wr(void) JL_NOTSAFEPOINT JL_NOTSAFEPOINT_LEA
 static inline uint64_t cycleclock(void) JL_NOTSAFEPOINT
 {
 #if defined(_CPU_X86_64_)
+    // This is nopl 0(%rax, %rax, 1), but assemblers are inconsistent about whether
+    // they emit that as a 4 or 5 byte sequence and we need to be guaranteed to use
+    // the 5 byte one.
+#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
     uint64_t low, high;
-    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
+    // This instruction sequence is promised by rr to be patchable. rr can usually
+    // also patch `rdtsc` in regular code, but without the preceding nop, there could
+    // be an interfering branch into the middle of rr's patch region. Using this
+    // sequence prevents a massive rr-induced slowdown if the compiler happens to emit
+    // an unlucky pattern. See https://github.com/rr-debugger/rr/pull/3580.
+    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
     return (high << 32) | low;
 #elif defined(_CPU_X86_)
     int64_t ret;
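
For readers who want to try the sequence outside the Julia tree, the following is a minimal standalone sketch of the same idea, not the runtime's actual code. The name `cycleclock_sketch` is hypothetical. The compiler macro `__x86_64__` stands in for Julia's `_CPU_X86_64_`. The `clock()` fallback is only a placeholder so the file builds on other targets. The x86-64 path assumes GCC- or Clang-style extended inline assembly.

```c
#include <stdint.h>
#include <time.h>

/* Fixed 5-byte encoding of `nopl 0(%rax,%rax,1)`: 0f 1f 44 00 00.
 * Spelled out as raw bytes so the assembler cannot pick a shorter form. */
#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"

/* Hypothetical standalone counterpart of cycleclock() from the diff. */
static inline uint64_t cycleclock_sketch(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t low, high;
    /* The nop and rdtsc sit in one asm statement, so they stay adjacent
     * in the emitted code. rdtsc returns the counter in EDX:EAX. */
    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
    return (high << 32) | low;
#else
    /* Placeholder for other targets (an assumption of this sketch):
     * returns processor time in clock ticks, not CPU cycles. */
    return (uint64_t)clock();
#endif
}
```

Keeping the nop and `rdtsc` in a single `__asm__` statement matters here: two separate statements would allow the compiler to schedule other instructions between them, which would break the exact byte pattern.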