TracyProfiler: Use rr-safe nopl; rdtsc sequence (#7225)
Several users have reported massive slowdowns in rr record/replay for packages that have the tracy client loaded. The issue turns out to be an unlucky false positive in an rr heuristic that determines whether an `rdtsc` instruction can be patched. In rr master, patchability can be guaranteed by using a specific `nopl; rdtsc` sequence. Add a patch to tracy to do just that. See also rr-debugger/rr#3580 and JuliaLang/julia#50975.
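As a rough illustration of the `nopl; rdtsc` sequence mentioned above, the following standalone sketch reads the timestamp counter with the 5-byte `nopl` placed directly in front of `rdtsc`. The function name `read_timestamp` and the `std::chrono` fallback for non-x86-64 targets are assumptions made for this example and are not part of tracy or rr.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not tracy's API): read a timestamp using the
// `nopl 0(%rax,%rax,1); rdtsc` sequence where the target supports it.
inline uint64_t read_timestamp()
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t rax, rdx;
    asm volatile(
        // 5-byte encoding of nopl 0(%rax,%rax,1), written as raw bytes so the
        // assembler cannot choose a shorter encoding.
        ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
        "rdtsc"
        : "=a"(rax), "=d"(rdx));
    // rdtsc returns the low 32 bits in eax and the high 32 bits in edx.
    return (rdx << 32) | rax;
#else
    // Assumed fallback so the sketch still compiles on other targets.
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Calling `read_timestamp()` twice and subtracting gives an elapsed tick count. Whether rr actually patches the sequence depends on the rr version in use, as described in the linked PR.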
Showing 2 changed files with 44 additions and 0 deletions.
43 changes: 43 additions & 0 deletions
T/TracyProfiler/bundled/patches/TracyProfiler-rr-nopl-seq.patch
@@ -0,0 +1,43 @@
commit 21a65e08372a37342e598422a2bd58263857807e
Author: Keno Fischer <keno@juliacomputing.com>
Date: Sat Aug 19 01:40:18 2023 +0000

Use patchable rdtsc sequence to avoid slowdowns under rr

We (Julia) ship both support for using tracy to trace julia applications,
as well as using `rr` (https://github.com/rr-debugger/rr) for record-replay debugging.
After our most recent rebuild of tracy, users have been reporting significant performance
slowdowns when `rr` recording a session that happens to also load the tracy library
(even if tracing is not enabled). Upon further examination, the recompile happened
to trigger a protective heuristic that disabled rr's patching of tracy's use of
`rdtsc` because an earlier part of the same function happened to look like a
conditional branch into the patch region. See https://github.com/rr-debugger/rr/pull/3580
for details. To avoid this issue occurring again in future rebuilds of tracy,
adjust tracy's `rdtsc` sequence to be `nopl; rdtsc`, which (as of the
linked PR) is a sequence that is guaranteed to bypass this heuristic
and not incur the additional overhead when run under rr.

diff --git a/public/client/TracyProfiler.hpp b/public/client/TracyProfiler.hpp
index 1b825ea3..27f6bbbd 100644
--- a/public/client/TracyProfiler.hpp
+++ b/public/client/TracyProfiler.hpp
@@ -209,7 +209,18 @@ public:
         if( HardwareSupportsInvariantTSC() )
         {
             uint64_t rax, rdx;
-            asm volatile ( "rdtsc" : "=a" (rax), "=d" (rdx) );
+            // Some external tooling (such as rr) wants to patch our rdtsc and replace it by a
+            // branch to control the external input seen by a program. This kind of patching is
+            // not generally possible depending on the surrounding code and can lead to significant
+            // slowdowns if the compiler generated unlucky code and rr and tracy are used together.
+            // To avoid this, use the rr-safe `nopl 0(%rax, %rax, 1); rdtsc` instruction sequence,
+            // which rr promises will be patchable independent of the surrounding code.
+            asm volatile (
+                // This is nopl 0(%rax, %rax, 1), but assemblers are inconsistent about whether
+                // they emit that as a 4 or 5 byte sequence and we need to be guaranteed to use
+                // the 5 byte one.
+                ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
+                "rdtsc" : "=a" (rax), "=d" (rdx) );
             return (int64_t)(( rdx << 32 ) + rax);
         }
 # else
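One way to sanity-check a rebuilt library is to look for the seven bytes that the patched `asm` statement emits: `0f 1f 44 00 00` for the 5-byte `nopl`, followed by `0f 31` for `rdtsc`. The sketch below is a hypothetical helper, not part of tracy or rr, that scans a buffer of machine code for that byte pattern. It does not decode instructions, so a match inside unrelated data is possible, and it makes no claim about how rr itself locates patch sites.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes emitted by the patched asm statement:
// 0f 1f 44 00 00 = nopl 0(%rax,%rax,1) (5-byte form), 0f 31 = rdtsc.
constexpr std::array<uint8_t, 7> kNoplRdtsc = {0x0f, 0x1f, 0x44, 0x00,
                                               0x00, 0x0f, 0x31};

// Hypothetical helper: return every offset in `code` at which the
// 7-byte nopl; rdtsc pattern starts.
std::vector<std::size_t> find_nopl_rdtsc(const std::vector<uint8_t>& code)
{
    std::vector<std::size_t> hits;
    auto it = code.begin();
    while (true) {
        it = std::search(it, code.end(), kNoplRdtsc.begin(), kNoplRdtsc.end());
        if (it == code.end()) break;
        hits.push_back(static_cast<std::size_t>(it - code.begin()));
        ++it;  // continue searching after the start of this match
    }
    return hits;
}
```

Feeding it the `.text` bytes of the built client library (extracted with any tool of choice) should report at least one offset if the patch took effect. A buffer that contains only the 4-byte `nopl` form (`0f 1f 40 00`) before `rdtsc` yields no hits.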