Add-ons to C API #1293

Open
slomp wants to merge 5 commits into wolfpld:master from slomp:slomp/c-api-addons


@slomp
Contributor

slomp commented Feb 28, 2026

Some new C interfaces to help integrate Tracy with other languages and scripting tools:

  1. ability to access tracy::Profiler::GetTime()
  2. bug fix: the thread ID in GpuNewContext messages must be 0 (also in line with all other GPU backends)
  3. ability to manually set the thread ID for GPU zone begin/end messages (some languages rely on coroutines that can move execution across threads)

I am not sure how widespread use of the GPU C API is out there.
Item 3 above could technically break existing clients (if the struct is not properly zero-initialized), but at the same time, any such users would have run into item 2 above and been unable to use the GPU C API anyway.

struct ___tracy_gpu_zone_end_data {
    uint16_t queryId;
    uint8_t context;
    uint32_t thread;
};
Owner

What's the purpose?

Contributor Author

Some languages (like Swift) seem to rely on fiber/coroutine shenanigans under the hood, so an execution context can "zone begin" on one thread and "zone end" on another.

Owner

Yes, but I don't see any usage for this.

Contributor Author

I believe the result is a "crash" in the profiler, because it produces GPU Zone End messages without a matching GPU Zone Begin message. For CPU zones, the workaround was to use Tracy fibers to manipulate (fake) the thread IDs explicitly so that things show up in the timeline. There's no fiber equivalent for GPU events.

(I'll ask for the exact error; at the moment I'm sort of playing a game of "Telephone" here, relaying messages from people who are more actively involved in using Tracy from other languages and scripts.)

Contributor Author

Ok... Ugh... It looks like the git patch I applied to create this PR was botched :/

(I can see now why you asked about the thread member specifically in ___tracy_gpu_zone_end_data, since it's not used. I'll get back to you once I get more context from the folks).

slomp force-pushed the slomp/c-api-addons branch from 78a2637 to e8ef021 on February 28, 2026 03:56
@slomp
Contributor Author

slomp commented Mar 2, 2026

Ok, so I looked at the other (C++ API) GPU back-ends, and noticed this:

TracyOpenGL.hpp sets thread to 0 when formulating the end message:

memset( &item->gpuZoneEnd.thread, 0, sizeof( item->gpuZoneEnd.thread ) );

But the other GPU back-ends set it to GetThreadHandle():

OpenCL:
MemWrite(&item->gpuZoneEnd.thread, GetThreadHandle());

Vulkan:
MemWrite( &item->gpuZoneEnd.thread, GetThreadHandle() );

D3D11:
MemWrite( &item->gpuZoneEnd.thread, GetThreadHandle() );

D3D12:
MemWrite(&item->gpuZoneEnd.thread, GetThreadHandle());

Metal:
MemWrite( &item->gpuZoneEnd.thread, GetThreadHandle() );

So this is probably something that went unnoticed for a long time.
My guess is that 0 for the GPU zone end is a sort of implicit "don't care": OpenGL has historically been a context-centric, single-threaded command stream, so the GPU zone end implicitly expects the same thread as the GPU zone begin.

Then there's the matter of GPU zone begin messages handling fibers only in the OpenGL back-end:

#ifdef TRACY_FIBERS
TracyLfqPrepare( QueueType::GpuZoneBegin );
memset( &item->gpuZoneBegin.thread, 0, sizeof( item->gpuZoneBegin.thread ) );
#else
GetProfiler().SendCallstack( depth );
TracyLfqPrepare( QueueType::GpuZoneBeginCallstack );
MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
#endif

Not sure which approach is right or wrong here...

@wolfpld
Owner

wolfpld commented Mar 2, 2026

> So this is probably something that went unnoticed for a long time.
> My guess is that 0 for the GPU zone end is a sort of implicit "don't care": OpenGL has historically been a context-centric, single-threaded command stream, so the GPU zone end implicitly expects the same thread as the GPU zone begin.

Not really. The OpenGL and Vulkan implementations are different, and for a reason. These two APIs are the only ones I touched; what happens in the other APIs, I don't know.

Now, I don't remember the details, so the analysis below is AI assisted (the input was your post above).

Architecture

The server has two modes for GPU contexts (TracyWorker.cpp:5822-5834):

| Mode | Context types | ctx->thread | Zone thread ID |
|---|---|---|---|
| Thread-bound | OpenGL, D3D11 | != 0 | Ignored (ztid = 0) |
| Unbound | Vulkan, OpenCL, D3D12, Metal | == 0 | Used (ztid = ev.thread) |

→ This is in line with what I vaguely remember.

1. Critical Bug: D3D11 (TracyD3D11.hpp:132)

MemWrite( &item->gpuNewContext.thread, uint32_t(0) );  // #TODO: why not GetThreadHandle()?

D3D11 is single-threaded like OpenGL but sets thread = 0, causing the server to treat it as unbound. This breaks zone matching.

→ I don't know about D3D11. If it's single-threaded like OpenGL, then it should follow OpenGL patterns.

2. Minor: OpenGL end message (TracyOpenGL.hpp:311)

memset( &item->gpuZoneEnd.thread, 0, sizeof( item->gpuZoneEnd.thread ) );

This is functionally harmless since OpenGL's ctx->thread != 0 means the server ignores zone thread IDs anyway. But it's inconsistent with other backends.

→ I think this answers your question. You can see this confirmed in 0f68e1e. Getting the thread id has a cost.

3. Potential Issue: Metal (TracyMetal.hmm:351)

MemWrite(&item->gpuNewContext.thread, uint32_t(0)); // TODO: why not GetThreadHandle()?

Metal uses thread = 0 (unbound mode) then GetThreadHandle() for zones - this matches Vulkan/OpenCL pattern and likely works correctly for command-queue-based APIs.

→ Again, I don't know anything about Metal.

4. FIBERS: OpenGL-only with incomplete logic

Only OpenGL has TRACY_FIBERS guards (lines 237, 286). For thread-bound contexts this is moot anyway. For unbound contexts, fibers support is missing entirely.

→ 733d267: I don't remember what this was about.

@slomp
Contributor Author

slomp commented Mar 2, 2026

Ok, spent a few minutes reviewing the server code for GPU zone processing.

The thread specified in GpuNewContext determines whether or not the given context is implicitly tied to that thread (ctx->thread != 0), or if any thread can post events to that context (ctx->thread == 0).

So, the problem with the current GPU C API is that, when creating a new GPU context, there's no way to make it "thread-less":

tracy::MemWrite( &item->gpuNewContext.thread, tracy::GetThreadHandle() );

tracy::MemWrite( &item->gpuNewContext.thread, tracy::GetThreadHandle() );

To accommodate that, we'd need to add a thread field to ___tracy_gpu_new_context_data:

tracy/public/tracy/TracyC.h

Lines 184 to 190 in 80c849a

struct ___tracy_gpu_new_context_data {
    int64_t gpuTime;
    float period;
    uint8_t context;
    uint8_t flags;
    uint8_t type;
};

Doing so would then require a way to set the thread field in the GPU zone begin/end API.
Always sending 0 in GPU zone end messages creates a problem, because ProcessGpuZoneEnd unconditionally calls ctx->threadData.find( ev.thread ), regardless of the context's thread type.

tracy/server/TracyWorker.cpp

Lines 5920 to 5921 in aa30c61

auto td = ctx->threadData.find( ev.thread );
assert( td != ctx->threadData.end() );

Alternatively, we could add to ProcessGpuZoneEnd the same context-thread logic that ProcessGpuZoneBeginImplCommon applies:

tracy/server/TracyWorker.cpp

Lines 5821 to 5838 in aa30c61

uint64_t ztid;
if( ctx->thread == 0 )
{
    // Vulkan, OpenCL and Direct3D 12 contexts are not bound to any single thread.
    zone->SetThread( CompressThread( ev.thread ) );
    ztid = ev.thread;
}
else
{
    // OpenGL and Direct3D11 doesn't need per-zone thread id. It still can be sent,
    // because it may be needed for callstack collection purposes.
    zone->SetThread( 0 );
    ztid = 0;
}
if( m_data.lastTime < time ) m_data.lastTime = time;
auto td = ctx->threadData.find( ztid );

(not sure whether the comment "It still can be sent, because it may be needed for callstack collection purposes" above still has any practical use).


Maybe we need both. This would make begin and end processing consistent with each other, and would allow the Tracy C GPU API to be used from languages that implicitly rely on coroutines to schedule work across threads.
