
Conversation

@AndreyPavlenko (Contributor) commented Jul 25, 2025

To enable tracking, set the environment variable TRITON_TRACK_DUMP to 1, true, yes, on, or y, or to a path to a directory where the tracking reports will be dumped.
To add profiling statistics to the reports, set the TRITON_TRACK_PROFILE environment variable.
To track kernel launches, set the TRITON_TRACK_RUN environment variable.
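
For example, a minimal sketch of enabling all three from a script (the directory path is illustrative, and the variables must be set before Triton reads them):

import os
os.environ["TRITON_TRACK_DUMP"] = "/tmp/triton-track"  # or "1" to dump to the console
os.environ["TRITON_TRACK_PROFILE"] = "1"  # add profiling statistics to the reports
os.environ["TRITON_TRACK_RUN"] = "1"  # also track kernel launches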

Link #4716

@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from 41015d0 to 1216480 Compare July 25, 2025 20:48
@AndreyPavlenko AndreyPavlenko changed the title Implemented compile time/size tracking and profiling utility A tracking utility for gathering the compile and/or runtime time, size, profiling and other statistics Jul 25, 2025
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 2 times, most recently from 7843958 to 9752167 Compare July 29, 2025 13:44
@AndreyPavlenko AndreyPavlenko marked this pull request as ready for review July 29, 2025 18:26
Comment on lines -268 to -269
},
py::call_guard<py::gil_scoped_release>());
Contributor:
Why removed?

Contributor Author:
It doesn't allow calling the callback function.

Contributor:
Is it possible to make it conditional? For example, still use it if pyCb == std::nullopt.

Contributor Author:
Now the GIL is released at the beginning of the lambda and re-acquired on each callback call.

@anmyachev (Contributor):
I would also add tests for this utility so that the code does not become outdated unexpectedly.

@Egor-Krivov (Contributor) commented Aug 11, 2025

@AndreyPavlenko Will it be possible to distinguish between configurations for the single script like our microbenchmarks? Like if I call kernel with different input parameters, I would probably want to get separate compile time for each input size.

@AndreyPavlenko (Contributor Author):

> @AndreyPavlenko Will it be possible to distinguish between configurations for the single script like our microbenchmarks? Like if I call kernel with different input parameters, I would probably want to get separate compile time for each input size.

There will be separate reports for each compilation.

@Egor-Krivov (Contributor):

> There will be separate reports for each compilation.

Can you show how to distinguish between them? I think currently I only see a folder with the kernel name, and inside it a lot of files with similar names, like kernel.run_3842.json. Can I somehow extract which run corresponds to which shape? Maybe I could somehow affect the naming, like calling some sort of `profiling.label("m32_n32_k32")`? Or store all results in one large JSON based on my provided labels?

@AndreyPavlenko (Contributor Author):

> Can you show how to distinguish between them? I think currently I only see a folder with the kernel name, and inside it a lot of files with similar names, like kernel.run_3842.json.

Currently the report has the same name as the kernel, so it's difficult to distinguish them. A similar issue is discussed here - #4800 (comment) .

kernel.run_3842.json is related to kernel-run tracking, not compilation, and you probably don't need it. Just don't set the TRITON_TRACK_RUN env var.

@AndreyPavlenko (Contributor Author):
Now the constexprs are added to the kernel names, and the grid is added to the kernel-run reports.

@vlad-penkin vlad-penkin linked an issue Aug 25, 2025 that may be closed by this pull request
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from f16b622 to c465ae3 Compare August 29, 2025 13:43
VERIFY: ${{ (github.event_name == 'pull_request' || github.event_name == 'schedule' || inputs.verify) && '1' || '0' }}
TAG: ${{ inputs.tag || (github.event_name == 'pull_request' && format('pr-{0}', github.event.number)) || (github.event_name == 'schedule' && 'ci') || 'test' }}
N_RUNS: ${{ inputs.n_runs || '1' }}
TRITON_TRACK_DUMP: "$PWD/reports/track"
Contributor:
Let's make it optional, depending on user input. It can cause overhead, which can generally be avoided.

Contributor:
I would also enable this profiling at least for some tests in the intel folder.

Contributor Author:
I'll remove this line. It's not sufficient on its own, because the dumps are not picked up; that requires adding some additional logic to the workflows, and it's probably better done in a separate PR.



def _tr_env(name: str, default: str = "", type: Any = str) -> Any:
return type(os.environ.get(name, default).strip())
Contributor:
This returns a type, not a value.

Contributor Author:
It returns a value of the specified type - str, int, etc.

Contributor:
Oh, I see. This is why it's not good to shadow built-in functions. Let's give the type variable another name.

Contributor Author:
Agree.
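
For illustration, the renamed helper might look like this (a sketch only; the final parameter name in the PR may differ):

import os
from typing import Any, Callable

def _tr_env(name: str, default: str = "", conv: Callable[[str], Any] = str) -> Any:
    # Read the env var and convert its value with `conv` (e.g. str or int).
    return conv(os.environ.get(name, default).strip())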

Comment on lines +6 to +18
# To enable the tracking, set the environment variable ``TRITON_TRACK_DUMP``
# to either ``1``, ``true``, ``yes``, ``on``, ``y`` or a path to a directory
# where the tracking reports will be dumped.
Contributor:
Do we really need all these possible values for TRITON_TRACK_DUMP?

I would keep only the directory-path and unset cases.

Contributor Author:
It can also print the dumps to the console. There are so many values in order to stay consistent with other boolean env vars, which support all of these values.
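
A sketch of the parsing this implies (the function name is hypothetical, not the PR's actual code):

def _track_dump_target(value: str):
    # Truthy values mean "dump to the console"; anything else is a directory path.
    if value.strip().lower() in ("1", "true", "yes", "on", "y"):
        return None  # console
    return value  # directory path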


return decorator(funcOrName) if callable(funcOrName) else decorator

# This ugly hook is used to decorate the upstream functions and avoid circular imports.
Contributor:
Why do circular imports appear?

Contributor Author:
Because we decorate functions from the triton.runtime.jit module in the backend, but the backend is called by that module.

Contributor:
I don't see decorators in triton.runtime.jit, only in backend/compiler.py. Maybe changing the import will help: https://github.com/intel/intel-xpu-backend-for-triton/pull/4777/files#r2310330084.

Contributor Author:
I haven't touched the upstream code; the decorators are injected here. We can't do something like:

from triton.runtime.jit import JITFunction
JITFunction._do_compile = decorate(JITFunction._do_compile)

because this code is called from triton.runtime.jit, and we'd get a circular import.
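
A minimal sketch of such a late-binding hook, assuming a registry that triton.runtime.jit applies once it has finished importing (all names are illustrative, not the PR's actual API):

_pending_decorators = []

def register_jit_decorator(decorator):
    # Called from the backend at import time; no import of triton.runtime.jit needed.
    _pending_decorators.append(decorator)

def apply_jit_decorators(jit_function_cls):
    # Called later, from triton.runtime.jit itself, once the module fully exists.
    for decorator in _pending_decorators:
        jit_function_cls._do_compile = decorator(jit_function_cls._do_compile)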

@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 2 times, most recently from f0c3086 to 9f72bda Compare September 1, 2025 18:59
Development

Successfully merging this pull request may close these issues.

Compile Time Tracking for Key Workloads
3 participants