
[Torch Inductor] IPEX uses a conservative configuration for Triton on Intel GPU #759

Open
chengjunlu opened this issue Mar 27, 2024 · 3 comments

Comments

@chengjunlu
Contributor

IPEX uses a conservative num_warps for Triton on Intel GPU. Unlike NVIDIA, Intel GPU supports num_warps up to 64.

As a result, Torch Inductor may choose a sub-optimal Triton kernel.

[2024-03-26 09:25:34,485] torch._inductor.triton_heuristics: [DEBUG] Benchmark all input configs get:
[2024-03-26 09:25:34,485] torch._inductor.triton_heuristics: [DEBUG] XBLOCK: 1, num_warps: 2, num_ctas: 1, num_stages: 1: 5.263040, nreg 0, nspill 0, #shared-mem 0
[2024-03-26 09:25:34,485] torch._inductor.triton_heuristics: [DEBUG] XBLOCK: 8, num_warps: 4, num_ctas: 1, num_stages: 1: 0.997920, nreg 0, nspill 0, #shared-mem 0
[2024-03-26 09:25:34,485] torch._inductor.triton_heuristics: [DEBUG] XBLOCK: 32, num_warps: 8, num_ctas: 1, num_stages: 1: 0.689840, nreg 0, nspill 0, #shared-mem 0
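For illustration, here is a minimal Triton autotune sketch of what a wider search space could look like. The copy kernel and the config list are hypothetical, not Inductor's actual ones; num_warps=64 is only expected to be valid on hardware that supports it, such as Intel GPU:

```python
import triton
import triton.language as tl

# Hypothetical search space extended past num_warps=8. On Intel GPU,
# num_warps up to 64 could be tried; Inductor's pointwise heuristics
# stop at 8 as shown in the log above.
@triton.autotune(
    configs=[
        triton.Config({"XBLOCK": 32}, num_warps=8, num_stages=1),
        triton.Config({"XBLOCK": 64}, num_warps=16, num_stages=1),
        triton.Config({"XBLOCK": 128}, num_warps=32, num_stages=1),
        triton.Config({"XBLOCK": 256}, num_warps=64, num_stages=1),
    ],
    key=["n_elements"],
)
@triton.jit
def copy_kernel(in_ptr, out_ptr, n_elements, XBLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offsets = pid * XBLOCK + tl.arange(0, XBLOCK)
    mask = offsets < n_elements
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x, mask=mask)
```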
@riverliuintel

@EikanWang Any comments about this ticket?

@vlad-penkin added the enhancement (New feature or request) label on Apr 17, 2024
@EikanWang
Contributor

We have verified different configurations from the E2E performance perspective, such as enlarging num_warps, but it did not impact E2E performance significantly. In the future, we will utilize Inductor autotune to find a better configuration.

What's the impact on Triton?
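For context, a minimal sketch of requesting Inductor autotuning from user code. torch.compile's "max-autotune" mode is standard PyTorch API; the toy model and the xpu device (registered by IPEX) are placeholders:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the "xpu" device

# Placeholder model; "max-autotune" makes Inductor benchmark a wider set of
# Triton configs instead of relying on its default heuristics.
model = torch.nn.Linear(1024, 1024).to("xpu")
compiled = torch.compile(model, mode="max-autotune")
out = compiled(torch.randn(64, 1024, device="xpu"))
```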

@chengjunlu
Contributor Author

chengjunlu commented Apr 25, 2024

The configuration information above is dumped from the Inductor autotune log.
From the log, num_warps only goes up to 8, and 8 gives the best performance among the tried values: 2, 4, 8.
The point is simply that larger num_warps values don't appear to be tried on Intel GPU.

That is fine as long as the performance is good with a small num_warps.
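For reference, a sketch of the knobs that could widen that search and reproduce such a log. These torch._inductor.config flags and the torch._logging call exist in recent PyTorch; whether they change the config space tried on Intel GPU is an assumption:

```python
import logging
import torch
import torch._inductor.config as inductor_config

# Benchmark a larger pointwise config space instead of the default heuristic pick.
inductor_config.max_autotune_pointwise = True
# Let coordinate-descent tuning explore neighboring (XBLOCK, num_warps) values.
inductor_config.coordinate_descent_tuning = True

# Surface per-config benchmark lines like the ones quoted above.
torch._logging.set_logs(inductor=logging.DEBUG)
```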
