We have verified different configurations from the E2E performance perspective, such as enlarging num_warps, but it does not impact E2E performance significantly. In the future, we will utilize Inductor autotune to find a better configuration.
The configuration information is dumped from the Inductor autotune log.
From the log, it seems that num_warps only goes up to 8, and 8 gives the best performance among the tried values: 2, 4, 8.
Just to note: it seems the autotuner does not try larger num_warps values on Intel GPU.
It is fine as long as the performance is good with a small num_warps.
IPEX uses conservative num_warps values for Triton on Intel GPU. Unlike NVIDIA, Intel GPU supports num_warps up to 64.
Torch Inductor may therefore choose a sub-optimal Triton kernel.
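To illustrate why a conservative candidate list can hide the real optimum, here is a minimal, self-contained sketch of an autotune-style selection loop. The `time_kernel` function is a hypothetical stub (not a real Triton or Inductor API): in a real setup it would launch the kernel with the given num_warps and measure its runtime. The toy cost model simply assumes, for illustration, that 16 warps would be fastest.

```python
# Hypothetical timing stub: a real autotuner would launch the Triton
# kernel with the given num_warps and measure wall-clock time.
def time_kernel(num_warps: int) -> float:
    # Toy cost model for illustration only: pretend 16 warps is optimal.
    return abs(num_warps - 16) + 1.0

def pick_best_num_warps(candidates):
    """Return the candidate num_warps with the lowest measured time."""
    timings = {w: time_kernel(w) for w in candidates}
    return min(timings, key=timings.get)

# Conservative search space (what the log above shows) vs. a widened one
# that exploits Intel GPU's support for up to 64 warps.
conservative = [2, 4, 8]
widened = [2, 4, 8, 16, 32, 64]

print(pick_best_num_warps(conservative))  # best within {2, 4, 8} is 8
print(pick_best_num_warps(widened))       # 16 under this toy cost model
```

Under this toy model, searching only {2, 4, 8} reports 8 as "best" even though a wider search would find a faster configuration, which matches the concern above about the sub-optimal kernel choice.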