
[PyTorch Upstream] Inductor UT Regression caused by Triton update: NotImplementedError: elapsed_time is not supported by XPUEvent. #1138

Open
etaf opened this issue May 16, 2024 · 9 comments · Fixed by #1190

Comments

etaf commented May 16, 2024

Hi team, we found an Inductor UT regression caused by a Triton update: NotImplementedError: elapsed_time is not supported by XPUEvent.
This was introduced by commit 21bd536.

The commit itself is fine, but unfortunately the XPU runtime doesn't support elapsed_time yet.

The details:

File "/home/xinanlin/xinanlin/pytorch/torch/_dynamo/utils.py", line 210, in time_wrapper
   r = func(*args, **kwargs)
 File "/home/xinanlin/xinanlin/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 687, in benchmark_all_configs
   timings = {
 File "/home/xinanlin/xinanlin/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 688, in <dictcomp>
   launcher: self.bench(launcher, *args, **kwargs)
 File "/home/xinanlin/xinanlin/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 659, in bench
   return do_bench_gpu(kernel_call, rep=40, fast_flush=True)
 File "/home/xinanlin/xinanlin/pytorch/torch/_inductor/runtime/runtime_utils.py", line 112, in do_bench_gpu
   return triton_do_bench(*args, **kwargs)[0]
 File "/home/xinanlin/xinanlin/miniconda3/lib/python3.10/site-packages/triton/testing.py", line 132, in do_bench
   estimate_ms = start_event.elapsed_time(end_event) / 5
 File "/home/xinanlin/xinanlin/pytorch/torch/xpu/streams.py", line 152, in elapsed_time
   return super().elapsed_time(end_event)
NotImplementedError: elapsed_time is not supported by XPUEvent.
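
For reference, a minimal sketch that reproduces the failure outside of Inductor (assuming a PyTorch build with XPU support; the event usage mirrors what Triton's do_bench does internally):

    import torch

    # Triton's do_bench brackets the kernel with a pair of device events
    # and reads the elapsed time between them. On affected XPU builds,
    # Event.elapsed_time raises NotImplementedError.
    start_event = torch.xpu.Event(enable_timing=True)
    end_event = torch.xpu.Event(enable_timing=True)
    start_event.record()
    # ... launch a kernel here ...
    end_event.record()
    torch.xpu.synchronize()
    print(start_event.elapsed_time(end_event))  # raises NotImplementedError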

etaf commented May 16, 2024

@vlad-penkin @riverliuintel This regression blocks our stock PyTorch Inductor UT upstreaming. Please prioritize, thanks.

etaf commented May 16, 2024

According to the Intel PyTorch core team, elapsed_time will not be supported by XPUEvent until 2025 Q1, which is too late. Maybe we should revert the commit.

etiotto commented May 16, 2024

This line of code:

    estimate_ms = start_event.elapsed_time(end_event) / 5

is identical to the OpenAI version of do_bench. We should use torch.xpu.Event to compute the kernel time, just as the NVIDIA path uses torch.cuda.Event. The code works fine in our CI with IPEX (intel-extension-for-pytorch 2.1.10+gitfd4fce2), so I suggest upstreaming that support into PyTorch.
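
A rough sketch of the intended pattern (hypothetical helper, not the actual do_bench code; it assumes the backend's Event.elapsed_time is implemented):

    import torch

    def event_elapsed_ms(fn, device_mod=torch.xpu):
        # Time fn with device events, mirroring how do_bench uses
        # torch.cuda.Event on NVIDIA hardware. Sketch only.
        start_event = device_mod.Event(enable_timing=True)
        end_event = device_mod.Event(enable_timing=True)
        start_event.record()
        fn()
        end_event.record()
        device_mod.synchronize()
        return start_event.elapsed_time(end_event)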

etaf commented May 16, 2024

@guangyey may I know if you can upstream the feature by the end of this week?

@alexbaden

I have pushed a PR with an initial implementation, based on the IPEX implementation, to PyTorch: pytorch/pytorch#126456

etaf commented May 22, 2024

Hi @etiotto @alexbaden @vlad-penkin, we'll skip the affected cases until the XPU runtime supports elapsed_time in XPUEvent (e.g. with a guard like the sketch below).
Thanks for your support!
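
For example, a runtime probe plus a skip marker along these lines (hypothetical names; a sketch of the approach, not the actual test-suite change):

    import pytest
    import torch

    def xpu_event_timing_supported():
        # Probe whether this build's XPUEvent implements elapsed_time.
        if not torch.xpu.is_available():
            return False
        try:
            start_event = torch.xpu.Event(enable_timing=True)
            end_event = torch.xpu.Event(enable_timing=True)
            start_event.record()
            end_event.record()
            torch.xpu.synchronize()
            start_event.elapsed_time(end_event)
            return True
        except NotImplementedError:
            return False

    requires_xpu_event_timing = pytest.mark.skipif(
        not xpu_event_timing_supported(),
        reason="elapsed_time is not supported by XPUEvent on this runtime",
    )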

@etaf etaf closed this as completed May 22, 2024
@vlad-penkin vlad-penkin reopened this May 22, 2024
@vlad-penkin

@etaf @riverliuintel @EikanWang this is a high-priority change for Triton; reopening the issue.

riverliuintel commented May 22, 2024

The current implementation in the Triton XPU backend is a regression compared with the Triton version currently upstream in stock PyTorch. It not only makes Inductor UTs fail, it also breaks Inductor XPU functionality, since autotuning is an essential Inductor feature for the PyTorch v2.4 XPU release. We request a revert to the original Triton implementation so that we do not miss the PyTorch v2.4 Intel GPU feature-freeze date at the end of May.

Stonepia commented Jun 7, 2024

@vlad-penkin Hello Vlad, in IPEX 2.3 the XPUEvent elapsed_time has been removed. Could you remove the changes from https://github.com/intel/intel-xpu-backend-for-triton/pull/1190/files#diff-447aaa7319f4083423e0d1ce5ecbf440e7efe60b5bc527b38f82e982280b98d3R9-R16 ? We no longer need that if statement.
