xpu: support xpu backend from stock pytorch (>=2.4) #2825
Conversation
Fixes: huggingface#31237 XPU backend is available in the stock PyTorch starting from version 2.4, see [1]. This commit extends huggingface transformers to support XPU from both IPEX and the stock pytorch. IPEX is being tried first. See: pytorch/pytorch#114842 Requires: huggingface/accelerate#2825 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Thanks a bunch for doing this! Great start, just one question 🤗
Thanks! Looking much better. Just a nit
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
src/accelerate/accelerator.py
Outdated
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=dtype, inplace=True, level="O1")
# torch.xpu.optimize is available only for xpu via IPEX
if hasattr(torch.xpu, "optimize"):
    model, optimizer = torch.xpu.optimize(
@muellerzr: I think I figured out why there are 2 different calls (`torch.xpu.optimize` and `ipex.optimize`) in the current code. It turns out that IPEX can be built in 2 distinct ways:

- The first way is to build it with Intel GPU support, i.e. IPEX-XPU. If it's built this way, the monkey-patches get applied and `torch.xpu.optimize()` is exposed, which is a handy wrapper around `ipex.optimize()`, which is also available.
- The second way is to build IPEX with CPU support only. In this case there is no GPU support, no monkey-patches, and no XPU available. The only thing available is `ipex.optimize()`.

So, judging from the source code, the current Hugging Face accelerate code supports both paths, IPEX-XPU and IPEX-CPU. Interestingly, https://github.com/huggingface/accelerate/blob/main/docs/source/usage_guides/ipex.md talks only about IPEX-CPU and does not mention IPEX-XPU... I wonder whether IPEX-XPU is fully enabled in Hugging Face?

Summarizing, I think I need to update my PR to cover all 3 cases: 1) IPEX-XPU, 2) IPEX-CPU, 3) XPU in stock PyTorch. I will try to rewrite it so that it is clear which options are on the plate, and will add some comments to the code.
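The 3 cases can be made concrete with a small pure-Python sketch. The function name and arguments below are illustrative (not accelerate code): it simply maps the results of the availability probes, such as `hasattr(torch.xpu, "optimize")`, to the applicable `optimize()` entry point.

```python
def classify_optimize_path(ipex_available: bool, has_torch_xpu_optimize: bool) -> str:
    """Illustrative mapping of probe results to the 3 cases above.

    ipex_available: whether intel_extension_for_pytorch imports successfully.
    has_torch_xpu_optimize: the result of hasattr(torch.xpu, "optimize").
    """
    if has_torch_xpu_optimize:
        # IPEX-XPU build: monkey-patches expose torch.xpu.optimize(),
        # a handy wrapper around ipex.optimize().
        return "ipex-xpu"
    if ipex_available:
        # IPEX-CPU build: only ipex.optimize() is exposed, no XPU support.
        return "ipex-cpu"
    # Stock PyTorch (>= 2.4 for XPU): no optimize() helper at all.
    return "stock-pytorch"
```

This matches the container printouts below: the CPU image reports `hasattr(ipex, "optimize") == True` but `hasattr(torch.xpu, "optimize") == False`, while the XPU image reports both as `True`.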
FYI, the easiest way to check the behavior is probably to try out the IPEX containers from https://hub.docker.com/r/intel/intel-optimized-pytorch. Here are some printouts:
# IPEX CPU
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.3.0-pip-base python3 -c 'import torch; import intel_extension_for_pytorch; print(torch.xpu.is_available())'
False
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.3.0-pip-base python3 -c 'import torch; import intel_extension_for_pytorch as ipex; print(hasattr(ipex, "optimize"))'
True
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.3.0-pip-base python3 -c 'import torch; import intel_extension_for_pytorch as ipex; print(hasattr(torch.xpu, "optimize"))'
False
# IPEX XPU
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.1.30-xpu python3 -c 'import torch; import intel_extension_for_pytorch; print(torch.xpu.is_available())'
True
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.1.30-xpu python3 -c 'import torch; import intel_extension_for_pytorch as ipex; print(hasattr(ipex, "optimize"))'
True
$ docker run -it --rm --privileged intel/intel-extension-for-pytorch:2.1.30-xpu python3 -c 'import torch; import intel_extension_for_pytorch as ipex; print(hasattr(torch.xpu, "optimize"))'
True
@muellerzr: I reworked the PR according to the above. Please help to review again.
Force-pushed from e918737 to 9b94a01
I tried this PR (+ huggingface/transformers#31238) as much as I could in the IPEX-CPU, IPEX-XPU, PyTorch-XPU, and PyTorch-CPU scenarios. I ran some tests from accelerate and transformers and some examples from transformers. All seem to work, engaging XPU when expected. I am promoting these PRs from drafts for qualified review. Let me know if there are any concerns or any feedback that needs to be addressed.
Applied
@muellerzr: can you please help to run CI again? Also, is there anything else I can help fix in this PR to get it merged?
I did not see such a failure on this PR before. Could this be something random? I can't associate this failure with the changes made, and the test also worked for me locally running on CPU. @muellerzr, can you please advise?
@SunMarc: thank you for retriggering the failed CI. I see it's passing now, so my guess that this was a sporadic failure appears to be true. @SunMarc, @muellerzr: I have outlined the current status of the XPU backend in PyTorch in huggingface/transformers#31237. There are a number of issues in the XPU backend which are being worked on right now. I believe, however, that this PR and the transformers PR (huggingface/transformers#31237) are ready as the first step to enable the XPU backend in Hugging Face, on top of which we can gradually improve the support. Can you please outline the acceptance requirements for these PRs on the Hugging Face side?
Thanks! This looks great to me, thank you for the improvement!
cc @SunMarc for a second pair of eyes, else we can merge it after the nit has been addressed!
src/accelerate/utils/imports.py
Outdated
if importlib.util.find_spec("torch") is None:
    return False

import torch
This part can actually be removed, as accelerate always requires PyTorch :)
Indeed:

> import torch

I copied this from `is_npu_available` above, which by the way also re-imports `torch`. Do you want me to also fix `is_npu_available()` in this PR?
accelerate/src/accelerate/utils/imports.py
Lines 362 to 367 in 91a2599
if importlib.util.find_spec("torch") is None or importlib.util.find_spec("torch_npu") is None:
    return False

import torch
import torch_npu  # noqa: F401
This part can actually be removed, as accelerate always requires PyTorch :)
Fixed. I will submit npu/mlu in a separate cleanup PR unless you tell me otherwise.
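For reference, a simplified check along these lines might look as follows. This is a sketch, not the exact accelerate implementation: since `torch` is a hard dependency of accelerate, only `torch_npu` needs to be probed before importing.

```python
import importlib.util


def is_npu_available_sketch() -> bool:
    """Sketch of the simplified availability check: probe only torch_npu,
    since torch itself is always installed alongside accelerate."""
    if importlib.util.find_spec("torch_npu") is None:
        return False
    import torch
    import torch_npu  # noqa: F401

    return hasattr(torch, "npu") and torch.npu.is_available()
```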
> I will submit npu/mlu in a separate cleanup PR unless you tell me otherwise.
Submitted #2856.
Fixes: huggingface/transformers#31237 XPU backend is available in the stock PyTorch starting from version 2.4, see [1]. This commit extends huggingface accelerate to support XPU from both IPEX and the stock pytorch. IPEX is being tried first. See: pytorch/pytorch#114842 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
LGTM! Just a small nit.
)
else:
    # ipex.optimize() is available only for IPEX, both IPEX-CPU and IPEX-XPU
    if is_ipex_available():
Maybe change that to `self.state.use_ipex` (or at least add it)?
I'm afraid this might break the IPEX-XPU path. Note that `use_ipex` is currently used on the CPU path to differentiate the case when the IPEX-CPU optimization should or should not be used:

accelerate/src/accelerate/accelerator.py, Line 1288 in c0faec7

if self.device.type == "cpu" and self.state.use_ipex:

In the IPEX-XPU case this flag was not used, and I am not sure whether it will be `=True`.
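In other words, the existing guard can be summarized as a tiny predicate. The function below is an illustrative sketch of the logic (not accelerate code), showing why reusing `use_ipex` as a gate for the XPU branch could silently disable IPEX-XPU:

```python
def should_run_ipex_cpu_optimize(device_type: str, use_ipex: bool) -> bool:
    """IPEX-CPU optimization applies only when running on CPU *and* the
    user explicitly opted in via use_ipex. On XPU, use_ipex is typically
    unset, so gating the XPU branch on it would skip IPEX-XPU."""
    return device_type == "cpu" and use_ipex
```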
Sounds good!
Thanks for doing this! Next step: transformers :)
(Also, if you want the version this will release in: accelerate==0.32.0)
* xpu: support xpu backend from stock pytorch (>=2.4)

  Fixes: #31237

  XPU backend is available in the stock PyTorch starting from version 2.4, see [1]. This commit extends huggingface transformers to support XPU from both IPEX and the stock pytorch. IPEX is being tried first.

  See: pytorch/pytorch#114842
  Requires: huggingface/accelerate#2825

* xpu: enable gpt2 and decision_transformer tests for xpu pytorch backend

  Note that running xpu tests requires TRANSFORMERS_TEST_DEVICE_SPEC=spec.py passed to the test runner:

  import torch
  DEVICE_NAME = 'xpu'
  MANUAL_SEED_FN = torch.xpu.manual_seed
  EMPTY_CACHE_FN = torch.xpu.empty_cache
  DEVICE_COUNT_FN = torch.xpu.device_count

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Fixes: huggingface/transformers#31237
XPU backend is available in the stock PyTorch starting from version 2.4 [1]. This commit extends huggingface accelerate to support XPU from both IPEX and the stock pytorch. IPEX is being tried first.
I am raising this PR as a WIP draft to facilitate further discussion around enabling the XPU backend in Hugging Face, and to be able to communicate observed XPU issues back to PyTorch.
[1] pytorch/pytorch#114842
@EikanWang, @fengyuan14, @guangyey, @jgong5, @kding1, @sywangyi
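The "IPEX is being tried first" ordering described above can be sketched as follows. This is a minimal illustration (the helper name `try_import_ipex` is not part of accelerate): importing IPEX, when it is installed, applies its monkey-patches before any stock `torch.xpu` capabilities are probed.

```python
import importlib.util


def try_import_ipex() -> bool:
    """Try IPEX first: if it is installed, importing it monkey-patches
    torch (e.g. exposing torch.xpu.optimize() in IPEX-XPU builds).
    Returns whether IPEX was found and imported."""
    if importlib.util.find_spec("intel_extension_for_pytorch") is None:
        return False
    import intel_extension_for_pytorch  # noqa: F401

    return True
```

When this returns False, a stock PyTorch >= 2.4 can still provide the XPU backend via `torch.xpu`, just without the IPEX `optimize()` helpers.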