Patch for Cambricon MLUs test #1747
Merged
Changes from 1 commit
Commits (14)
d853d39  support mlu device  (huismiling)
f605d3c  Merge commit '6c44096c7b8d55a2ecf24be9bc68393467e1584a' into mlu-dev  (huismiling)
e0cfce6  merge from main  (huismiling)
4f462b8  rollback  (huismiling)
2fbdb33  up  (huismiling)
3b2b8d6  add version check for mlu  (huismiling)
2d0ef7c  better accelerate version check for mlu device  (huismiling)
0965469  fix error with make style  (huismiling)
4447fe2  patch for MLUs pytest  (huismiling)
3e8e62b  Merge remote-tracking branch 'hf_peft/main' into mlu-dev  (huismiling)
99965ac  make style for MLUs  (huismiling)
83eaba2  rollback to torch.float16 for MLU test.  (huismiling)
ad14bfe  skip test_merge_layers_fp16 for pt2.1 & cpu device  (huismiling)
2b1ad33  fix err for test_merge_layers_fp16  (huismiling)
patch for MLUs pytest
commit 4447fe2276a92bb20a60192e2eb4a9c0856af3af
torch_dtype=torch.float16 leads to an error:
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Hmm, I cannot replicate this, whether with or without GPU. The idea of this test is exactly to check that this error does not occur with fp16, so not using this dtype is counter-productive. Is this only occurring with MLU devices?
The reproduction code is as follows.
The issue can be reproduced using PyTorch 2.1, but it executes normally with PyTorch 2.3.
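A minimal sketch of the kind of code that hits this error, as an illustrative assumption rather than the exact snippet from the report: a float16 linear layer evaluated on the CPU device, which some older PyTorch builds do not implement.

```python
import torch

# Illustrative only: float16 matmul on CPU. On some older PyTorch builds
# (e.g. 2.1) this raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
linear = torch.nn.Linear(16, 16).to(device="cpu", dtype=torch.float16)
x = torch.randn(4, 16, dtype=torch.float16)
out = linear(x)
```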
Okay, so instead of changing the dtype, how about skipping the test if an old pytorch version is detected?
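As a sketch of that suggestion (the version bound is inferred from the discussion, where 2.1 fails and 2.3 works; the actual patch may choose a different cutoff):

```python
import pytest
import torch
from packaging import version

# Skip on PyTorch builds known to lack fp16 addmm on CPU.
OLD_TORCH = version.parse(torch.__version__) < version.parse("2.2")

@pytest.mark.skipif(OLD_TORCH, reason="float16 merge on CPU needs a newer PyTorch")
def test_merge_layers_fp16():
    ...
```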
Hmm, maybe we can use fp16 with pt>=2.3 and fp32 with pt<2.3?
We really don't need to test merging with fp32 here, as it's tested extensively in other tests. This test is very specifically for merging with fp16, so if we don't use fp16, we can skip it.
Ha, got it! I will fix it.
Hmm, I found that it is the "cpu" device that leads to the error.
When the device is changed to self.torch_device, as in my fix, the MLU tests pass with torch.float16.
@BenjaminBossan Will this test use the "cpu" device? If not, there is no need to skip the test for pt2.1.
peft/tests/testing_common.py, line 473 (at cb0bf07)
So IIRC, there is an error when using CPU + float16 + old PyTorch. If we change either of those variables, there is no error. On CI, we have a new PyTorch version, so it passes despite using CPU.
If we switch to self.torch_device, it depends, because the device is auto-inferred based on what devices are available. So on our CI, this would still be CPU. On yours, it might not be, but then we don't really test the original intent, namely that float16 works on CPU. I assume this fails on your CI because it uses an older PyTorch version. This is why I suggested just skipping the test with older PyTorch versions. If you want, you could add a specific test for merging float16 with MLU, which would be skipped if the device is not available.
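A rough sketch of what such an MLU-only test could look like; the availability check, the "mlu" device string, and the tiny model id are assumptions for illustration, not code from this PR:

```python
import pytest
import torch
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_peft_model

# Assumed availability check; the real one depends on the torch_mlu integration.
mlu_available = hasattr(torch, "mlu") and torch.mlu.is_available()


@pytest.mark.skipif(not mlu_available, reason="Cambricon MLU device not available")
def test_merge_layers_fp16_on_mlu():
    base = AutoModelForCausalLM.from_pretrained(
        "hf-internal-testing/tiny-random-OPTForCausalLM", torch_dtype=torch.float16
    )
    model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM")).to("mlu")
    merged = model.merge_and_unload()  # should not raise with float16 on MLU
    assert merged is not None
```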
@BenjaminBossan
Got it! Skipping test_merge_layers_fp16 on pt2.1 when the device is CPU should be OK.
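A sketch of that final guard, combining the device and version checks (the exact version bound is an assumption based on the discussion; the merged patch may differ):

```python
import pytest
import torch
from packaging import version


def maybe_skip_fp16_merge(device: str) -> None:
    # Skip only the known-bad combination: CPU device plus an older PyTorch
    # build (2.1 fails here, 2.3 is known to work).
    if device == "cpu" and version.parse(torch.__version__) < version.parse("2.2"):
        pytest.skip("float16 merge on CPU is not supported by this PyTorch version")
```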