FIX Allow same layer adapters on different devices #1742
Conversation
Resolves #1639

The issue is that so far, we made the assumption in PEFT that all adapter weights of a specific layer are on the same device. There can be cases where it is useful to have adapters on different devices, e.g. when a user loads a lot of LoRA adapters and wants to offload those not currently in use to CPU; currently, they would not be able to do so.

With this PR, we add this possibility. To achieve this, when we update an adapter layer with a new adapter, we only move that specific adapter to the device of the base layer, while not touching the other loaded adapters.

While working on this, I discovered a small bug in VeRA when adding multiple adapters, which is now also fixed.

This PR has the potential to lead to unforeseen issues, so careful review is required. After merging this, let's keep it out of releases for a while to ensure it doesn't break anything.
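To make the use case concrete, here is a minimal sketch (not code from this PR; the base model, adapter paths, and adapter names are placeholders) of loading two LoRA adapters and keeping only the active one on the GPU:

```python
# Minimal sketch of the offloading use case; the model name, adapter paths, and
# adapter names are placeholders, not part of this PR.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").to(device)
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="adapter-a")
model.load_adapter("path/to/adapter-b", adapter_name="adapter-b")

# Manually move the weights of the inactive adapter to CPU. With this PR, the
# two adapters of the same layer are allowed to live on different devices.
for module in model.modules():
    if hasattr(module, "lora_A") and "adapter-b" in module.lora_A:
        module.lora_A["adapter-b"].to("cpu")
        module.lora_B["adapter-b"].to("cpu")

model.set_adapter("adapter-a")  # forward passes use only the GPU-resident adapter
```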
Thank you @BenjaminBossan for refactoring out the logic for setting the device of the adapters without disturbing the devices of the previously added adapters! 🚀 Very useful for offloading when working with a vast number of adapters.
Thanks a lot for this!
Thanks for the reviews. I'll wait a bit in case @iuliaturc has time to check if this PR fixes the initial issue. If we don't hear back, I'll merge in a few days. We should let this PR "simmer" for a bit since there is a small probability that this will break some edge case we haven't thought of.
Thanks so much for the PR! I left a comment here. TL;DR is that, indeed, only one LoRA seems to be loaded at a time, but the fix doesn't seem to address the original problem (that latency keeps creeping up the more calls we make).
Thanks for the confirmation.
PR #1742 introduced the feature that adapters of the same layer can be on different devices. A new method was introduced that is responsible for moving the parameters related to a specific adapter in a consistent way. In BOFT, however, one parameter was overlooked, boft_P. This parameter is not stored inside a ParameterDict or ModuleDict, hence it was not moved. The reason is (presumably) that this parameter is shared between all BOFT adapters, as it's always identical. However, this clashes with having different adapters on different devices. To solve this, the parameter is now moved on the fly to the correct device during the forward pass.
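To illustrate the pattern (a self-contained toy, not PEFT's actual BOFT code; all names below are made up), a buffer shared by all adapters cannot be pinned to any single adapter's device, so it is moved to the active adapter's device at forward time:

```python
# Toy sketch of the fix's pattern: a shared buffer (analogous to boft_P) is
# moved on the fly to match the device of the active adapter's weights.
import torch
import torch.nn as nn

class SharedBufferLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Shared across all adapters, so it cannot live in a per-adapter dict.
        self.register_buffer("perm", torch.eye(dim))
        # Per-adapter weights, analogous to entries of a ParameterDict.
        self.adapter_weights = nn.ParameterDict()

    def add_adapter(self, name: str, dim: int, device: str) -> None:
        self.adapter_weights[name] = nn.Parameter(torch.randn(dim, dim, device=device))

    def forward(self, x: torch.Tensor, adapter: str) -> torch.Tensor:
        weight = self.adapter_weights[adapter]
        # Move the shared buffer at forward time so it matches the adapter's device.
        perm = self.perm.to(weight.device)
        return x.to(weight.device) @ perm @ weight

layer = SharedBufferLayer(4)
layer.add_adapter("cpu_adapter", 4, "cpu")
# layer.add_adapter("gpu_adapter", 4, "cuda")  # adapters may sit on other devices
out = layer(torch.randn(2, 4), "cpu_adapter")
```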
Locally, multiple AdaLoRA-related tests are failing. We did not catch this in the nightly run because the tests were missing the necessary pytest marker. The issue is related to the change in #1742, which made it possible to load different adapters on different devices. Although that PR itself was sound, the issue is that for AdaLoRA, one of its parameters, ranknum, was not listed in the other_param_names and was thus not moved to the correct device. This oversight is now fixed and the GPU tests are now passing locally for me. This PR also adds the missing pytest marker to the test class that was missing it, so that these errors should be caught by our nightly CI in the future.
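As a sketch of why the missing entry mattered (illustrative only; the helper below is a simplification, not the verbatim PEFT implementation), the device-moving logic from #1742 iterates over the container names a layer declares, so a tensor left out of other_param_names keeps its old device:

```python
# Simplified sketch, not PEFT's actual code; attribute names mirror AdaLoRA,
# but the helper is a stand-in for the method introduced in #1742.
import torch
import torch.nn as nn

class AdaLoraLayerSketch(nn.Module):
    adapter_layer_names = ("lora_A", "lora_B", "lora_E")
    other_param_names = ("ranknum",)  # the fix: declare ranknum so it is moved as well

    def __init__(self):
        super().__init__()
        self.lora_A = nn.ParameterDict()
        self.lora_B = nn.ParameterDict()
        self.lora_E = nn.ParameterDict()
        self.ranknum = nn.ParameterDict()

    def move_adapter_to_device(self, adapter_name: str, device: torch.device) -> None:
        # Only containers listed in adapter_layer_names/other_param_names are
        # moved; anything missing from those tuples silently keeps its device.
        for attr_name in self.adapter_layer_names + self.other_param_names:
            container = getattr(self, attr_name, None)
            if container is None or adapter_name not in container:
                continue
            param = container[adapter_name]
            container[adapter_name] = nn.Parameter(
                param.data.to(device), requires_grad=param.requires_grad
            )

layer = AdaLoraLayerSketch()
layer.lora_A["default"] = nn.Parameter(torch.randn(8, 4))
layer.ranknum["default"] = nn.Parameter(torch.tensor(4.0), requires_grad=False)
layer.move_adapter_to_device("default", torch.device("cpu"))
```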