[Bugfix] for saving quantized models trained using fsdp #2183
Bugfix for Saving Quantized Models Trained Using FSDP
This description details a bugfix for an issue encountered when loading quantized models that were trained using Fully Sharded Data Parallel (FSDP).
The issue originated in the quantization process, during which we updated layer names and saved them using our custom implementation instead of `accelerate`. This approach produced an incorrectly saved `state_dict`, because each tensor carried device information. Consequently, the model could not be loaded properly using the `AutoModel.from_pretrained(...)` method from transformers.
The modifications in this update address these complications, ensuring that quantized models trained using FSDP can now be saved and loaded correctly.
Changes
The solution adds a post-processing step that explicitly iterates through the `state_dict` and moves each tensor to the CPU. The corrected `state_dict` then overwrites the previous, faulty `state_dict`.
Testing
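One minimal way to check the idea locally is to reproduce the post-processing step from Changes on a toy model and confirm that every tensor ends up on the CPU. The sketch below is an illustration, not the code from this PR: the helper name `state_dict_to_cpu` and the `nn.Linear` stand-in model are hypothetical, and it does not involve FSDP or quantization.

```python
from collections import OrderedDict

import torch
from torch import nn


def state_dict_to_cpu(state_dict):
    """Return a copy of `state_dict` with every tensor detached and moved to the CPU.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    cpu_state_dict = OrderedDict()
    for name, value in state_dict.items():
        # Only tensors carry device information; leave other entries untouched.
        if isinstance(value, torch.Tensor):
            value = value.detach().cpu()
        cpu_state_dict[name] = value
    return cpu_state_dict


# Toy stand-in for the trained model (assumption: the real model is a
# quantized model trained with FSDP).
model = nn.Linear(4, 2)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Post-process the state_dict so that no tensor references an accelerator.
cpu_state_dict = state_dict_to_cpu(model.state_dict())
assert all(t.device.type == "cpu" for t in cpu_state_dict.values())
```

The helper builds a new dictionary rather than mutating the input, so the caller decides when to overwrite the original `state_dict` before saving.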
The saved quantized models can now be loaded with `SparseAutoModel.from_pretrained(...)`. This has been verified manually, and the test commands in the ticket work as expected.
Benefits