
[Bugfix] for saving quantized models trained using fsdp #2183

Merged — 1 commit merged into main on Mar 18, 2024

Conversation

@rahul-tuli (Member) commented Mar 15, 2024

Bugfix for Saving Quantized Models Trained Using FSDP

This PR fixes an issue encountered when loading quantized models that were trained using Fully Sharded Data Parallel (FSDP).

The issue originated during quantization: we updated layer names and saved them using our custom implementation instead of accelerate's save utilities. This resulted in an incorrectly saved state_dict in which each tensor carried stale device information. Consequently, the model could not be loaded with the Transformers AutoModel.from_pretrained(...) method.

The modifications in this update address these complications, ensuring that quantized models trained using FSDP can now be saved and loaded correctly.

Changes

  • Asana Ticket
  • Fixed the saved state dicts for quantized models trained with FSDP.

The solution adds a post-processing step that explicitly iterates through the state_dict and moves each tensor to the CPU. The corrected state_dict then overwrites the previous, faulty one.
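A minimal sketch of that post-processing step, assuming PyTorch and an on-disk checkpoint; the function and file names are illustrative, not the actual implementation in this PR:

```python
import torch


def fix_state_dict_devices(checkpoint_path: str) -> None:
    """Rewrite a checkpoint so every tensor lives on the CPU.

    Hypothetical helper mirroring the fix described above: load the
    state_dict, explicitly move each tensor to the CPU, then overwrite
    the previous, faulty state_dict on disk.
    """
    # Load the saved state_dict; tensors may carry stale device info
    state_dict = torch.load(checkpoint_path, map_location="cpu")

    # Explicitly iterate and move each tensor to the CPU
    cpu_state_dict = {name: tensor.cpu() for name, tensor in state_dict.items()}

    # Overwrite the faulty checkpoint with the corrected state_dict
    torch.save(cpu_state_dict, checkpoint_path)
```

With the checkpoint normalized this way, a standard `from_pretrained(...)` load no longer trips over per-tensor device metadata.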

Testing

Saved quantized models can now be loaded with SparseAutoModel.from_pretrained(...). This was verified manually; the test commands in the ticket work as expected.

Benefits

  • This fix improves compatibility when saving quantized models trained using FSDP.

@rahul-tuli rahul-tuli force-pushed the potential-fsdp-quantized-model-save-fix branch from 8fe1701 to 65a4c07 Compare March 15, 2024 15:36
@rahul-tuli rahul-tuli force-pushed the potential-fsdp-quantized-model-save-fix branch from 65a4c07 to cc05f07 Compare March 18, 2024 15:05
@rahul-tuli rahul-tuli marked this pull request as ready for review March 18, 2024 15:14
@rahul-tuli rahul-tuli self-assigned this Mar 18, 2024
@rahul-tuli rahul-tuli added the bug Something isn't working label Mar 18, 2024
@rahul-tuli rahul-tuli changed the title Potential fix for saving quantized models trained using fsdp [Bugfix] for saving quantized models trained using fsdp Mar 18, 2024
@Satrat Satrat merged commit 1fd86c2 into main Mar 18, 2024
13 of 14 checks passed
@Satrat Satrat deleted the potential-fsdp-quantized-model-save-fix branch March 18, 2024 18:00