Description
I don't have an Ampere-architecture GPU, so I cannot use the FlashAttention module and have disabled it in my setup. I have two questions:
1. Can I directly use the provided stage-1 pretrained weights with FlashAttention disabled?
2. Or do I need to retrain the model from scratch without FlashAttention? If so, can I use the default SparseDrive code to train the stage-1 model?
Thanks for your guidance!