Question about Flash_atten

I don’t have an Ampere architecture GPU, so I cannot use the FlashAttention module and have disabled it in my setup. I would like to ask:

1. Can I directly use the provided stage-1 pretrained weights with FlashAttention disabled?

2. Or do I need to retrain the model from scratch without FlashAttention? If so, can I just use the default SparseDrive code to train the stage-1 model?
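
For reference, this is roughly how I check whether a GPU is Ampere or newer. It is only a minimal sketch: it assumes PyTorch is installed for the device query, and the helper names `is_ampere_or_newer` and `current_gpu_capability` are my own, not part of SparseDrive or FlashAttention. Ampere corresponds to CUDA compute capability 8.x.

```python
def is_ampere_or_newer(capability):
    """Return True if a CUDA compute capability (major, minor) is 8.0 or higher."""
    major, _minor = capability
    return major >= 8


def current_gpu_capability():
    """Return (major, minor) for the current CUDA device, or None if unavailable."""
    try:
        import torch  # optional; only needed to query the real device
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("No CUDA device detected")
    else:
        print(f"Compute capability {cap}: Ampere or newer = {is_ampere_or_newer(cap)}")
```

On my machine the check reports a pre-Ampere device, which is why FlashAttention is disabled in my setup.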

Thanks for your guidance!

Question about Flash_atten #86
