Open
Description
If we can distill or prune NLLB-200 shortly after starting fine-tuning, we may be able to dramatically reduce the training and inference time needed (by 50% or more). The workflow could look something like this (a rough pruning sketch follows the list):
- Take the 3.3B-parameter model and train for 1000 steps on the 2x A100s. Prune and save.
- Load the pruned model on a single A100 and finish training and inference.
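
A minimal sketch of the prune-and-save step, assuming the Hugging Face `facebook/nllb-200-3.3B` checkpoint and PyTorch's built-in magnitude pruning utilities; the 30% sparsity target is a placeholder, not a measured setting:

```python
# Hypothetical sketch: magnitude-prune NLLB-200 after a short fine-tuning warm-up,
# then save a checkpoint that can be reloaded on a single A100.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")

# ... fine-tune for ~1000 steps on the 2x A100 setup here ...

# Apply L1-unstructured magnitude pruning to every linear layer's weights.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)  # assumed 30% sparsity
        # Make the pruning permanent so the saved checkpoint contains plain tensors.
        prune.remove(module, "weight")

# Save the pruned model for the single-A100 stage of training and inference.
model.save_pretrained("nllb-200-pruned")
```

Note that unstructured pruning only zeroes weights; by itself it does not shrink the dense tensors or speed up inference. Realizing the 50%+ reduction would likely require structured pruning, sparse kernels, or distillation into a smaller architecture.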
Metadata
Status: 📋 Backlog