
Distill or prune model to save training time #288

Open
@johnml1135

Description

If we can distill or prune the NLLB-200 model shortly after starting fine-tuning, we may be able to dramatically reduce the training and inference time needed, by 50% or more. The workflow could look like this (see the sketch after the list):

  • Take the 3.3 GB model and train for 1000 steps on the two A100s. Prune and save.
  • Load the pruned model on a single A100 and finish training and inference.
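
A minimal sketch of the prune-and-save step, assuming the model is loaded with Hugging Face transformers and pruned with PyTorch's built-in magnitude pruning; the 50% sparsity level, the training-loop placeholder, and the output path are illustrative assumptions, not a tested recipe:

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSeq2SeqLM

# The 3.3B NLLB-200 checkpoint discussed above.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")

# ... fine-tune for ~1000 steps here, e.g. with transformers.Trainer ...

# Magnitude-prune 50% of each Linear layer's weights (L1 criterion),
# then bake the zeros into the weight tensors.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# Save; the pruned checkpoint can be reloaded to finish training/inference.
model.save_pretrained("nllb-200-pruned")
```

One caveat: unstructured magnitude pruning only zeroes weights in place, so the saved tensors stay dense and the same size. To actually fit the model on a single A100 and cut inference time, we would need structured pruning (removing whole heads or layers) or distillation into a smaller student model.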

Metadata

    Labels: optimization (Model training/inferencing optimization), research (Research topics)

    Status: 📋 Backlog
