
Quantized Training #554

Open

@msaroufim

Description

Inspired by a recent back-and-forth with @gau-nernst, we should add some quantized training recipes to AO for small models (~600M parameter range).

Character.ai recently shared that they're working on quantized training (https://research.character.ai/optimizing-inference/), where, per @stephenroller, they train models from scratch in int8 (https://x.com/stephenroller/status/1816636257717436779).

Historically we've invested more in QAT (quantization-aware training), which @andrewor14 has led; it's a technique to reduce the perplexity hit when we do an eventual post-training quantization.
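For context, a minimal sketch of the QAT idea, assuming symmetric per-tensor int8 fake quantization with a straight-through estimator; the `fake_quant` helper below is illustrative, not torchao's actual API:

```python
import torch

def fake_quant(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Quantize to an int grid and immediately dequantize, so the forward
    pass sees quantization error while weights stay in high precision."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    # Straight-through estimator: backward treats quantization as identity.
    return w + (w_q - w).detach()
```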

Quantized training, on the other hand, actually quantizes the model at training time, so memory savings are observed for both training and inference.
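As a rough illustration of the difference, here's a minimal sketch of one flavor of quantized training, where the weights live in int8 for the whole run and updates are applied with stochastic rounding so steps smaller than one quantization bin aren't always rounded away; the helper names are hypothetical, not torchao's API:

```python
import torch

def stochastic_round(x: torch.Tensor) -> torch.Tensor:
    # Round up with probability equal to the fractional part, so tiny
    # updates still move the weight in expectation.
    floor = x.floor()
    return floor + (torch.rand_like(x) < (x - floor)).float()

@torch.no_grad()
def int8_sgd_step(weight_int8: torch.Tensor, scale: torch.Tensor,
                  grad: torch.Tensor, lr: float) -> torch.Tensor:
    # Dequantize, take the step in fp32, re-quantize with stochastic rounding.
    w = weight_int8.float() * scale
    w -= lr * grad
    q = stochastic_round(w / scale).clamp_(-128, 127)
    return q.to(torch.int8)
```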

So when discussing quantized training there are a few aspects (see the sketch after this list):

  1. Weights: can be one of fp16, fp8, int8, int4, and below
  2. Activations: most likely limited to fp8 or fp16
  3. Optimizer state: can be one of fp32, fp16, bf16, fp8, int8, and below
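A hypothetical enumeration of that design space (the dtype names here are just strings for illustration, not a torchao config schema):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class QuantizedTrainingConfig:
    weight_dtype: str      # fp16, fp8, int8, int4, ...
    activation_dtype: str  # fp8, fp16
    optimizer_dtype: str   # fp32, fp16, bf16, fp8, int8, ...

configs = [QuantizedTrainingConfig(w, a, o)
           for w, a, o in product(("fp16", "fp8", "int8", "int4"),
                                  ("fp8", "fp16"),
                                  ("fp32", "fp16", "bf16", "fp8", "int8"))]
print(len(configs))  # 40 combinations, each needing its own loss curves
```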

And if one were to ship this work, a bad combination can be caught at small scale (~600M parameter range), but a good idea needs to be continuously tested at larger scales (8B to 405B range), so each of these combinations will need loss curves.

When choosing the starting point, we could either pretrain a model using quantized training or just finetune one; as long as the loss curves match the fp16 baselines, we are good. We'd also, of course, need to validate that the memory savings are there and measure the speedups/slowdowns.
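A minimal sketch of the kind of validation harness this implies; `make_model` and `train_step` are hypothetical placeholders for whichever recipe we land on:

```python
import torch

def measure_run(make_model, train_step, steps: int = 100):
    # Track peak GPU memory alongside the loss curve for one configuration.
    torch.cuda.reset_peak_memory_stats()
    model = make_model().cuda()
    losses = [train_step(model) for _ in range(steps)]
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return losses, peak_gib

# Run once for the fp16 baseline and once for the quantized model, then
# overlay the loss curves and compare peak memory and step time.
```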

And while we can merge a lot of the dtype conversion code and a toy training loop in AO, what I'm more optimistic about is having an end-to-end training recipe in https://github.com/pytorch/torchtitan (@awgu) and an end-to-end finetuning recipe in https://github.com/pytorch/torchtune (@ebsmothers, @joecummings).
