This is a brain dump of what is missing from `torchao.float8` to support training with rowwise scaling, to help anyone who wants to jump in and build this.
already done

- `torch._scaled_mm` supports rowwise scaling
- inductor supports rowwise scaled gemms in `max-autotune` mode (I haven't personally tested this yet)
needed

- we need `Float8Tensor` to work with rowwise scales. We had an unlanded PR on float8_experimental doing that here ([wip] add axiswise granularity to Float8Tensor pytorch-labs/float8_experimental#352); we just never got the time to land it. You can reuse that PR or do something similar. Note that [Float8Quant] Add rowwise scaling option to float8 dynamic quant #819 landed recently, adding float8 rowwise scaling to inference, so being consistent with that where applicable would be nice.
- we need
`Float8Linear` to be configurable with rowwise scales for each argument, and for the scaling to respect the config, validated by tests + benchmarks; this would require changes to `torchao.float8.config.py` and `torchao.float8.float8_linear.py`
- after (1) and (2), we could make each gemm individually configurable, to enable leaving some of the gemms in high precision
- performance fixes throughout `torchao.float8` and inductor, if needed based on how well inductor generates the scaling code
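To make the config side of (2) and (3) concrete, one possible shape is a per-argument scaling-granularity knob on the linear config. This is a hypothetical sketch of what could go into `torchao.float8.config.py`, not the current API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class ScalingGranularity(Enum):
    # hypothetical names for illustration
    TENSORWISE = auto()  # one scale per tensor (current behavior)
    AXISWISE = auto()    # one scale per row (rowwise scaling)

@dataclass
class CastConfig:
    granularity: ScalingGranularity = ScalingGranularity.TENSORWISE

@dataclass
class Float8LinearConfig:
    # one cast config per gemm argument: input, weight, grad_output
    cast_config_input: CastConfig = field(default_factory=CastConfig)
    cast_config_weight: CastConfig = field(default_factory=CastConfig)
    cast_config_grad_output: CastConfig = field(default_factory=CastConfig)

# example: rowwise-scale input and weight, leave grad_output tensorwise,
# i.e. one of the three gemms stays closer to high precision
cfg = Float8LinearConfig(
    cast_config_input=CastConfig(ScalingGranularity.AXISWISE),
    cast_config_weight=CastConfig(ScalingGranularity.AXISWISE),
)
```

A per-argument config like this also naturally extends to (3): leaving a given gemm in high precision is just another option next to the granularity choice.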