better handling of tensorwise float8 recipe in configuration #901

Open
@vkuzo

Bug description

We need a follow-up to #808. If `--float8.recipe_name tensorwise` is specified, we should properly handle the FSDP float8 all-gather, scale precompute, and related arguments instead of asserting that they aren't supported.
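
To make the request concrete, here is a rough sketch of the kind of resolution logic this could turn into. It is not the existing code: the `Float8Options` and `ResolvedFloat8` classes, their field names, and `resolve_float8_options` are all hypothetical stand-ins. It also assumes that an unset recipe name behaves like tensorwise, that scale precompute only matters when the float8 all-gather is on, and that the non-tensorwise recipes still can't use these options.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the float8 section of the job config.
# Field names are illustrative and may not match the real options.
@dataclass
class Float8Options:
    recipe_name: Optional[str] = None
    enable_fsdp_float8_all_gather: bool = False
    precompute_float8_dynamic_scale_for_fsdp: bool = False


# Hypothetical result type: what the rest of the setup code would consume.
@dataclass
class ResolvedFloat8:
    recipe_name: str
    fsdp_float8_all_gather: bool
    precompute_scale: bool


def resolve_float8_options(opts: Float8Options) -> ResolvedFloat8:
    # Assumption: no recipe name means the default tensorwise behavior.
    recipe = opts.recipe_name or "tensorwise"

    if recipe == "tensorwise":
        # Honor the FSDP-related flags instead of asserting they are unset.
        all_gather = opts.enable_fsdp_float8_all_gather
        # Assumption: precompute is only meaningful with float8 all-gather.
        precompute = all_gather and opts.precompute_float8_dynamic_scale_for_fsdp
        return ResolvedFloat8(recipe, all_gather, precompute)

    # Assumption: other recipes do not support these options, so fail with
    # a clear message rather than a bare assert.
    if opts.enable_fsdp_float8_all_gather or opts.precompute_float8_dynamic_scale_for_fsdp:
        raise ValueError(
            f"FSDP float8 all-gather and scale precompute are only supported "
            f"with the tensorwise recipe, got recipe_name={recipe!r}"
        )
    return ResolvedFloat8(recipe, False, False)
```

With something like this, an explicit `tensorwise` recipe and the no-recipe default would go through the same path, and the error for other recipes would say which option caused the problem.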

Versions

main branch
