add a benchmark for casting a tensor to MX across dim0 and dim1
Summary:
Casting a tensor to MX across both dim0 and dim1 is useful for training.
This extracts that cast into a standalone benchmark so we can optimize it.
Test Plan:
```
TORCH_LOGS_FORMAT=short TORCH_LOGS=aot_graphs,output_code python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250223_test --mx_recipe_name mxfp8_emulated --experiment_filter lowp --mode_filter cast_only_dim0_dim1
// output: https://gist.github.com/vkuzo/a4e13bac7fc8ca3af10bfd5483b85b33
// currently we see two kernels, one per dim
```
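
For context on why the cast currently produces one kernel per dim, here is a minimal
pure-Python sketch of the per-block scale computation in each direction. It is
not the torchao implementation: the helper name `block_scale_exponents`, the `axis`
convention, and the zero-block handling are assumptions made for illustration, and
it computes only the shared power-of-two exponent per block of 32 (following the
OCP MX rule `floor(log2(amax)) - emax_elem`, with `emax_elem = 8` for FP8 E4M3),
not the element cast itself.

```python
import math

BLOCK_SIZE = 32  # MX block size
E4M3_EMAX = 8    # largest exponent of FP8 E4M3 (max normal 448 = 1.75 * 2**8)


def block_scale_exponents(matrix, axis):
    """Shared power-of-two exponent for each block of BLOCK_SIZE elements.

    Hypothetical helper: axis=1 takes blocks along each row, axis=0 takes
    blocks down each column. `matrix` is a list of equal-length lists.
    """
    if axis == 0:
        # Transpose so that blocks always run along the inner lists.
        matrix = [list(col) for col in zip(*matrix)]
    exponents = []
    for row in matrix:
        assert len(row) % BLOCK_SIZE == 0, "length must be a multiple of 32"
        row_exps = []
        for start in range(0, len(row), BLOCK_SIZE):
            amax = max(abs(v) for v in row[start:start + BLOCK_SIZE])
            if amax == 0.0:
                # Assumption for this sketch: an all-zero block gets exponent 0.
                row_exps.append(0)
            else:
                # frexp gives amax = m * 2**e with 0.5 <= m < 1,
                # so floor(log2(amax)) == e - 1 exactly.
                _, e = math.frexp(amax)
                row_exps.append((e - 1) - E4M3_EMAX)
        exponents.append(row_exps)
    return exponents


# Usage: the same 32x64 input yields differently shaped scale grids per axis.
m = [[1.0] * 64 for _ in range(32)]
m[0][0] = 448.0
scales_along_rows = block_scale_exponents(m, axis=1)  # 32 rows x 2 blocks
scales_along_cols = block_scale_exponents(m, axis=0)  # 64 columns x 1 block
```

Each direction reduces over a different set of 32 elements of the same input,
which is one plausible reason the two casts end up as separate kernels.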
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: d8745ba
ghstack-comment-id: 2686344197
Pull Request resolved: #1787