
FlexAttention ModIndex misses cache hit for autograd func #151358

Open
@drisspg


Summary

While working on vllm-project/vllm#16078, Richard and I noticed that we miss the cache on repeated runs of "compile_block_mask" because of the mod_index autograd func:

return mod_index(args[0], index_args)

The fix is to check whether grad mode is enabled and whether x requires grad. If so, run the autograd function; otherwise, call the contents of forward directly.
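
A minimal sketch of that guard, assuming mod_index wraps a torch.autograd.Function whose forward does plain advanced indexing; the class name, backward, and index handling below are illustrative, not the actual PyTorch implementation:

```python
import torch

class ModIndex(torch.autograd.Function):
    # Toy stand-in for the real mod_index autograd func; the backward here
    # is illustrative, not the actual flex attention code.
    @staticmethod
    def forward(ctx, x, *index_args):
        ctx.save_for_backward(*index_args)
        ctx.x_shape = x.shape
        return x[index_args]

    @staticmethod
    def backward(ctx, grad_out):
        index_args = ctx.saved_tensors
        grad_x = grad_out.new_zeros(ctx.x_shape)
        # Scatter the incoming gradient back to the indexed positions.
        grad_x.index_put_(index_args, grad_out, accumulate=True)
        return (grad_x,) + (None,) * len(index_args)


def mod_index(x, index_args):
    # Proposed fix: only route through the autograd.Function when autograd
    # could actually record the op; otherwise run the forward's contents
    # directly.
    if torch.is_grad_enabled() and x.requires_grad:
        return ModIndex.apply(x, *index_args)
    return x[tuple(index_args)]
```

With that guard, inference-mode calls should trace as ordinary indexing, so repeated compile_block_mask runs see the same graph instead of a fresh autograd.Function wrapper and can hit the cache.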

cc @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @Chillee @yanboliang @BoyuanFeng


Labels

module: flex attention, module: higher order operators, module: pt2-dispatcher, oncall: pt2, triaged
