Open
Description
Summary
While working on vllm-project/vllm#16078, Richard and I noticed that repeated calls to "compile_block_mask" miss the cache because of the mod_index autograd function.
The fix is to check whether grad mode is enabled and whether x requires grad. If so, run the autograd function; otherwise, call the contents of forward directly.
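A rough sketch of the dispatch described above, for illustration only. The names `ModIndexSketch`, `index_forward`, and `mod_index_sketch` are hypothetical stand-ins rather than the actual code in this PR, the backward is simplified to 1-D integer indices along dim 0, and combining the two checks with `and` is an assumption:

```python
import torch


def index_forward(x, indices):
    # The "contents of forward": a plain index op with no autograd.Function wrapper.
    return x[indices]


class ModIndexSketch(torch.autograd.Function):
    """Hypothetical stand-in for the mod_index autograd function."""

    @staticmethod
    def forward(ctx, x, indices):
        ctx.save_for_backward(indices)
        ctx.input_shape = x.shape
        return index_forward(x, indices)

    @staticmethod
    def backward(ctx, grad_out):
        (indices,) = ctx.saved_tensors
        # Scatter-add gradients back to the indexed rows (handles repeated indices).
        grad_x = torch.zeros(
            ctx.input_shape, dtype=grad_out.dtype, device=grad_out.device
        )
        grad_x.index_add_(0, indices, grad_out)
        return grad_x, None


def mod_index_sketch(x, indices):
    # Only go through the autograd function when a gradient can actually flow.
    # Assumption: both conditions must hold.
    if torch.is_grad_enabled() and x.requires_grad:
        return ModIndexSketch.apply(x, indices)
    # Otherwise skip the autograd function and run the forward body directly.
    return index_forward(x, indices)
```

With this shape, inference-style callers (no grad mode, or inputs that do not require grad) never touch the autograd function, which is the path where the cache misses were observed.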
cc @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @Chillee @yanboliang @BoyuanFeng