Hi folks, thanks for the great work.
With #135 merged, vLLM could benefit from a torch.compile backend, given the compiler-native integration with PagedAttention kernels.
Is there an easy way to see the latest/nightly MBU (model bandwidth utilization) for torch.compile on, say, H100 with Llama 3 70B?
I'm also interested in cold-start compile time.
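
To be concrete about the metric: by MBU I mean achieved memory bandwidth divided by peak hardware bandwidth, where achieved bandwidth during decode is roughly (parameter bytes + KV cache bytes) read per output token, divided by the time per output token. A minimal sketch of that arithmetic follows. The function name and all the numbers are my own placeholders (not measurements), and the 3.35 TB/s per-GPU peak is an assumption for H100 SXM that should be checked against the actual hardware.

```python
def model_bandwidth_utilization(
    param_bytes: float,
    kv_cache_bytes: float,
    seconds_per_token: float,
    peak_bytes_per_second: float,
) -> float:
    """Return MBU as a fraction in [0, 1] for a single decode step.

    Assumes every decode step reads all weights and the whole KV cache once.
    """
    if seconds_per_token <= 0 or peak_bytes_per_second <= 0:
        raise ValueError("time per token and peak bandwidth must be positive")
    achieved_bytes_per_second = (param_bytes + kv_cache_bytes) / seconds_per_token
    return achieved_bytes_per_second / peak_bytes_per_second


# Placeholder inputs, NOT benchmark results.
num_gpus = 8                          # assumed tensor-parallel degree
param_bytes = 70e9 * 2                # ~70B parameters in bf16 (2 bytes each)
kv_cache_bytes = 0.0                  # ignored here; grows with batch and context
seconds_per_token = 0.020             # hypothetical decode latency per token
peak_bandwidth = num_gpus * 3.35e12   # assumed aggregate HBM peak, bytes/s

mbu = model_bandwidth_utilization(
    param_bytes, kv_cache_bytes, seconds_per_token, peak_bandwidth
)
print(f"MBU: {mbu:.1%}")
```

If the nightly benchmarks already report tokens/s at batch size 1, that would be enough for me to derive this number myself.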
cc @msaroufim