
TorchDynamo & XLA in VLLM #1475

Closed

dhritiman opened this issue Oct 26, 2023 · 1 comment

Comments

@dhritiman

We came across this post, which discusses latency gains for Llama inference using TorchDynamo and XLA:

https://pytorch.org/blog/path-achieve-low-inference-latency/
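
For context, our reading of the blog is that the core of the approach is compiling the model's forward pass with torch.compile (the TorchDynamo frontend) targeting an XLA backend. A minimal sketch of that flow is below; this is not vLLM code, and the model name and the "openxla" backend string are assumptions on our part based on the PyTorch/XLA documentation:

```python
# Rough sketch of the TorchDynamo + XLA flow from the blog (not vLLM code).
# Assumes PyTorch/XLA is installed and exposes the "openxla" Dynamo backend;
# the model name is illustrative.
import torch
import torch_xla.core.xla_model as xm
from transformers import AutoModelForCausalLM, AutoTokenizer

device = xm.xla_device()  # TPU/GPU/CPU behind the XLA runtime

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").to(device)
model.eval()

# TorchDynamo captures the forward pass as an FX graph and hands it to XLA,
# which compiles and caches it, so steady-state decode steps skip retracing.
compiled_model = torch.compile(model, backend="openxla")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
with torch.no_grad():
    logits = compiled_model(**inputs).logits
```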

I had the following questions:

(a) Has the vLLM team, or anyone here, explored these optimizations with vLLM? If so, do you see gains similar to those discussed in the blog?
(b) Does the vLLM team plan to support TorchDynamo and XLA?
(c) Do you think any of the optimizations introduced by TorchDynamo could conflict with vLLM's own optimizations or custom implementation?

@hmellor
Collaborator

hmellor commented Mar 13, 2024

torch.compile is on the roadmap: #2681

hmellor closed this as completed Mar 13, 2024