We came across this post, which discusses latency gains for Llama inference with TorchDynamo and XLA:
https://pytorch.org/blog/path-achieve-low-inference-latency/?utm_content=254892693&utm_medium=social&utm_source=linkedin&hss_channel=lcp-78618366
We had the following questions:
(a) Has the vLLM team or anyone here explored these optimizations with vLLM? If so, did you see gains similar to those discussed in the blog?
(b) Does the vLLM team plan to support TorchDynamo and XLA?
(c) Do you think any of the optimizations brought in by TorchDynamo could conflict with vLLM's own optimizations or custom implementations?
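For concreteness, here is a minimal sketch of the kind of optimization the blog describes: compiling a model with TorchDynamo via `torch.compile`. This is not vLLM code; the module, shapes, and backend choice here are placeholders for illustration only (an XLA path would presumably use an XLA backend via `torch_xla` instead of the default Inductor backend, which we have not tried).

```python
# Minimal sketch of the TorchDynamo path (not vLLM code).
# TinyMLP is a hypothetical stand-in for a real decoder layer, used only to keep
# the example self-contained and runnable.
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.up = nn.Linear(hidden, 4 * hidden)
        self.down = nn.Linear(4 * hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.nn.functional.silu(self.up(x)))


model = TinyMLP().eval()

# TorchDynamo captures the Python-level graph; the default "inductor" backend
# then generates fused kernels for it.
compiled = torch.compile(model)

with torch.no_grad():
    x = torch.randn(8, 1024)
    out = compiled(x)  # first call triggers compilation
    out = compiled(x)  # later calls reuse the compiled graph
```

Our question (c) is essentially whether graph capture like the above would interact badly with vLLM's own custom kernels and scheduling.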