Description
As of May 19, 2025, we are halting active development on torchchat.
The original intent of torchchat was both to demonstrate how to run LLM inference using PyTorch and to improve the performance and functionality of the broader PyTorch ecosystem.
Since torchchat’s launch, we’ve seen vLLM become the dominant player for server-side LLM inference. We’re ecstatic to have vLLM join the PyTorch Ecosystem and recommend it for hosting LLMs in production server environments. Given the growth of vLLM and others, we no longer see the need to maintain an active demonstration of how to run LLM inference using PyTorch.
We are very proud of the performance and functionality improvements we saw in the PyTorch ecosystem over the last year, including:
- LLM inference performance increased severalfold on every device we support (CUDA, CPU, MPS, ARM, etc.)
- Working code demonstrating how to run LLM inference in all the major execution modes (Eager, Compile, AOTI, and ET), giving users a starting point for using PyTorch for LLM inference from servers to embedded devices and everything in between
- Quantization support expanded to cover the most popular schemes and bit widths
- torchchat served as the testing ground for new advancements (experimental torchao kernels, MPS compile, AOTI packaging)
There’s still plenty of exciting work to do across the LLM inference space, and PyTorch will stay invested in improving it.
We appreciate and thank everyone in the community for all that you’ve contributed.
Thanks to our contributors:
@mikekgfb @Jack-Khuu @metascroy @malfet @larryliu0820 @kirklandsign @swolchok @vmpuri @kwen2501 @Gasoonjia @orionr @guangy10 @byjlw @lessw2020 @mergennachin @GregoryComer @shoumikhin @kimishpatel @manuelcandales @lucylq @desertfire @gabe-l-hart @seemethere @iseeyuan @jerryzh168 @leseb @yanbing-j @mreso @fduwjj @Olivia-liu @angelayi @JacobSzwejbka @ali-khosh @nlpfollower @songhappy @HDCharles @jenniew @silverguo @zhenyan-zhang-meta @ianbarber @dbort @kit1980 @mcr229 @georgehong @krammnic @xuedinge233 @anirudhs001 @shreyashah1903 @soumith @TheBetterSolution @codereba @jackzhxng @KPCOFGS @kuizhiqing @kartikayk @nobelchowdary @mike94043 @vladoovtcharov @prideout @sanchitintel @cbilgin @jeffdaily @infil00p @msaroufim @zhxchen17 @vmoens @wjunLu
- The PyTorch Team