Supporting only Ray for distributed inference will significantly reduce adoption of this tool, even if it truly is more performant than TGI. TGI can be run as a black-box image on Kubernetes with support for sharded models, and vLLM should support this as well.