generate lws based yaml #219
Signed-off-by: Michael Kalantar <kalantar@us.ibm.com>
This PR adds the ability to use a LeaderWorkerSet as an alternative to a Deployment for the P/D pods. It supports a simple expression of tensor and data parallelism; currently only a data-local parallelism of 1 is supported. The base lws configuration comes from https://github.com/tlrmchlsmth/vllm-dp-lws/blob/main/lws.yaml. It was slightly modified to (a) create explicit
go run main.go \
--epp-cluster-role pod-read generate \
-m samples/deepseek/deepseek-1t1d.yaml \
-b samples/deepseek/lws-base.yaml \
| sed 's/^[a-zA-Z]*:/ ---/' \
| sed 's/^ //' \
> samples/deepseek/deepseek-1t1d-manifest.yaml
Serving inference requests in llm-d where each P/D node is deployed over multiple pods. The project at https://github.com/tlrmchlsmth/vllm-dp-lws/tree/main shows how to host a model with multiple pods per P/D node. To do the same with llm-d, we show the steps to deploy the llm-d inference scheduler. The instructions use kgateway.
Generate the P/D pods as LeaderWorkerSets instead of Deployments.
Includes sample msvc and baseconfig files.