When I set tensor_parallel_size > 1, the wall time increases even though the reported CPU times drop. Am I doing something wrong in my setup, such that using multiple GPUs is actually slower than using one?
```python
from vllm import LLM

vllm = LLM(
    model="mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
    dtype="float16",
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
)
```

```
CPU times: user 3.66 s, sys: 262 ms, total: 3.93 s
Wall time: 1.11 s
```
```python
from vllm import LLM

vllm = LLM(
    model="mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
    dtype="float16",
    tensor_parallel_size=2,
    gpu_memory_utilization=0.95,
)
```

```
CPU times: user 65.5 ms, sys: 32.2 ms, total: 97.7 ms
Wall time: 1.27 s
```
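
For reference, here is a minimal sketch of timing the generation step itself under each setting; the prompt text, batch size, and sampling parameters below are illustrative assumptions, not values from the report above.

```python
import time
from vllm import LLM, SamplingParams

# Assumed benchmark setup: same model, but timing generate() rather than only engine startup.
llm = LLM(
    model="mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
    dtype="float16",
    tensor_parallel_size=2,  # compare against a run with tensor_parallel_size=1
    gpu_memory_utilization=0.95,
)

prompts = ["Explain tensor parallelism in one paragraph."] * 64  # illustrative batch
sampling_params = SamplingParams(max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start
print(f"Generated {len(outputs)} completions in {elapsed:.2f} s")
```

With a single short request, the extra inter-GPU communication that tensor parallelism introduces can outweigh the per-GPU compute savings, so the two settings are easier to compare on a larger batch like this.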