
[Feature] Enable vLLM V1 and Tensor/Pipeline Parallelism to improve performance #2044

@louie-tsai

Priority

P3-Medium

OS type

Ubuntu

Hardware type

Xeon-EMR

Running nodes

Single Node

Description

To improve vLLM performance on Xeon, we need to enable Tensor Parallelism/Pipeline Parallelism and the vLLM V1 engine. Together these could potentially give a 2-3X speedup.
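As a rough illustration of the setup being requested, the sketch below assembles an environment and a `vllm serve` command line for a TP x PP run on a CPU host. `--tensor-parallel-size` and `--pipeline-parallel-size` are existing vLLM serve options. The rest is an assumption to verify against the vLLM version in use: `VLLM_USE_V1=1` as the opt-in switch for the V1 engine, `VLLM_CPU_OMP_THREADS_BIND` taking one `start-end` core range per worker separated by `|`, and PP support on the CPU backend. The helper `build_vllm_cpu_launch` and the model id `my-org/my-model` are hypothetical.

```python
import shlex


def build_vllm_cpu_launch(
    model: str, tp: int, pp: int, cores_per_rank: int
) -> tuple[dict[str, str], list[str]]:
    """Hypothetical helper: return (env, argv) for a TP x PP `vllm serve` run on CPU.

    Assumes each worker rank gets its own contiguous, non-overlapping block of
    cores (e.g. one block per socket or NUMA node on a dual-socket Xeon).
    """
    if tp < 1 or pp < 1 or cores_per_rank < 1:
        raise ValueError("tp, pp and cores_per_rank must all be >= 1")

    world_size = tp * pp  # total number of worker ranks
    # One "start-end" core range per rank, joined with "|" (assumed binding format).
    ranges = [
        f"{r * cores_per_rank}-{(r + 1) * cores_per_rank - 1}"
        for r in range(world_size)
    ]
    env = {
        "VLLM_USE_V1": "1",  # assumed opt-in switch for the V1 engine
        "VLLM_CPU_OMP_THREADS_BIND": "|".join(ranges),
    }
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
    ]
    return env, argv


if __name__ == "__main__":
    # Example: TP=2 across two 32-core sockets, no pipeline split.
    env, argv = build_vllm_cpu_launch("my-org/my-model", tp=2, pp=1, cores_per_rank=32)
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    print(prefix, shlex.join(argv))
```

Running it prints a shell line of the form `VLLM_USE_V1=1 VLLM_CPU_OMP_THREADS_BIND='0-31|32-63' vllm serve my-org/my-model --tensor-parallel-size 2 --pipeline-parallel-size 1`. Keeping one core block per rank is meant to avoid ranks competing for the same cores or crossing sockets; the actual speedup would still need to be measured.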

Metadata

Assignees

Labels

feature (New feature or request)

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions