[Feature] Enable the vLLM V1 engine and tensor/pipeline parallelism to improve performance on Xeon

### Priority

P3-Medium

### OS type

Ubuntu

### Hardware type

Xeon-EMR

### Running nodes

Single Node

### Description

To improve vLLM performance on Xeon, we need to enable tensor parallelism (TP), pipeline parallelism (PP), and the V1 engine. This could potentially yield a 2-3x speedup.
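As a starting point for experiments, the sketch below builds a launch environment and command line for a single EMR node. It assumes `VLLM_USE_V1=1` to opt into the V1 engine, `vllm serve` with `--tensor-parallel-size` / `--pipeline-parallel-size`, and the CPU backend's `VLLM_CPU_OMP_THREADS_BIND` variable with one `|`-separated core range per rank. Please verify all of these against the installed vLLM version, including whether V1 supports the CPU backend there. The helper `build_launch`, the even core split, and the model name `my-org/my-model` are hypothetical.

```python
import shlex


def build_launch(model: str, tp: int, pp: int, total_cores: int):
    """Hypothetical helper: env vars and argv for a TP x PP run on one CPU node.

    Assumption: each of the tp * pp ranks gets an equal, contiguous block of
    physical cores (e.g. one rank per socket on a 2-socket Xeon).
    """
    ranks = tp * pp
    if ranks < 1 or total_cores % ranks != 0:
        raise ValueError(f"{total_cores} cores cannot be split evenly across {ranks} ranks")
    per_rank = total_cores // ranks
    # One "start-end" core range per rank, joined with '|' (assumed CPU-backend format).
    bind = "|".join(f"{r * per_rank}-{(r + 1) * per_rank - 1}" for r in range(ranks))
    env = {
        "VLLM_USE_V1": "1",                 # opt into the V1 engine (assumed env var)
        "VLLM_CPU_OMP_THREADS_BIND": bind,  # pin OpenMP threads of each rank
    }
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
    ]
    return env, argv


def to_shell(env: dict, argv: list) -> str:
    """Render env + argv as one shell line; quoting protects the '|' in the bind string."""
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    # Example: 2-socket node with 64 physical cores, TP=2 (one rank per socket).
    env, argv = build_launch("my-org/my-model", tp=2, pp=1, total_cores=64)
    print(to_shell(env, argv))
```

Comparing TP=2/PP=1, TP=1/PP=2, and a single-rank baseline with the same core budget would show which split, if any, reaches the hoped-for speedup.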