If I set tensor_parallel_size=1 or tensor_parallel_size=2, the response is OK. My env info: vllm==0.2.2, ray==2.8.0, transformers==4.34.0, torch==2.1.0
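
For anyone trying to compare environments, here is a minimal sketch of how the version list above could be collected. It uses only the standard-library `importlib.metadata`. The helper names `collect_versions` and `format_report` are made up for illustration and are not part of vLLM or Ray.

```python
from importlib import metadata

# Packages named in the environment report above.
PACKAGES = ("vllm", "ray", "transformers", "torch")


def collect_versions(packages=PACKAGES):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report is always complete.
            versions[name] = None
    return versions


def format_report(versions):
    """Format versions as pip-style 'name==version' pins, one per line."""
    return "\n".join(
        f"{name}=={ver}" if ver is not None else f"{name} (not installed)"
        for name, ver in versions.items()
    )


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

Running it in the same Python environment that launches vLLM should print one line per package, which can be pasted straight into a report like this one.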