For now we should do:

```
num_gpus=self.cache_config.gpu_memory_utilization if self.parallel_config.tensor_parallel_size < 2 else 1
```

_Originally posted by @Yard1 in https://github.com/vllm-project/vllm/issues/1821#issuecomment-1833028091_
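
To make the behavior of that expression concrete, here is a minimal sketch. It assumes the value ends up as a Ray `num_gpus` resource request (Ray accepts fractional values there), which the quoted comment does not state explicitly. `CacheConfig`, `ParallelConfig`, and `ray_num_gpus` below are hypothetical stand-ins written for illustration, not vLLM's actual classes or functions; only the two field names come from the snippet above.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    # Hypothetical stand-in: only the field used by the quoted expression.
    gpu_memory_utilization: float = 0.9


@dataclass
class ParallelConfig:
    # Hypothetical stand-in: only the field used by the quoted expression.
    tensor_parallel_size: int = 1


def ray_num_gpus(cache_config: CacheConfig, parallel_config: ParallelConfig) -> float:
    """Return the GPU share to request for one worker (illustrative helper).

    Mirrors the quoted conditional: with a single GPU (tensor parallel
    size below 2), request only the configured memory fraction; with
    tensor parallelism, request a whole GPU per worker.
    """
    if parallel_config.tensor_parallel_size < 2:
        return cache_config.gpu_memory_utilization
    return 1


# Single-GPU case: the request is the fraction, here 0.5.
print(ray_num_gpus(CacheConfig(0.5), ParallelConfig(1)))  # 0.5

# Tensor-parallel case: each worker requests a full GPU.
print(ray_num_gpus(CacheConfig(0.5), ParallelConfig(4)))  # 1
```

Note that in the original one-liner the conditional expression as a whole is the keyword argument's value, so it reads as `num_gpus=(fraction if tp_size < 2 else 1)`.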