Thank you for your hard work. I'm not seeing a significant performance difference between the A100 and the H100. I used the official vLLM image v0.2.4 from Docker Hub and set both the prompt and completion lengths to 500 tokens; the A100 and the H100 each take about 19 seconds. Are there any settings I can tune to get better performance on the H100?
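For reference, this is roughly how the numbers above were measured. This is only a sketch: the model name and batch size below are placeholders (my actual ones may differ), and it assumes the `benchmark_latency.py` script bundled in the vLLM repo's `benchmarks/` directory, run from a checkout matching the v0.2.4 image:

```shell
# Sketch of the benchmark invocation (model name and batch size are
# placeholders). --input-len / --output-len correspond to the
# 500-token prompt and completion lengths mentioned above.
python benchmarks/benchmark_latency.py \
    --model meta-llama/Llama-2-7b-hf \
    --input-len 500 \
    --output-len 500 \
    --batch-size 1
```

The same command was run inside the A100 and H100 containers, which is where the ~19 s figure for each GPU comes from.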