1. Let's start with ngram, can you collect both latency and throughput numbers on ShareGPT dataset on H100 and one low end GPU? 2. If the numbers from 1 is not expected, could you run some profiling to understand the performance bottleneck. 3. Get more performance numbers on other datasets.