[V1] [Performance Benchmark] Benchmark the performance of Speculative Decoding #15600

Open
@LiuXiaoxuanPKU

Description

  1. Let's start with ngram. Can you collect both latency and throughput numbers on the ShareGPT dataset, on an H100 and on one low-end GPU?
  2. If the numbers from step 1 are not as expected, could you run some profiling to understand the performance bottleneck?
  3. Collect performance numbers on other datasets.
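
To make "latency and throughput numbers" concrete, here is a minimal sketch of how the per-request measurements could be collected and summarized. It is not taken from the vLLM benchmark scripts: `run_benchmark`, `summarize`, `speedup`, and `generate_fn` are hypothetical names, and `generate_fn` stands in for whatever engine call is being measured (with or without ngram speculation), assumed to return the number of generated tokens.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class RequestResult:
    latency_s: float    # wall-clock time spent on one request
    output_tokens: int  # tokens generated for that request


def run_benchmark(
    generate_fn: Callable[[str], int],
    prompts: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> List[RequestResult]:
    """Run prompts one at a time and record latency and output length.

    generate_fn is a hypothetical stand-in for the engine call; it is
    assumed to return the number of tokens it generated.
    """
    results = []
    for prompt in prompts:
        start = clock()
        n_tokens = generate_fn(prompt)
        results.append(RequestResult(clock() - start, n_tokens))
    return results


def summarize(results: Sequence[RequestResult]) -> Dict[str, float]:
    """Aggregate per-request results into latency and throughput numbers."""
    latencies = sorted(r.latency_s for r in results)
    total_time = sum(latencies)  # valid because requests ran sequentially
    total_tokens = sum(r.output_tokens for r in results)
    return {
        "mean_latency_s": statistics.mean(latencies),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": latencies[-1],
        "output_tokens_per_s": total_tokens / total_time if total_time > 0 else 0.0,
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Mean-latency speedup of candidate over baseline (>1 means faster)."""
    return baseline["mean_latency_s"] / candidate["mean_latency_s"]
```

Running the same prompts once with speculation disabled and once with ngram enabled, then calling `speedup(baseline_summary, ngram_summary)`, gives a single comparison number per GPU. Note that this sketch measures sequential requests; for batched serving, throughput should instead be computed from the wall-clock time of the whole batch.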

Status

In Progress
