
[Feat] Add audio benchmarking support /v1/audio/transcriptions #99


Closed · b8zhong wants to merge 1 commit

Conversation

@b8zhong commented May 13, 2025

Add Audio Transcription Benchmarking

vLLM has supported Whisper since vllm-project/vllm#12909, and TensorRT-LLM has a Whisper example at https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.19/examples/whisper (I haven't personally tried the latter).

This PR adds support for benchmarking ASR models via the /v1/audio/transcriptions endpoint.

Changes:

  • New openai-audio backend
  • ASRDataset class for loading and preparing ASR samples from Hugging Face datasets (e.g., LibriSpeech, Common Voice, AMI), including temporary file management; mostly lifted from vLLM. (A rough sketch of this loader follows the list.)
  • CLI arguments (--audio-dataset-name, etc.) for ASR data configuration.
  • Unfortunately, I had to modify parts of RequestFuncInput, main.py, and Client.py to integrate the audio pipeline.
  • Added librosa, soundfile, and datasets dependencies. These could be moved to an optional [audio] extra if preferred.
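
For reference, here is a minimal sketch of what a loader along these lines might look like. The class name ASRDataset matches the description above, but the field names, method names, and cleanup handling are illustrative assumptions, not the exact code in this diff:

import tempfile
from dataclasses import dataclass, field
from typing import Optional

import soundfile as sf
from datasets import load_dataset

@dataclass
class ASRDataset:
    dataset_name: str                       # e.g. "edinburghcstr/ami"
    config: Optional[str] = None            # e.g. "ihm"
    split: str = "test"
    duration_limit: Optional[float] = None  # skip clips longer than this (seconds)
    max_samples: Optional[int] = None
    tmp_files: list = field(default_factory=list)

    def samples(self):
        """Yield (audio_path, transcript) pairs, writing each clip to a
        temporary WAV file so it can be uploaded as multipart form data."""
        ds = load_dataset(self.dataset_name, self.config, split=self.split, streaming=True)
        yielded = 0
        for row in ds:
            audio = row["audio"]  # HF audio feature: {"array", "sampling_rate", ...}
            duration = len(audio["array"]) / audio["sampling_rate"]
            if self.duration_limit is not None and duration > self.duration_limit:
                continue
            tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
            sf.write(tmp.name, audio["array"], audio["sampling_rate"])
            self.tmp_files.append(tmp.name)  # caller cleans these up after the run
            yield tmp.name, row.get("text", "")
            yielded += 1
            if self.max_samples is not None and yielded >= self.max_samples:
                break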

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: @vincentzed

Example:

fib benchmark \
    --backend openai-audio \
    --model "openai/whisper-large-v3-turbo" \
    --base-url "http://localhost:8000" \
    --endpoint "/v1/audio/transcriptions" \
    --tokenizer "openai/whisper-large-v3-turbo" \
    \
    --audio-dataset-name "edinburghcstr/ami" \
    --audio-dataset-config "ihm" \
    --audio-dataset-split "test" \
    --audio-language "en" \
    --audio-duration-limit 29.5 \
    --audio-max-samples 500 \
    \
    --num-of-req 500

============ Serving Benchmark Result ============
Successful requests:                     500       
Benchmark duration (s):                  14.59     
Total input tokens:                      3500      
Total generated tokens:                  3050      
Request throughput (req/s):              34.27     
Input token throughput (tok/s):          239.88    
Output token throughput (tok/s):         209.04    
---------------Time to First Token----------------
Mean TTFT (ms):                          8803.45   
Median TTFT (ms):                        9122.14   
P99 TTFT (ms):                           12670.75  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.01      
Median TPOT (ms):                        0.00      
P99 TPOT (ms):                           0.02      
---------------Inter-token Latency----------------
Mean ITL (ms):                           0.00      
Median ITL (ms):                         0.00      
P99 ITL (ms):                            0.00      
==================================================
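
For context, the request behind the openai-audio backend is the OpenAI-compatible transcription API that vLLM serves: a multipart POST of the audio file plus form fields. A minimal standalone sketch, reusing the URL and model from the command above ("sample.wav" is a placeholder; the actual backend's payload handling may differ):

import requests

# Upload one clip to the OpenAI-compatible transcription endpoint.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": ("sample.wav", f, "audio/wav")},
        data={"model": "openai/whisper-large-v3-turbo", "language": "en"},
    )
resp.raise_for_status()
print(resp.json()["text"])  # the transcription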

@b8zhong (Author) commented May 14, 2025

Maybe cc @benchislett, thanks in advance 👍

@xinli-centml (Contributor) commented:

Thanks! @b8zhong, sorry for the delayed review.

calculate_metrics(output["inputs"], output["outputs"], output["time"], tokenizer, output["stream"])
simplified_inputs = None
if args.backend == "openai-audio":
    simplified_inputs = [(req["prompt"], req["prompt_len"], req["output_len"]) for req in prepared_requests_data]

This if/else looks the same in both branches

"stream": not args.disable_stream,
}

if args.output_file:
filename = args.output_file
if args.num_of_imgs_per_req:
w, h = args.img_ratios_per_req[idx]

Was this code moved, or intentionally removed? If the latter, for what reason?

@xinli-centml (Contributor) commented:

Hi @b8zhong, thanks a lot for the contribution. We don't currently plan to support audio models for benchmarking, so adding this is a bit premature; we will reopen this PR when audio support is added to our inference engine.

@b8zhong (Author) commented Jun 8, 2025

No problem, thanks Xin + Benjamin for reviewing anyway 👍
