Add TensorRT Token2wav #70

yuekaizhang · 2025-10-22T03:02:53Z

This PR uses NVIDIA TensorRT (TRT) to accelerate StepAudio2's token2wav module.

- Streaming and offline modes both run with TRT
- Offline mode supports batch > 1 inference
- The speaker embedding model also runs with TRT

The following benchmark was conducted on an NVIDIA L20 GPU, generating 26 audio clips with a total length of 170 seconds.

| Method | Note | Cost Time | RTF |
|---|---|---|---|
| Offline | batch=1, PyTorch | 4.32 seconds | 0.025 |
| Offline | batch=1, TensorRT enabled | 2.09 seconds | 0.012 |
| Offline | batch=2, PyTorch | 3.77 seconds | 0.022 |
| Offline | batch=2, TensorRT enabled | 1.97 seconds | 0.012 |
| Streaming | batch=1, chunk_size = 1 second, PyTorch | 20.3 seconds | 0.119 |
| Streaming | batch=1, chunk_size = 1 second, TensorRT | 12.96 seconds | 0.076 |
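
For readers who want to check the numbers, the RTF column can be recomputed from the cost times. This is a minimal sketch that assumes RTF is defined as processing time divided by total audio duration (170 seconds here), which matches the reported values; the dictionary layout and the function names `rtf` and `speedup` are illustrative and not part of the PR.

```python
# Recompute RTF and the TensorRT speedup from the benchmark figures above.
# Assumption: RTF = processing time / total audio duration.
TOTAL_AUDIO_SECONDS = 170.0  # total length of the 26 generated clips

# (method, config, backend) -> cost time in seconds, copied from the table
COST_SECONDS = {
    ("Offline", "batch=1", "PyTorch"): 4.32,
    ("Offline", "batch=1", "TensorRT"): 2.09,
    ("Offline", "batch=2", "PyTorch"): 3.77,
    ("Offline", "batch=2", "TensorRT"): 1.97,
    ("Streaming", "batch=1", "PyTorch"): 20.3,
    ("Streaming", "batch=1", "TensorRT"): 12.96,
}


def rtf(cost_seconds, audio_seconds=TOTAL_AUDIO_SECONDS):
    """Real-time factor: values below 1.0 mean faster than real time."""
    return cost_seconds / audio_seconds


def speedup(method, config):
    """Ratio of PyTorch cost time to TensorRT cost time for one setting."""
    baseline = COST_SECONDS[(method, config, "PyTorch")]
    accelerated = COST_SECONDS[(method, config, "TensorRT")]
    return baseline / accelerated


if __name__ == "__main__":
    for (method, config, backend), cost in COST_SECONDS.items():
        print(f"{method:9s} {config:8s} {backend:8s} RTF={rtf(cost):.3f}")
    for method, config in [("Offline", "batch=1"), ("Offline", "batch=2"),
                           ("Streaming", "batch=1")]:
        print(f"{method} {config}: {speedup(method, config):.2f}x faster with TensorRT")
```

Under this definition, TensorRT roughly halves the offline cost time and cuts the streaming cost time by about a third.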

For more details, see https://github.com/yuekaizhang/Step-Audio2/blob/trt/tools/tensorrt_token2wav.md

yuekaizhang · 2025-10-22T03:08:32Z

See also the CosyVoice2 LLM + StepAudio2 token2wav pipeline: https://github.com/FunAudioLLM/CosyVoice/blob/main/runtime/triton_trtllm/README.DIT.md.

light1726 · 2025-11-21T05:10:39Z

Hi! Great work! I had some tests and they consistently failed with an AssertionError at this line: torch.testing.assert_allclose(output_pytorch, torch.from_numpy(output_onnx).to(device), rtol=1e-2, atol=1e-4). Could you please share the versions of key dependencies you used, such as:

TensorRT
ONNX Runtime GPU
CUDA
PyTorch

This would help me align my environment with yours. Or do you have any insights on what might be causing this discrepancy?
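
When comparing environments like this, a short script that prints the installed versions can make the exchange easier. The sketch below uses only the standard library's `importlib.metadata`; the distribution names in `PACKAGES` are assumptions (for example, ONNX Runtime GPU is assumed to be installed as `onnxruntime-gpu`) and may need adjusting, and `collect_versions` and `cuda_version` are hypothetical helper names.

```python
from importlib import metadata

# Assumed pip distribution names; adjust to match the local installation.
PACKAGES = ("tensorrt", "onnxruntime-gpu", "torch")


def collect_versions(packages=PACKAGES):
    """Return a mapping of distribution name to installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def cuda_version():
    """Report the CUDA version PyTorch was built against, if available."""
    try:
        import torch
    except ImportError:
        return "unknown (torch not importable)"
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda or "none (CPU-only build)"


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
    print(f"CUDA (as seen by PyTorch): {cuda_version()}")
```

Note that this reports the CUDA version PyTorch was built with, which can differ from the system toolkit or driver version.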

yuekaizhang · 2025-11-21T06:56:38Z

> Hi! Great work! I had some tests and they consistently failed with an AssertionError at this line: torch.testing.assert_allclose(output_pytorch, torch.from_numpy(output_onnx).to(device), rtol=1e-2, atol=1e-4). Could you please share the versions of key dependencies you used, such as:
>
> - TensorRT
> - ONNX Runtime GPU
> - CUDA
> - PyTorch
>
> This would help me align my environment with yours. Or do you have any insights on what might be causing this discrepancy?

@light1726 The error should be harmless. Go ahead please.
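
To judge how large the PyTorch/ONNX discrepancy actually is, it can help to measure it instead of only asserting on it. The following is a minimal pure-Python sketch of the elementwise criterion that allclose-style checks apply, `|actual - expected| <= atol + rtol * |expected|`, with the tolerances from the quoted call as defaults. The function name `mismatch_report` is hypothetical and is not part of the PR or of PyTorch.

```python
def mismatch_report(actual, expected, rtol=1e-2, atol=1e-4):
    """Summarize how far two flat sequences of floats diverge.

    An element counts as mismatched when
    |actual - expected| > atol + rtol * |expected|.
    """
    actual = list(actual)
    expected = list(expected)
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same number of elements")

    max_abs_err = 0.0
    max_rel_err = 0.0
    mismatched = 0
    for a, e in zip(actual, expected):
        abs_err = abs(a - e)
        max_abs_err = max(max_abs_err, abs_err)
        if e != 0:
            # Relative error is undefined where the reference is zero.
            max_rel_err = max(max_rel_err, abs_err / abs(e))
        if abs_err > atol + rtol * abs(e):
            mismatched += 1

    return {
        "max_abs_err": max_abs_err,
        "max_rel_err": max_rel_err,
        "mismatched": mismatched,
        "total": len(actual),
    }


if __name__ == "__main__":
    # Toy values standing in for flattened model outputs.
    print(mismatch_report([1.0, 2.5, 3.0], [1.0, 2.0, 3.0]))
```

With real tensors, the inputs could be flattened first, for example `output_pytorch.flatten().tolist()`. A small number of mismatches with low maximum error would be consistent with the discrepancy being benign, though only listening tests or downstream metrics can confirm that.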

yuekaizhang · 2025-11-21T06:57:01Z

For a pre-built Docker image, see https://github.com/FunAudioLLM/CosyVoice/blob/main/runtime/triton_trtllm/README.DIT.md.

root and others added 11 commits October 22, 2025 10:51

support batching and trt for token2wav

0c27a8d

add streaming trt support

9e9433a

add speaker cache and runtime streaming request cache

6997494

support vocoder cache

4e51760

rename files

6a01297

fix att buffer shallow copy issue

28984ee

lint

c9b95f4

code clean

96ae888

align with the original token2wav interface

459f67e

add benchmark results

4c1c3ce

clean code

e2a16c0

yuekaizhang changed the title ~~Trt~~ Add TensorRT Token2wav Oct 22, 2025

remove license

204e587
