feat: 2-3x inference speedup, faster than real-time #71
Conversation
A few clarifications requested.
fam/llm/gptfast_inference.py
num_samples=1,
seed=1337,
device="cuda",
dtype="bfloat16",
Auto-handle the dtype instead of hard-coding it; check fam/llm/utils.py.
done
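For reference, a minimal sketch of what auto dtype selection can look like (illustrative only; the helper name here is hypothetical and not necessarily what fam/llm/utils.py implements):

```python
import torch

def resolve_dtype(device: str = "cuda") -> torch.dtype:
    """Pick a sensible default dtype for the target device (illustrative helper)."""
    if device == "cuda" and torch.cuda.is_available():
        # Prefer bfloat16 on GPUs that support it (Ampere and newer),
        # otherwise fall back to float16.
        return torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    # CPU or no GPU available: use full precision.
    return torch.float32
```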
Dirty fix to speed up inference by porting gpt-fast. Supports a single utterance only; no batching.
First stage only:
RTX 4090: 230T/s (~1.5 seconds of speech generated in 1 second of wall-clock time)
H100: 382T/s (~2.5 seconds of speech generated in 1 second of wall-clock time)
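A quick back-of-the-envelope check of the real-time factors above (the ~150 tokens per second of audio is implied by these numbers, not an official figure):

```python
# Implied first-stage tokens per second of generated audio,
# derived from the (throughput, real-time factor) pairs above.
for toks_per_s, speech_s_per_wall_s in [(230, 1.5), (382, 2.5)]:
    tokens_per_audio_second = toks_per_s / speech_s_per_wall_s
    print(f"{toks_per_s} T/s -> ~{tokens_per_audio_second:.0f} tokens per second of audio")
# Both work out to ~153 tokens per second of audio, so the two
# measurements are consistent with each other.
```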
These throughputs are consistent across context lengths thanks to the static KV cache (a sketch of the idea follows below).
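For readers unfamiliar with the technique, here is a minimal sketch of a static (pre-allocated) KV cache in the gpt-fast style; the class name, shapes, and signatures are illustrative, not the exact ones in this PR:

```python
import torch
from torch import nn

class StaticKVCache(nn.Module):
    """Pre-allocated KV cache: buffers are sized for max_seq_len up front,
    so tensor shapes (and any compiled kernels) never change as the sequence grows."""

    def __init__(self, batch, n_heads, max_seq_len, head_dim, dtype=torch.bfloat16):
        super().__init__()
        shape = (batch, n_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))

    def update(self, input_pos, k_val, v_val):
        # input_pos: (seq_len,) positions being written at this decode step
        # k_val / v_val: (batch, n_heads, seq_len, head_dim)
        self.k_cache[:, :, input_pos] = k_val
        self.v_cache[:, :, input_pos] = v_val
        return self.k_cache, self.v_cache
```

Attention then always runs over the fixed-size buffers with a causal mask, which is why per-token cost stays roughly flat as the context grows.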
Notes: