High latency on an L4 GPU #229

Open
@2010b9

Description

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

The PyTorch implementation

Question

Hello!

First of all, congrats! I've been doing some research about open-source speech-to-speech models and yours is by far the most natural one – I'm really excited to see your upcoming developments!

My question is about high latency I'm seeing when I start the server with python -m moshi.server on a GCP VM instance with an L4 GPU. The README.md states that Moshi achieves a theoretical latency of 160ms (80ms for the frame size of Mimi + 80ms of acoustic delay), with a practical overall latency as low as 200ms on an **L4 GPU**.
As you can see in the image below, I'm experiencing latencies up to 11ms. The latency increases as the conversation progresses, and I reached 11ms at about 1min 42s into the conversation.
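
To make the numbers concrete, here is a minimal sketch of the latency budget quoted from the README, plus a rough model of how lag could build up over a conversation if each 80ms frame took longer than 80ms to process. The function names and the 90ms per-frame processing time are hypothetical, chosen only for illustration; I have not measured the actual step time, and this is just one possible explanation for latency that grows over time.

```python
# Latency budget from the README: frame size of Mimi + acoustic delay.
FRAME_MS = 80.0
ACOUSTIC_DELAY_MS = 80.0


def theoretical_latency_ms() -> float:
    """Theoretical latency stated in the README (80ms + 80ms)."""
    return FRAME_MS + ACOUSTIC_DELAY_MS


def accumulated_lag_ms(step_ms: float, elapsed_s: float) -> float:
    """Lag built up after `elapsed_s` seconds of audio, assuming every
    frame takes `step_ms` to process (hypothetical model, not measured).

    If a frame is processed faster than real time, no lag accumulates.
    """
    frames = elapsed_s * 1000.0 / FRAME_MS
    overrun_per_frame = max(0.0, step_ms - FRAME_MS)
    return overrun_per_frame * frames


if __name__ == "__main__":
    print(f"theoretical latency: {theoretical_latency_ms():.0f} ms")
    # 1min 42s of conversation = 102 s; 90ms per frame is an assumed value.
    print(f"lag after 102 s at 90 ms/frame: {accumulated_lag_ms(90.0, 102.0):.0f} ms")
```

Under this model, lag stays at zero as long as each frame is processed within 80ms and grows linearly with conversation length otherwise, so timing the per-frame step on the GPU would be one way to check whether this is what is happening.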

Image

Do you know what I'm doing wrong?

Note: I'm still a noob in these topics, but very excited and eager to learn!

Thank you in advance!


Labels: question
