OpenAI have a voice powered chat mode in their app and there's a noticeable dela... | Hacker News #559

irthomasthomas · 2024-02-20T15:22:49Z

OpenAI have a voice-powered chat mode in their app, and there's a noticeable delay of a few seconds between finishing your sentence and the bot starting to speak.
I think the problem is that realistic TTS needs quite a few tokens up front, because the prosody of a word can be affected by tokens that come a fair bit further down the sentence. Consider the difference in pitch between:
"The war will be long and bloody"
vs
"The war will be long and bloody?"
So to begin TTS you need quite a lot of tokens, which in turn means you have to digest the prompt and run a whole bunch of forward passes before you can start rendering. And of course you have to keep up with the speed of regular speech, which OpenAI sometimes struggles with.
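The buffering argument above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how OpenAI's app actually works: the function names are hypothetical, a chunk is released to TTS only once a sentence-ending token (".", "?", "!") arrives so the final punctuation, and hence the pitch contour, is known, and the speech-rate figures (about 150 words per minute, about 1.3 tokens per word) are rough assumed averages.

```python
from typing import Iterable, Iterator

# Assumed heuristic: these characters close a sentence, and the last one
# hints at intonation ("?" often rising, "." falling).
SENTENCE_END = (".", "?", "!")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield whole sentences for TTS.

    Hypothetical helper: TTS handed only "The war will be long and"
    cannot tell whether the sentence ends in "." or "?", so text is
    held back until a sentence-ending token arrives.
    """
    buffer: list[str] = []
    for tok in tokens:
        buffer.append(tok)
        if tok.rstrip().endswith(SENTENCE_END):
            yield "".join(buffer).strip()
            buffer = []
    # Flush whatever is left when the stream ends without punctuation.
    tail = "".join(buffer).strip()
    if tail:
        yield tail


def time_to_first_audio(prompt_s: float, first_chunk_tokens: int,
                        tokens_per_second: float) -> float:
    """Seconds before TTS can start: digest the prompt, then generate
    every token of the first chunk (TTS synthesis time itself ignored)."""
    return prompt_s + first_chunk_tokens / tokens_per_second


def keeps_up_with_speech(tokens_per_second: float,
                         words_per_minute: float = 150.0,
                         tokens_per_word: float = 1.3) -> bool:
    """True if generation is at least as fast as the assumed speaking rate."""
    needed = words_per_minute / 60.0 * tokens_per_word  # tokens per second
    return tokens_per_second >= needed


if __name__ == "__main__":
    stream = ["The", " war", " will", " be", " long", " and", " bloody", "?",
              " Yes", "."]
    for chunk in sentence_chunks(stream):
        print(chunk)
    print(time_to_first_audio(0.5, 12, 20.0))
    print(keeps_up_with_speech(3.0), keeps_up_with_speech(10.0))
```

With these assumptions the first chunk cannot be voiced until the "?" token exists, so time to first audio grows with the length of the first sentence and falls as generation speeds up. Splitting on punctuation is deliberately naive: abbreviations such as "Dr." would cut a sentence early.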
That said, the gap isn't huge, and many apps won't need lower latency. Some use cases where it might matter:

Phone support.
Trading. Think digesting a press release into an action a few seconds faster than your competitors.
Agents that listen in to conversations and "butt in" when they have something useful to say.
RPGs where you can talk to NPCs in realtime.
Real-time analysis of whatever's on screen on your computing device.
Auto-completion.
Using AI as a general command prompt. Think AI bash.
Undoubtedly there will be a lot more, though. When you give people performance, they find ways to use it.

URL: Hacker News

Suggested labels

{'label-name': 'real-time processing', 'label-description': 'Refers to tasks or systems that operate instantaneously, such as voice chat applications with minimal delays.', 'confidence': 51.49}

irthomasthomas added the AI-Chatbots, Automation, KDE, and New-Label labels and removed the KDE label on Feb 20, 2024

irthomasthomas removed the voice-assistants label Sep 7, 2024
