OpenAI have a voice powered chat mode in their app and there's a noticeable dela... | Hacker News #559

Open
1 task
irthomasthomas opened this issue Feb 20, 2024 · 0 comments
Labels
AI-Chatbots Topics related to advanced chatbot platforms integrating multiple AI models Automation Automate the things New-Label Choose this option if the existing labels are insufficient to describe the content accurately

TITLE: OpenAI have a voice powered chat mode in their app and there's a noticeable dela... | Hacker News

DESCRIPTION:
mike_hearn 5 hours ago | parent | context | flag | favorite | on: Groq runs Mixtral 8x7B-32k with 500 T/s

OpenAI have a voice powered chat mode in their app and there's a noticeable delay of a few seconds between finishing your sentence and the bot starting to speak.
I think the problem is that realistic TTS needs quite a few tokens of lookahead, because the prosody of a word can be affected by tokens that come a fair bit further down the sentence. Consider the difference in pitch between:
"The war will be long and bloody"
vs
"The war will be long and bloody?"
So to begin TTS you need quite a lot of tokens, which in turn means you have to digest the prompt and run a whole bunch of forward passes before you can start rendering. And of course you have to keep up with the speed of regular speech, which OpenAI sometimes struggles with.
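The buffering described above can be sketched roughly as follows. This is only an illustration, not how OpenAI's app works: `sentence_chunks` and `time_to_first_audio` are hypothetical helpers, the rule "wait for terminal punctuation" is an assumed stand-in for whatever lookahead a real TTS engine needs, and the numbers in the usage lines are made up apart from the 500 T/s figure from the thread title.

```python
from typing import Iterable, Iterator

# Assumption: a sentence is "complete enough" for TTS once it ends in . ? or !
SENTENCE_END = (".", "?", "!")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield text only when a sentence is complete.

    The final "?" changes the pitch of the whole sentence, so nothing is
    handed to the (hypothetical) TTS stage until the terminator arrives.
    """
    buffer: list[str] = []
    for token in tokens:
        buffer.append(token)
        if token.rstrip().endswith(SENTENCE_END):
            yield "".join(buffer).strip()
            buffer = []
    if buffer:  # flush trailing text that never got a terminator
        yield "".join(buffer).strip()


def time_to_first_audio(prompt_seconds: float,
                        tokens_needed: int,
                        tokens_per_second: float) -> float:
    """Rough lower bound: prompt digestion plus the forward passes needed
    to produce enough tokens before rendering can begin."""
    return prompt_seconds + tokens_needed / tokens_per_second


stream = ["The", " war", " will", " be", " long", " and", " bloody", "?", " Yes", "."]
chunks = list(sentence_chunks(stream))
print(chunks)

# Illustrative only: 0.5 s to digest the prompt, 20 tokens of lookahead.
slow = time_to_first_audio(0.5, 20, 20)    # assumed 20 T/s backend
fast = time_to_first_audio(0.5, 20, 500)   # 500 T/s, as in the thread title
print(slow, fast)
```

Under these assumptions the token-generation part of the delay scales inversely with throughput, which is why a faster backend shortens the pause before the bot starts speaking.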
That said, the gap isn't huge, and many apps won't need lower latency. Some use cases where low latency might matter:

  • Phone support.
  • Trading. Think digesting a press release into an action a few seconds faster than your competitors.
  • Agents that listen in to conversations and "butt in" when they have something useful to say.
  • RPGs where you can talk to NPCs in realtime.
  • Real-time analysis of whatever's on screen on your computing device.
  • Auto-completion.
  • Using AI as a general command prompt. Think AI bash.
Undoubtedly there will be a lot more, though. When you give people performance, they find ways to use it.

URL: Hacker News

Suggested labels

{'label-name': 'real-time processing', 'label-description': 'Refers to tasks or systems that operate instantaneously, such as voice chat applications with minimal delays.', 'confidence': 51.49}

@irthomasthomas irthomasthomas added AI-Chatbots Topics related to advanced chatbot platforms integrating multiple AI models Automation Automate the things KDE New-Label Choose this option if the existing labels are insufficient to describe the content accurately and removed KDE labels Feb 20, 2024