
Streamed Voice Agent Demo - Multiple Performance Issues #301

Open
@muhammadsmalik

Description

The streamed voice agent demo is experiencing several critical issues that affect its usability and functionality:

  1. High Latency: There is a significant delay (3-4 seconds) before receiving responses.
  2. Language Switching: The agent randomly switches to Spanish during conversations.
  3. Over-sensitivity: The agent frequently detects speech and provides incorrect descriptions even when no one is speaking.
  4. Interruption Issues: The agent cannot be interrupted despite semantic_vad apparently being implemented in the code.

Steps to Reproduce

  1. Launch the streamed voice agent demo
  2. Attempt to engage in conversation with the agent
  3. Observe the delay between speaking and receiving a response
  4. Continue conversation for several exchanges to observe language switching
  5. Remain silent for periods to observe false speech detection
  6. Try to interrupt the agent while it's speaking

Expected Behavior

  • Responses should begin within 1 second of user input
  • The agent should maintain the initially selected language throughout the conversation
  • Speech detection should only activate when actual speech is present
  • The semantic_vad feature should allow interruption of the agent's responses
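The 1-second target above can be checked against the reported 3-4 seconds with a small timing helper. This is a minimal sketch, not the demo's code: `fake_agent_stream` is a hypothetical stand-in for whatever async stream of response events the demo exposes, and the helper only assumes that stream can be consumed with `async for`.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def time_to_first_event(
    stream: AsyncIterator[object],
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first event, total number of events).

    The first element is None if the stream ends without yielding anything.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    async for _event in stream:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    return first, count


async def fake_agent_stream(delay_s: float, n_events: int) -> AsyncIterator[str]:
    """Hypothetical stand-in for the demo's response stream (assumption)."""
    await asyncio.sleep(delay_s)  # simulates the delay before the first chunk
    for i in range(n_events):
        yield f"chunk-{i}"


if __name__ == "__main__":
    latency, events = asyncio.run(
        time_to_first_event(fake_agent_stream(0.05, 3))
    )
    print(f"first event after {latency:.3f}s, {events} events total")
```

Starting the clock when the user stops speaking, rather than when the stream object is created, would give a number closer to the delay described in this report.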

Actual Behavior

  • Responses take 3-4 seconds to begin after user input
  • The agent randomly switches to Spanish during English conversations
  • The agent frequently reports detecting speech and provides descriptions when no one is speaking
  • The agent cannot be interrupted despite the apparent implementation of semantic_vad
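To help narrow down the false speech detection, one option is to check whether the audio being sent during silent periods is actually quiet. The sketch below is a plain RMS energy check, a swapped-in diagnostic and not the detection method the demo uses. It assumes 16-bit PCM in native byte order, and the threshold of 500 is an arbitrary placeholder that would need tuning per microphone.

```python
import array
import math


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit PCM in native byte order.

    Raises ValueError if len(pcm) is not a multiple of 2.
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_silence(pcm: bytes, threshold: float = 500.0) -> bool:
    """True when the chunk's RMS is below the (assumed) threshold."""
    return rms_int16(pcm) < threshold


if __name__ == "__main__":
    silent_chunk = array.array("h", [0] * 480).tobytes()
    print(is_probably_silence(silent_chunk))
```

If chunks captured while nobody is speaking fail this check, background noise or microphone gain is a likely contributor; if they pass and speech is still detected, the problem is more likely in the detection configuration.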

Technical Details

From code inspection, semantic_vad appears to be implemented but is not functioning as expected. This suggests a potential issue with how the feature is integrated or configured in the current build.

Additional Notes

These issues significantly impact the user experience and demonstration value of the agent. The latency and language switching problems are particularly disruptive during presentations.

Possible Solutions

  • Investigate streaming optimization to reduce latency
  • Check language model configuration for potential causes of language switching
  • Adjust speech detection sensitivity parameters
  • Review semantic_vad implementation to ensure proper configuration
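As a starting point for the configuration checks above, the sketch below builds a session configuration that pins the transcription language and requests semantic_vad with interruption enabled. It assumes the demo ultimately sends an OpenAI Realtime-style session object; the field names (`input_audio_transcription.language`, `turn_detection.type`, `eagerness`, `interrupt_response`) follow that API as I understand it and should be verified against the installed SDK. Whether the demo forwards these fields at all is not confirmed by this report. `build_session_config` is a hypothetical helper, and the dict is partial (no model names are set).

```python
from typing import Any, Dict

# Eagerness values as documented for semantic_vad (assumption: unchanged
# in the SDK version the demo uses).
_EAGERNESS = {"low", "medium", "high", "auto"}


def build_session_config(
    language: str = "en", eagerness: str = "auto"
) -> Dict[str, Any]:
    """Build a partial session config (hypothetical helper, not SDK code)."""
    if eagerness not in _EAGERNESS:
        raise ValueError(f"eagerness must be one of {sorted(_EAGERNESS)}")
    return {
        # Pinning the language is one way to rule out auto-detection
        # as the cause of the switch to Spanish.
        "input_audio_transcription": {"language": language},
        "turn_detection": {
            "type": "semantic_vad",
            "eagerness": eagerness,
            # Asks the server to stop an in-progress response when new
            # speech is detected.
            "interrupt_response": True,
        },
    }


if __name__ == "__main__":
    print(build_session_config(language="en", eagerness="low"))
```

Comparing a dict like this with what the demo actually sends over the wire would show whether semantic_vad is configured but ignored, or never sent.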

Priority

High - These issues prevent effective demonstration of the voice agent's capabilities.

Metadata

Labels: bug (Something isn't working)
