[.NET microphone app] Audio from playback of AI interpreted as user input #21

Ben-Pattinson · 2024-10-04T12:17:48Z

Please provide us with the following information:

This issue is for a: (mark with an `x`)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Using a desktop PC with speakers and a mic (NOT a headset), say something, then watch the chaos unfold.
The AI mistakes its own reply for your speech, then interrupts itself and replies to its own reply, again and again. This lasted about 10 seconds before it said goodbye to itself and stopped.

Any log messages given by the failure

Expected/desired behavior

As with the OpenAI app's advanced voice mode, the AI should be able to differentiate between its own voice and someone interrupting it.

OS and Version?

Windows 10, but it would probably happen on any OS

Versions

The .NET console implementation is the one that has the problem.

Mention any other details that might be useful

This has probably only been tested or considered with a headset. That has value, but many of us who work from home have invested in decent mic/speaker setups to avoid the pain of headsets. The phone app and the playground both work fine with open mics, so this is possible to solve.

Thanks! We'll be in touch soon.


trrwilson · 2024-10-04T15:54:46Z

Thanks, @Ben-Pattinson; it looks like this may be a limitation with NAudio's cross-platform integration with Windows's built-in AEC. I'll look into whether there's a good mitigation to make cancellation kick in appropriately without needing to specifically target Windows; if any astute readers have better audio abstractions, contributions are greatly welcomed!

It's also possible to turn the voice detection threshold up a bit to mitigate (TurnDetectionOptions on ConversationSessionOptions), but at some point that's not going to be adequate for true far-field use.

tlaukkanen · 2024-10-13T19:41:45Z

This is most likely not only a .NET issue. I'm seeing the same chaos unfold when running with Python on Linux on a Raspberry Pi 4, with mics on a WM8960 Audio HAT and connected speakers. I tried these turn_detection settings:

turn_detection=ServerVAD(type="server_vad", threshold=0.5, prefix_padding_ms=200, silence_duration_ms=200)

I also tried various other options, such as:

turn_detection=ServerVAD(type="server_vad", threshold=0.8, prefix_padding_ms=1000, silence_duration_ms=2000)

...but it still picks up its own voice as input and starts to babble with itself.
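For anyone comparing settings across clients, the values tried above map onto the `turn_detection` field of a Realtime `session.update` event. This is a minimal sketch, assuming raw JSON events are sent over the websocket instead of going through the `ServerVAD` helper class used above; `build_session_update` is a hypothetical helper name, not part of any SDK.

```python
import json


def build_session_update(threshold: float,
                         prefix_padding_ms: int,
                         silence_duration_ms: int) -> str:
    """Hypothetical helper: serialize a session.update event with server VAD settings."""
    # The VAD activation threshold is a value between 0.0 and 1.0;
    # higher values require louder input before a turn is detected.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)


# The stricter settings from the comment above.
payload = build_session_update(0.8, 1000, 2000)
```

Raising the threshold this way only changes how loud the input must be before a turn is detected. It does not remove the speaker audio from the mic signal, which is consistent with the self-interruption still happening at 0.8.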

tlaukkanen · 2024-10-14T05:27:55Z

...OK, I didn't think this through 😄 This is most likely related to the hardware setup rather than the library itself. I should check, for example, the PulseAudio echo cancellation settings to get this working :) Not sure if there is something similar on Windows.

trrwilson changed the title ~~Audio from playback of AI interpreted as user input~~ [.NET microphone app] Audio from playback of AI interpreted as user input Oct 4, 2024
