-
-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fire onSpeechEnd
or onVadMisfire
on pause
#63
Conversation
Hi @HayatoYagi , thanks for opening this PR. This would be a breaking change, though, and so I would not want to merge it as is. I would be open to enabling this behavior via the method I suggested here, or really any way that doesn't change the default behavior. |
@ricky0123 |
This change does not require any adjustments on the calling side, so I would categorize it as a bug fix or a minor specification change rather than a "breaking change". |
Hi @HayatoYagi
I don't quite understand what you mean here, can you explain? In my view, the pause button is something the user presses to stop all functionality associated with the VAD, not to clip and submit the current audio. In any case, it's a substantial change from the default behavior, so I don't want to do it. Possibly a better solution than an extra parameter to the onSpeechEnd callback is to add a new option at initialization. Eg myvad = await MicVAD.new({ submitUserSpeechOnPause: true }) or something similar. I agree with you that each option we add increases the complexity for developers, but this is clearly an important issue about which people have different intuitions, so it's an important option to add. And I do not want to change the default behavior right now. If we had enough of a community on Discord, I would maybe leave it to a poll, but we have few enough people on Discord that I wouldn't trust the poll to be representative. So let's just implement the initialization option unless you have a different idea. |
1a. The user finishes speaking. 1b. The user finishes speaking. The difference between 2a and 2b is only a 1-frame timing distinction. While users shouldn't be conscious of the redemption frame, they are required to wait for the redemption frame to elapse before pressing the pause button. |
@ricky0123
How many people do you think is enough? |
Thanks for your PR @HayatoYagi I'll try to look at it tomorrow. |
Hi @HayatoYagi thanks for your contribution. I refactored it a little, added a couple of test pages, merged, and published it. Thank you for clarifying what you meant about the redemption frames. I can see both perspectives. I guess it depends partially on why you think the user is pressing the pause button. For my use case, it was mostly something they would press if they were interrupted and wanted to discard the current speech segment (i.e. immediately stopping all functionality associated with the vad). I do agree that if you are using a long redemption period and they hit pause towards the end of it, it would produce an unexpected result from their perspective. I haven't thought concretely about how many people I would want to participate in a poll. Hopefully, this PR will satisfy most people. Thanks! |
Hello 👋 After testing, we are losing some capabilities compared to what we had in my fork. The main thing is that we can't know at
This is important because I want to remove all the redemption frames in the natural case, and not in the forced-stop case. We have 2 options:
@ricky0123, @HayatoYagi, what do you think? |
Fix #30 based on #45
Currently, the last speech segment can be discarded on pause. This PR fixes the behavior to treat a pause as something similar to stopping speaking.