Fire `onSpeechEnd` or `onVadMisfire` on pause #63

HayatoYagi · 2023-11-28T15:20:46Z

Fix #30 based on #45

Currently, the last speech segment can be discarded on pause. This PR fixes the behavior to treat a pause as something similar to stopping speaking.

HayatoYagi · 2023-11-28T15:25:14Z

I wanted to test this PR with my updated test site so the base branch is #62, but these two PRs have nothing to do with each other. I will rebase to master after #62 is addressed.

ricky0123 · 2023-12-02T20:44:39Z

Hi @HayatoYagi , thanks for opening this PR. This would be a breaking change, though, and so I would not want to merge it as is. I would be open to enabling this behavior via the method I suggested here, or really any way that doesn't change the default behavior.

HayatoYagi · 2023-12-04T04:30:53Z

@ricky0123
In my opinion, the current behavior is a bug, and this change is acceptable. The difference in timing when pressing the pause button for just one frame, which is uncontrollable for the users, after finishing a long speech leads to the inconsistency whether entire speech is discarded or not. It creates significant confusion. I wonder if there are cases where the current behavior is desired. Additionally, I believe adding options indiscriminately has a negative impact on the developer experience. However, if you still prefer not to change the current behavior, I will explore ways to add an option based on your suggestion!

HayatoYagi · 2023-12-04T14:12:30Z

This change does not require any adjustments on the calling side, so I would categorize it as a bug fix or a minor specification change rather than a "breaking change".
Actually, we were surprised and confused when we first observed this behavior, as the audio segment just before the pause was completely lost.

ricky0123 · 2023-12-04T16:16:47Z

Hi @HayatoYagi

The difference in timing when pressing the pause button for just one frame, which is uncontrollable for the users, after finishing a long speech leads to the inconsistency whether entire speech is discarded or not.

I don't quite understand what you mean here, can you explain?

In my view, the pause button is something the user presses to stop all functionality associated with the VAD, not to clip and submit the current audio.

In any case, it's a substantial change from the default behavior, so I don't want to do it. Possibly a better solution than an extra parameter to the onSpeechEnd callback is to add a new option at initialization. Eg

myvad = await MicVAD.new({ submitUserSpeechOnPause: true })

or something similar. I agree with you that each option we add increases the complexity for developers, but this is clearly an important issue about which people have different intuitions, so it's an important option to add. And I do not want to change the default behavior right now. If we had enough of a community on Discord, I would maybe leave it to a poll, but we have few enough people on Discord that I wouldn't trust the poll to be representative. So let's just implement the initialization option unless you have a different idea.

HayatoYagi · 2023-12-05T02:26:07Z

I don't quite understand what you mean here, can you explain?

1a. The user finishes speaking.
2a. Press the pause button after redemptionFrames have elapsed.
3a. onSpeechEnd is triggered.

1b. The user finishes speaking.
2b. Press the pause button when redemptionFrames-1 have elapsed.
3b. The entire last speech segment is discarded.

The difference between 2a and 2b is only a 1-frame timing distinction. While users shouldn't be conscious of the redemption frame, they are required to wait for the redemption frame to elapse before pressing the pause button.

HayatoYagi · 2023-12-13T11:12:31Z

@ricky0123
I've added an option to enable this change.

If we had enough of a community on Discord, I would maybe leave it to a poll

How many people do you think is enough?

ricky0123 · 2023-12-15T02:03:49Z

Thanks for your PR @HayatoYagi I'll try to look at it tomorrow.

ricky0123 · 2023-12-15T23:37:57Z

Hi @HayatoYagi thanks for your contribution. I refactored it a little, added a couple of test pages, merged, and published it.

Thank you for clarifying what you meant about the redemption frames. I can see both perspectives. I guess it depends partially on why you think the user is pressing the pause button. For my use case, it was mostly something they would press if they were interrupted and wanted to discard the current speech segment (i.e. immediately stopping all functionality associated with the vad). I do agree that if you are using a long redemption period and they hit pause towards the end of it, it would produce an unexpected result from their perspective.

I haven't thought concretely about how many people I would want to participate in a poll. Hopefully, this PR will satisfy most people.

Thanks!

antoine-lizee · 2024-02-09T12:09:55Z

Hello 👋

After testing, we are losing some capabilities compared to what we had in my fork.

The main thing is that we can't know at onSpeechEnd if it comes from a toggle or a natural redemption timeout.

In my implementation, I set vad.listening to false before the callback is executed, so we can use it as a proxy (if you're not listening, you got "stopped").
In your implementation, everything happens at the same time (because you trigger self.endSegment() directly)

This is important because I want to remove all the redemption frames in the natural case, and not in the forced-stop case.

We have 2 options:

Give visibility on this natural vs forced case, typically with something like earlyStop in onSpeechEnd()
Do the trim of the redemption frames in the VAD directly, which would make sense! We could even do it in both natural and forced stop cases, by removing all the redemption frames that have been measured, whenever we stop. (we probably want to keep n frames)

@ricky0123, @HayatoYagi, what do you think?

HayatoYagi and others added 4 commits November 27, 2023 13:45

Update build-test-site script

7120dd7

Update test-site logs

9e817ae

enh: fire onSpeechEnd or Misfires when paused

dfe81f8

not trigger misfire when not speaking

77fd0fd

HayatoYagi requested a review from ricky0123 November 28, 2023 15:20

HayatoYagi self-assigned this Nov 28, 2023

HayatoYagi changed the base branch from update-test-site to master November 28, 2023 15:24

HayatoYagi changed the base branch from master to update-test-site November 28, 2023 15:25

Base automatically changed from update-test-site to master December 2, 2023 17:50

HayatoYagi added 3 commits December 13, 2023 16:21

Merge branch 'master' into on-pause-message

e75ad88

Add submitUserSpeechOnPause option

4fa640b

remove debug log

3e6810b

refactor for new option

cd81223

ricky0123 approved these changes Dec 15, 2023

View reviewed changes

ricky0123 merged commit d34fcd6 into master Dec 15, 2023

ricky0123 mentioned this pull request Dec 23, 2023

Fire onSpeechEnd or Misfires when paused #45

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fire `onSpeechEnd` or `onVadMisfire` on pause #63

Fire `onSpeechEnd` or `onVadMisfire` on pause #63

HayatoYagi commented Nov 28, 2023

HayatoYagi commented Nov 28, 2023 •

edited

Loading

ricky0123 commented Dec 2, 2023

HayatoYagi commented Dec 4, 2023

HayatoYagi commented Dec 4, 2023

ricky0123 commented Dec 4, 2023

HayatoYagi commented Dec 5, 2023

HayatoYagi commented Dec 13, 2023

ricky0123 commented Dec 15, 2023

ricky0123 commented Dec 15, 2023

antoine-lizee commented Feb 9, 2024

Fire onSpeechEnd or onVadMisfire on pause #63

Fire onSpeechEnd or onVadMisfire on pause #63

Conversation

HayatoYagi commented Nov 28, 2023

HayatoYagi commented Nov 28, 2023 • edited Loading

ricky0123 commented Dec 2, 2023

HayatoYagi commented Dec 4, 2023

HayatoYagi commented Dec 4, 2023

ricky0123 commented Dec 4, 2023

HayatoYagi commented Dec 5, 2023

HayatoYagi commented Dec 13, 2023

ricky0123 commented Dec 15, 2023

ricky0123 commented Dec 15, 2023

antoine-lizee commented Feb 9, 2024

Fire `onSpeechEnd` or `onVadMisfire` on pause #63

Fire `onSpeechEnd` or `onVadMisfire` on pause #63

HayatoYagi commented Nov 28, 2023 •

edited

Loading