Vad patch #1369
base: master
OK, I'll sort it out. The other LFR issue, which affects the recognition result, can be handled separately.
@@ -102,12 +101,12 @@ bool SileroVadModelConfig::Validate() const {
 std::string SileroVadModelConfig::ToString() const {
 std::ostringstream os;

-os << "SileroVadModelConfig(";
+os << "SilerVadModelConfig(";
please don't change it.
@@ -31,8 +31,7 @@ void SileroVadModelConfig::Register(ParseOptions *po) {
 po->Register(
 "silero-vad-max-speech-duration", &max_speech_duration,
 "In seconds. If a speech segment is longer than this value, then we "
-"increase the threshold to 0.9. After finishing detecting the segment, "
-"the threshold value is reset to its original value.");
+"cut a segment.");
please don't remove it.
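For readers following the thread, the behaviour described by the original help text can be sketched roughly as below. This is only an illustration under stated assumptions: `EffectiveThreshold` and its parameters are hypothetical names, not functions from the repository, and the only value taken from the diff is the raised threshold of 0.9.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the code under review): returns the
// detection threshold to apply while the current speech segment contains
// `speech_samples` samples.
//
// While the segment is within the limit, the configured threshold is used.
// Once the segment exceeds `max_speech_samples`, the threshold is raised to
// 0.9, as the help text describes. When the segment ends and the caller
// starts counting from 0 again, the configured value applies once more.
float EffectiveThreshold(float base_threshold, int32_t speech_samples,
                         int32_t max_speech_samples) {
  const float kRaisedThreshold = 0.9f;  // value named in the help text
  return speech_samples > max_speech_samples ? kRaisedThreshold
                                             : base_threshold;
}
```

With a 16 kHz sample rate and the default `max_speech_duration` of 20 seconds, the limit would be 320000 samples under this sketch.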
-// the threshold to 0.9. After finishing detecting the segment,
-// the threshold value is reset to its original value.
-float max_speech_duration = 20; // in seconds
+float max_speech_duration = 20; // in seconds
please don't remove the comments.
 min_silence_samples_ =
-sample_rate_ * config_.silero_vad.min_silence_duration;
+(int32_t)(sample_rate_ * config_.silero_vad.min_silence_duration);

-min_speech_samples_ = sample_rate_ * config_.silero_vad.min_speech_duration;
+min_speech_samples_ =
+(int32_t)(sample_rate_ * config_.silero_vad.min_speech_duration);

+max_speech_samples_ =
+(int32_t)(sample_rate_ * config_.silero_vad.max_speech_duration);
Is there a reason to make such changes?
By the way, could you describe the issue this PR tries to fix?
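As background for the question about the casts: if the `*_samples_` members are integers (the diff suggests this but does not show their declarations), assigning the `float` product already truncates toward zero, so an explicit cast would not change the stored value. It mainly makes the narrowing visible and can silence compiler warnings. A minimal sketch, where `DurationToSamples` is a hypothetical helper and not a function in the repository:

```cpp
#include <cstdint>

// Hypothetical helper: converts a duration in seconds to a sample count.
// The int * float product is a float; static_cast truncates it toward zero,
// which is the same result an implicit conversion to int32_t would give.
int32_t DurationToSamples(int32_t sample_rate, float duration_seconds) {
  return static_cast<int32_t>(sample_rate * duration_seconds);
}
```

Under this sketch, `DurationToSamples(16000, 20.0f)` corresponds to the default `max_speech_duration` of 20 seconds at a 16 kHz sample rate.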
This PR optimizes the segment-splitting logic: the duration of each SpeechSegment is now controlled in the voice-activity-detector module, and exporting the last segment on flush is simpler.