Vad patch #1369

laochen · 2024-09-22T13:05:29Z

Optimized the segment-splitting logic: the SpeechSegment duration is now controlled in the voice-activity-detector module, and the output of the final flush is more concise.

csukuangfj · 2024-09-22T13:30:05Z

Could you revert unrelated changes?
Please remove debug statements.
Please use English comments.

laochen · 2024-09-26T03:22:08Z

OK, I'll sort it out. The other LFR change, which affects the recognition result, can be handled separately.

csukuangfj · 2024-10-29T06:53:20Z

sherpa-onnx/csrc/silero-vad-model-config.cc

@@ -102,12 +101,12 @@ bool SileroVadModelConfig::Validate() const {
 std::string SileroVadModelConfig::ToString() const {
  std::ostringstream os;

-  os << "SileroVadModelConfig(";
+  os << "SilerVadModelConfig(";


please don't change it.

csukuangfj · 2024-10-29T06:53:41Z

sherpa-onnx/csrc/silero-vad-model-config.cc

@@ -31,8 +31,7 @@ void SileroVadModelConfig::Register(ParseOptions *po) {
  po->Register(
      "silero-vad-max-speech-duration", &max_speech_duration,
      "In seconds. If a speech segment is longer than this value, then we "
-      "increase the threshold to 0.9. After finishing detecting the segment, "
-      "the threshold value is reset to its original value.");
+      "cut a segment.");  


please don't remove it.
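For context, the behavior the original help string describes (raise the threshold to 0.9 once a segment exceeds the maximum duration, then restore it when the segment ends) could be sketched roughly as follows. This is only an illustration: `ThresholdPolicy` and its methods are hypothetical names, not the sherpa-onnx implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; class and method names are illustrative only
// and are not taken from sherpa-onnx.
class ThresholdPolicy {
 public:
  ThresholdPolicy(float threshold, int32_t max_speech_samples)
      : original_(threshold),
        current_(threshold),
        max_speech_samples_(max_speech_samples) {
    assert(max_speech_samples > 0);
  }

  // Call with the number of samples consumed while inside a speech segment.
  void OnSpeechSamples(int32_t n) {
    speech_samples_ += n;
    // Once the segment is too long, raise the threshold to 0.9 so that
    // the detector is more likely to end the segment.
    if (speech_samples_ > max_speech_samples_) current_ = 0.9f;
  }

  // Call when the segment ends: restore the original threshold.
  void OnSegmentEnd() {
    speech_samples_ = 0;
    current_ = original_;
  }

  float Threshold() const { return current_; }

 private:
  float original_;
  float current_;
  int32_t max_speech_samples_;
  int32_t speech_samples_ = 0;
};
```

With a threshold of 0.5 and a limit of 1000 samples, feeding two windows of 512 samples raises `Threshold()` to 0.9, and `OnSegmentEnd()` brings it back to 0.5.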

csukuangfj · 2024-10-29T06:53:52Z

sherpa-onnx/csrc/silero-vad-model-config.h

-  // the threshold to 0.9. After finishing detecting the segment,
-  // the threshold value is reset to its original value.
-  float max_speech_duration = 20;  // in seconds
+  float max_speech_duration = 20;  // in seconds  


please don't remove the comments.

csukuangfj · 2024-10-29T06:54:18Z

sherpa-onnx/csrc/silero-vad-model.cc

    min_silence_samples_ =
-        sample_rate_ * config_.silero_vad.min_silence_duration;
+        (int32_t)(sample_rate_ * config_.silero_vad.min_silence_duration);

-    min_speech_samples_ = sample_rate_ * config_.silero_vad.min_speech_duration;
+    min_speech_samples_ =
+        (int32_t)(sample_rate_ * config_.silero_vad.min_speech_duration);
+
+    max_speech_samples_ =
+        (int32_t)(sample_rate_ * config_.silero_vad.max_speech_duration);


Is there a reason to make such changes?
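For reference, an implicit float-to-integer conversion already truncates toward zero, so the added casts do not change the computed values; they only make the narrowing explicit, which may silence compiler warnings. A minimal sketch of the same computation with `static_cast`, assuming a hypothetical helper `DurationToSamples` that is not part of sherpa-onnx:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of sherpa-onnx.
// Converts a duration in seconds to a sample count.
inline int32_t DurationToSamples(int32_t sample_rate, float seconds) {
  assert(sample_rate > 0 && seconds >= 0);
  // int32_t * float is evaluated as float; static_cast truncates toward
  // zero, which is the same result an implicit conversion would give.
  return static_cast<int32_t>(sample_rate * seconds);
}
```

For example, `DurationToSamples(16000, 0.5f)` yields 8000, and the default `max_speech_duration` of 20 seconds at 16 kHz yields 320000 samples.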

csukuangfj · 2024-10-30T13:07:31Z

By the way, could you describe the issue this PR tries to fix?

laochen added 3 commits September 22, 2024 19:28

Optimize segment fragmentation and flush

e1106cf

another LFR

c6f332e

remove debug code

5a7f0a7

laochen added 2 commits September 26, 2024 11:42

Delete Chinese comments

6ddbdee

LFR module restored

b68424d

csukuangfj requested changes Oct 29, 2024

Update voice-activity-detector.cc

a7f1a7b
