JVM library for voice activity detection written in Kotlin based on C library fvad and Silero

numq/voice-activity-detection

Voice Activity Detection

JVM library for voice activity detection written in Kotlin based on the C library libfvad and ML model Silero

See also

When to use

Note

For best results, it is recommended to apply noise reduction to the input data.

libfvad

Detects any audio activity, regardless of the sound type. The detection behavior depends on the selected mode. Suitable for general voice activity detection.

Silero

Detects voice activity specifically containing human speech. Best for speech-focused tasks like transcription and voice-controlled systems.

Features

  • Detects voice activity in PCM audio data
  • Supports any sampling rate and number of channels by resampling and downmixing the input
  • Supports different detection modes to balance between sensitivity and accuracy (fvad)
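
To illustrate what downmixing means for PCM input, here is a minimal sketch that averages interleaved channels into mono. It assumes 16-bit little-endian samples, and `downmixToMono` is a hypothetical helper written for this example; it is not the library's implementation, which may downmix differently.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not part of the library): averages interleaved
// 16-bit little-endian PCM frames into a single mono channel.
fun downmixToMono(pcm: ByteArray, channels: Int): ByteArray {
    require(channels > 0) { "channels must be positive" }
    val frameBytes = channels * 2 // assumption: 2 bytes per sample
    require(pcm.size % frameBytes == 0) { "input must contain whole frames" }

    val input = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN)
    val output = ByteBuffer.allocate(pcm.size / channels).order(ByteOrder.LITTLE_ENDIAN)

    while (input.hasRemaining()) {
        var sum = 0
        // Read one sample per channel for the current frame.
        repeat(channels) { sum += input.getShort().toInt() }
        // The average of 16-bit samples always fits in 16 bits.
        output.putShort((sum / channels).toShort())
    }
    return output.array()
}
```

For example, a single stereo frame holding the samples 100 and 300 becomes one mono sample of 200.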

Installation

  • Download the latest release

  • Add library dependency

    dependencies {
         implementation(files("/path/to/jar"))
    }

libfvad

  • Unzip the binaries

Silero

  • Add the ONNX Runtime dependency
    dependencies {
         implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0")
    }

Usage

See the example module for implementation details

TL;DR

  • Call detect to process the input data; set isContinuous = true for streaming audio
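
Streaming sources usually deliver audio in reads of arbitrary size, so a caller typically regroups the bytes into fixed-size chunks before each detection call. The sketch below shows one way to do that. `forEachChunk` is a hypothetical helper, and `onChunk` stands in for the place where detect would be called with isContinuous = true; the detector call itself is left out because its exact signature is not shown in this README.

```kotlin
import java.io.InputStream

// Hypothetical helper (not part of the library): reads fixed-size chunks
// from a stream and passes each complete chunk to `onChunk`.
// Returns the number of chunks delivered.
fun forEachChunk(source: InputStream, chunkSize: Int, onChunk: (ByteArray) -> Unit): Int {
    require(chunkSize > 0) { "chunkSize must be positive" }
    val buffer = ByteArray(chunkSize)
    var chunks = 0
    while (true) {
        // readNBytes (JDK 9+) blocks until the buffer is full or the stream ends.
        val read = source.readNBytes(buffer, 0, chunkSize)
        if (read < chunkSize) break // discard a trailing partial chunk
        onChunk(buffer.copyOf())    // copy so the callee may keep the array
        chunks++
    }
    return chunks
}
```

Dropping the trailing partial chunk is a simplification made here; padding it with silence is another reasonable choice.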

Step-by-step

  • Load the native binaries if you plan to use fvad

    VoiceActivityDetection.Fvad.load(libfvad = "/path/to/libfvad", voiceActivityDetection = "/path/to/voice-activity-detection")
  • Create an instance

    fvad

    VoiceActivityDetection.Fvad.create()

    Silero

    VoiceActivityDetection.Silero.create()
  • Call inputSizeForMillis to get the input data size for N milliseconds

  • Call minimumInputSize to get the audio producer buffer size for real-time detection

  • Call detect, passing the input data, sample rate, and number of channels as arguments

  • Call reset to clear the internal state, for example when the audio source changes

  • Call close to release resources
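
As a rough guide to the buffer sizes involved in the steps above, the arithmetic for N milliseconds of raw PCM can be sketched as follows. This assumes 16-bit samples, and `pcmBytesForMillis` is a hypothetical helper for illustration only; prefer the library's own inputSizeForMillis and minimumInputSize, whose results may differ.

```kotlin
// Hypothetical helper (not part of the library): number of bytes that
// `millis` milliseconds of 16-bit PCM audio occupy.
fun pcmBytesForMillis(sampleRate: Int, channels: Int, millis: Int): Int {
    require(sampleRate > 0 && channels > 0 && millis >= 0) { "invalid audio format" }
    val bytesPerSample = 2 // assumption: 16-bit samples
    // Use Long for the intermediate product to avoid Int overflow.
    return (sampleRate.toLong() * channels * bytesPerSample * millis / 1000).toInt()
}
```

Under these assumptions, 20 ms of 48 kHz stereo audio takes 3840 bytes, and 30 ms of 16 kHz mono audio takes 960 bytes.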

Requirements

  • JVM version 9 or higher

License

This project is licensed under the Apache License 2.0

Acknowledgments