JVM library for voice activity detection, written in Kotlin and based on the C library libfvad and the Silero ML model
- Stretch to change the speed of audio without changing the pitch
- Speech recognition to transcribe audio to text
- Speech generation to generate voice audio from text
- Text generation to generate text from a prompt
- Noise reduction to remove noise from audio
Note: For best results, apply noise reduction to the input data.
fvad: Detects any audio activity, regardless of the sound type. The detection behavior depends on the selected mode. Suitable for general voice activity detection.
Silero: Detects voice activity specifically containing human speech. Best for speech-focused tasks like transcription and voice-controlled systems.
- Detects voice activity in PCM audio data
- Supports any sampling rate and number of channels due to resampling and downmixing
- Supports different detection modes to balance between sensitivity and accuracy (fvad)
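The downmixing mentioned above can be illustrated with a minimal sketch. It assumes interleaved 16-bit little-endian PCM, which this document does not state. `downmixToMono` is a hypothetical helper written for illustration, not part of the library's API, and the library's own downmixing may work differently.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not a library function): averages the channels of
// interleaved 16-bit little-endian PCM into a single mono channel.
fun downmixToMono(pcm: ByteArray, channels: Int): ByteArray {
    require(channels > 0) { "channels must be positive" }
    val frameSize = channels * 2 // assumption: 2 bytes per sample
    require(pcm.size % frameSize == 0) { "input must contain whole frames" }

    val input = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN)
    val output = ByteBuffer.allocate(pcm.size / channels).order(ByteOrder.LITTLE_ENDIAN)

    while (input.hasRemaining()) {
        // Sum one sample per channel, then take the integer average.
        var sum = 0
        repeat(channels) { sum += input.getShort().toInt() }
        output.putShort((sum / channels).toShort())
    }
    return output.array()
}
```

Averaging keeps the result within the 16-bit range without clipping, at the cost of lowering the level of content that appears in only one channel.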
- Download latest release
- Add library dependency
  dependencies { implementation(files("/path/to/jar")) }
- Unzip binaries
- Add ONNX dependency
  dependencies { implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0") }
See the example module for implementation details
- Call detect to process the input data; use isContinuous = true with streaming audio
- Load binaries if you are going to use fvad
  VoiceActivityDetection.Fvad.load(libfvad = "/path/to/libfvad", voiceActivityDetection = "/path/to/voice-activity-detection")
- Create an instance
  VoiceActivityDetection.Fvad.create()
  VoiceActivityDetection.Silero.create()
- Call inputSizeForMillis to get the input data size for N milliseconds
- Call minimumInputSize to get the audio producer buffer size for real-time detection
- Call detect, passing the input data, sample rate, and number of channels as arguments
- Call reset to reset the internal state, for example when the audio source changes
- Call close to release resources
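To give a feel for the sizes involved when buffering audio for N milliseconds, here is a minimal sketch of the underlying arithmetic. It assumes 16-bit PCM (2 bytes per sample). `pcmSizeForMillis` is a hypothetical helper written for illustration; it is not the library's inputSizeForMillis, which may compute its result differently.

```kotlin
// Hypothetical helper (not a library function): number of bytes needed to
// hold `millis` milliseconds of interleaved PCM audio.
fun pcmSizeForMillis(
    millis: Int,
    sampleRate: Int,
    channels: Int,
    bytesPerSample: Int = 2, // assumption: 16-bit samples
): Int {
    require(millis >= 0 && sampleRate > 0 && channels > 0 && bytesPerSample > 0)
    // Frames (one sample per channel) in the requested duration, rounded down.
    val frames = sampleRate.toLong() * millis / 1000
    return (frames * channels * bytesPerSample).toInt()
}
```

Under these assumptions, 30 ms of 16 kHz mono audio is 480 frames, or 960 bytes, and 10 ms of 48 kHz stereo audio is 1920 bytes.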
- JVM version 9 or higher
This project is licensed under the Apache License 2.0