speech-recorder is a cross-platform, native Node.js addon for getting a stream of audio from a device's microphone. Using speech-recorder, you can also get only the audio that corresponds to someone speaking.
This module is used for speech recognition in Serenade. Serenade enables you to write code through natural speech, rather than typing.
speech-recorder has been tested on Windows 10, macOS 10.14+, and Ubuntu 18.04+ (and may work on other platforms as well).
To install speech-recorder, run:
yarn add speech-recorder
If you're using this library with Electron, you will likely need to rebuild the native addon against Electron's version of Node.js, for example with electron-rebuild.
You can get a list of supported devices with:
import { getDevices } from "speech-recorder";
console.log(getDevices());
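To use a specific microphone you need its id. The following is a minimal sketch of a lookup helper, assuming each entry returned by getDevices has an id field and a name field. The name field and the helper findDeviceId are assumptions for illustration and are not part of speech-recorder. The helper falls back to -1, which selects the default device.

```javascript
// Hypothetical helper: not part of speech-recorder.
// Assumes each device looks like { id: number, name: string }.
function findDeviceId(devices, nameFragment) {
  const needle = nameFragment.toLowerCase();
  const match = devices.find(
    (device) =>
      typeof device.name === "string" &&
      device.name.toLowerCase().includes(needle)
  );
  // -1 means "use the default device".
  return match ? match.id : -1;
}

// Made-up device data for demonstration; in practice pass getDevices().
const sampleDevices = [
  { id: 0, name: "Built-in Microphone" },
  { id: 3, name: "USB Headset" },
];

console.log(findDeviceId(sampleDevices, "usb")); // 3
console.log(findDeviceId(sampleDevices, "webcam")); // -1
```

The returned value can then be passed as the deviceId option of start.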
You can write all audio to a file with:
import fs from "fs";
import { SpeechRecorder } from "speech-recorder";
const recorder = new SpeechRecorder();
const writeStream = fs.createWriteStream("audio.raw");
// Write every frame of microphone audio to the file.
recorder.start({
  onAudio: (audio) => {
    writeStream.write(audio);
  }
});
Or write only the speech with:
import fs from "fs";
import { SpeechRecorder } from "speech-recorder";
const recorder = new SpeechRecorder({ sampleRate: 16000, framesPerBuffer: 320 });
const writeStream = fs.createWriteStream("audio.raw");
// Write a frame only when speech was detected in it.
recorder.start({
  onAudio: (audio, speech) => {
    if (speech) {
      writeStream.write(audio);
    }
  }
});
As you can see, onAudio is called on every frame, and its second argument, speech, indicates whether speech was detected in that frame.
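The speech example above uses a sampleRate of 16000 and a framesPerBuffer of 320, so each buffer covers 320 / 16000 seconds of audio, or 20 ms. A short sketch of that arithmetic follows. The byte count assumes 16-bit mono PCM (2 bytes per frame), which this README does not state, and both function names are hypothetical.

```javascript
// Duration of one buffer in milliseconds.
function bufferDurationMs(sampleRate, framesPerBuffer) {
  return (framesPerBuffer * 1000) / sampleRate;
}

// Size of one buffer in bytes.
// Assumption: 16-bit mono PCM, i.e. 2 bytes per frame.
function bufferSizeBytes(framesPerBuffer, bytesPerFrame = 2) {
  return framesPerBuffer * bytesPerFrame;
}

console.log(bufferDurationMs(16000, 320)); // 20
console.log(bufferSizeBytes(320)); // 640
```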
The SpeechRecorder constructor supports the following options:
- error: callback called on audio stream error. Defaults to null.
- framesPerBuffer: the number of audio frames to read at a time. Defaults to 320.
- highWaterMark: the highWaterMark to be applied to the underlying stream, or how much audio can be buffered in memory. Defaults to 64000 (64 KB).
- leadingPadding: the number of frames to buffer at the start of a speech chunk. This can prevent audio at the start of the file from getting cut off. Defaults to 30.
- level: the VAD aggressiveness level on a scale of 0-3, with 0 being the least aggressive and 3 being the most aggressive. Defaults to 3.
- sampleRate: the sample rate for the audio; must be 8000, 16000, 32000, or 48000. Defaults to 16000.
- speakingThreshold: the number of consecutive speaking frames before considering speech to have started.
- silenceThreshold: the number of consecutive non-speaking frames before considering speech to be finished.
- triggers: a list of Trigger objects that can optionally specify when the onTrigger callback is executed.
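To illustrate how speakingThreshold and silenceThreshold interact, here is a small sketch of a gate driven by consecutive-frame counts. It is an illustration of the idea only, not speech-recorder's actual implementation; createSpeechGate is a hypothetical name, and the threshold values 2 and 3 are made up for the demonstration.

```javascript
// Illustrative only: not speech-recorder's actual implementation.
function createSpeechGate({ speakingThreshold, silenceThreshold }) {
  let speaking = false;
  let consecutiveSpeech = 0;
  let consecutiveSilence = 0;

  // Feed one voice-activity decision per frame.
  // Returns "start", "end", or null when nothing changes.
  return function update(frameIsSpeech) {
    if (frameIsSpeech) {
      consecutiveSpeech += 1;
      consecutiveSilence = 0;
    } else {
      consecutiveSilence += 1;
      consecutiveSpeech = 0;
    }
    if (!speaking && consecutiveSpeech >= speakingThreshold) {
      speaking = true;
      return "start";
    }
    if (speaking && consecutiveSilence >= silenceThreshold) {
      speaking = false;
      return "end";
    }
    return null;
  };
}

const update = createSpeechGate({ speakingThreshold: 2, silenceThreshold: 3 });
const frames = [true, false, true, true, false, false, true, false, false, false];
const events = frames.map((frame) => update(frame));
console.log(events);
// [null, null, null, "start", null, null, null, null, null, "end"]
```

In this run the isolated speaking frame at the beginning does not start a chunk, and the two-frame pause in the middle does not end one, because neither run reaches its threshold.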
The start method supports the following options:
- deviceId: id value from getDevices corresponding to the device you want to use; a value of -1 uses the default device.
- onAudio: a callback to be executed when audio data is received from the mic.
- onChunkStart: a callback to be executed when a speech chunk starts. It will be passed the leading buffer, whose size is determined by leadingPadding.
- onChunkEnd: a callback to be executed when a speech chunk ends.
- onTrigger: a callback to be executed when a trigger threshold is met.
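The chunk callbacks can be combined to deliver each speech chunk as a single buffer. The sketch below makes several assumptions that this README does not confirm: that the leading buffer passed to onChunkStart is a single Buffer, that onChunkEnd takes no arguments, and that onAudio frames belonging to a chunk arrive between onChunkStart and onChunkEnd. collectSpeechChunks is a hypothetical helper, and fakeRecorder is a stand-in so the example runs without a microphone.

```javascript
// Hypothetical helper: not part of speech-recorder.
// Gathers the leading buffer plus every frame received during a chunk
// and hands the concatenated result to onChunk when the chunk ends.
function collectSpeechChunks(recorder, onChunk, deviceId = -1) {
  let parts = null; // null while no chunk is in progress

  recorder.start({
    deviceId,
    onChunkStart: (leadingBuffer) => {
      // Assumption: leadingBuffer is a single Buffer (or absent).
      parts = leadingBuffer ? [leadingBuffer] : [];
    },
    onAudio: (audio) => {
      if (parts !== null) parts.push(audio);
    },
    onChunkEnd: () => {
      if (parts !== null) {
        onChunk(Buffer.concat(parts));
        parts = null;
      }
    },
  });
}

// Stand-in recorder for demonstration; replace with new SpeechRecorder().
const fakeRecorder = {
  start(options) {
    options.onAudio(Buffer.from([0]), false);
    options.onChunkStart(Buffer.from([1, 2]));
    options.onAudio(Buffer.from([3]), true);
    options.onAudio(Buffer.from([4]), true);
    options.onChunkEnd();
    options.onAudio(Buffer.from([5]), false);
  },
};

const chunks = [];
collectSpeechChunks(fakeRecorder, (chunk) => chunks.push(chunk));
console.log(chunks.length, [...chunks[0]]); // one chunk holding bytes 1, 2, 3, 4
```

Frames received outside a chunk (bytes 0 and 5 here) are ignored. If the real callback order or the leading buffer's shape differs, the helper would need adjusting.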
See the examples/ directory for example usages.
- speech-recorder uses PortAudio for native microphone access.
- speech-recorder uses webrtcvad for detecting voice.
- speech-recorder is based on node-portaudio, which in turn is based on naudiodon.