Node.js API reference

Note: the API is not fully stable yet and may change with any new version. Many methods, types and internal data structures are not yet exposed.

Importing as a Node.js module

To import the echogarden package as a Node.js module:

Install as a dependency in your project:

npm install echogarden

Import with:

import * as Echogarden from 'echogarden'

All methods, properties and arguments have TypeScript type information. You can use it to get more detailed and up-to-date type information that may not be covered here.

Related pages

Options reference
List of all supported engines
Quick guide to the command line interface
WebSocket server reference

Text-to-speech

`synthesize(input, options, onSegment, onSentence)`

Synthesizes the given input.

input: text to synthesize, either a string or a string[]. When given an array of strings, its elements are treated as predefined segments (this is useful if you would like more control over how segments are split, or if your input has a special format requiring a custom splitting method).
options: synthesis options object
onSegment: a callback that is called whenever a segment has been synthesized (optional)
onSentence: a callback that is called whenever a sentence has been synthesized (optional)
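
As a rough illustration of predefined segments, the sketch below splits text on blank lines so that each paragraph becomes one segment. The helper name `splitIntoParagraphSegments` is hypothetical and not part of Echogarden; it only shows one possible way to build the string[] form of input.

```typescript
// Hypothetical helper (not part of Echogarden): treats each paragraph,
// separated by one or more blank lines, as a single predefined segment.
function splitIntoParagraphSegments(text: string): string[] {
	return text
		// Break on blank lines (also covers Windows line endings)
		.split(/\n\s*\n/)
		// Drop surrounding whitespace
		.map(paragraph => paragraph.trim())
		// Skip empty pieces
		.filter(paragraph => paragraph.length > 0)
}

const segments = splitIntoParagraphSegments('First paragraph.\n\nSecond paragraph.\n')
console.log(segments)
```

The resulting array could then be passed as the first argument to `synthesize` in place of a plain string.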

Returns (via promise):

{
	audio: RawAudio | Buffer
	timeline: Timeline
	language: string
}

audio may be either:

A RawAudio object, which is a structure containing the sample rate and raw 32-bit float channels:

{
	sampleRate: number
	channels: Float32Array[]
}

A Uint8Array containing the audio in encoded form, in the case a particular codec was specified in the outputAudioFormat.codec option.
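
Since audio can take two forms, calling code usually needs to tell them apart before working with samples. The following is a minimal sketch: the RawAudio interface is copied from the structure above, while `isRawAudio` and `getDurationSeconds` are hypothetical helper names, not part of the package.

```typescript
// Shape taken from the RawAudio structure described above.
interface RawAudio {
	sampleRate: number
	channels: Float32Array[]
}

// Narrows the union. Encoded audio is a Uint8Array
// (a Node.js Buffer is a subclass of Uint8Array, so it is covered too).
function isRawAudio(audio: RawAudio | Uint8Array): audio is RawAudio {
	return !(audio instanceof Uint8Array)
}

// Duration in seconds, assuming all channels have the same length.
function getDurationSeconds(audio: RawAudio): number {
	const firstChannel = audio.channels[0]

	if (firstChannel === undefined || audio.sampleRate <= 0) {
		return 0
	}

	return firstChannel.length / audio.sampleRate
}

const exampleAudio: RawAudio = { sampleRate: 16000, channels: [new Float32Array(8000)] }
console.log(isRawAudio(exampleAudio), getDurationSeconds(exampleAudio))
```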

Segment and sentence event callbacks

You can optionally pass two async callbacks to synthesize, onSegment and onSentence.

For example:

async function onSegment(data: SynthesisSegmentEventData) {
	console.log(data.transcript)
}

const { audio } = await Echogarden.synthesize("Hello World!", { engine: 'espeak' }, onSegment)

SynthesisSegmentEventData is an object with the structure:

{
	index: number              // Index of part
	total: number              // Total number of parts

	audio: RawAudio | Buffer   // Audio for part

	timeline: Timeline         // Timeline for part
	transcript: string         // Transcript for part
	language: string           // Language for part

	peakDecibelsSoFar: number  // Peak decibels measured for all synthesized audio, so far
}
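
A callback can also accumulate state across segments. The sketch below builds an onSegment callback that collects transcripts and tracks the latest reported peak level. `SegmentProgress` is a local type mirroring only a subset of the fields listed above, and `createSegmentCollector` is a hypothetical helper, not part of Echogarden.

```typescript
// Local type mirroring a subset of the SynthesisSegmentEventData fields listed above.
interface SegmentProgress {
	index: number
	total: number
	transcript: string
	peakDecibelsSoFar: number
}

// Hypothetical helper (not part of Echogarden): builds an onSegment callback
// that records each segment's transcript and the latest reported peak level.
function createSegmentCollector() {
	const transcripts: string[] = []
	let peakDecibels = Number.NEGATIVE_INFINITY

	async function onSegment(data: SegmentProgress): Promise<void> {
		transcripts.push(data.transcript)

		// peakDecibelsSoFar already covers all audio synthesized up to this point
		peakDecibels = data.peakDecibelsSoFar
	}

	return {
		onSegment,
		getTranscripts: () => [...transcripts],
		getPeakDecibels: () => peakDecibels,
	}
}
```

The returned `onSegment` function could be passed as the third argument to `synthesize`, and the getters read after the returned promise resolves.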

`requestVoiceList(options)`

Requests a list of voices for a particular engine.

options: voice list request options object

Returns (via promise):

{
	voiceList: SynthesisVoice[]
	bestMatchingVoice: SynthesisVoice
}

Speech-to-text

`recognize(input, options)`

Applies speech recognition to the input.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: recognition options object

Returns (via promise):

{
	transcript: string

	timeline: Timeline
	wordTimeline: Timeline

	language: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}
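
To give a sense of how a returned wordTimeline might be consumed, here is a small formatting sketch. The entry fields used (`type`, `text`, `startTime`, `endTime`, with times in seconds) are an assumption, since this page does not spell out the Timeline structure; check the package's TypeScript definitions. The sample data is made up and is not real recognizer output.

```typescript
// Assumed minimal shape of a timeline entry (times in seconds).
// Verify against the package's TypeScript definitions.
interface TimelineEntry {
	type: string
	text: string
	startTime: number
	endTime: number
}

// Formats each entry as "[start - end] text", one entry per line.
function formatTimeline(timeline: TimelineEntry[]): string {
	return timeline
		.map(entry => `[${entry.startTime.toFixed(2)} - ${entry.endTime.toFixed(2)}] ${entry.text}`)
		.join('\n')
}

// Made-up sample data standing in for a recognition result's wordTimeline
const sampleWordTimeline: TimelineEntry[] = [
	{ type: 'word', text: 'Hello', startTime: 0.25, endTime: 0.5 },
	{ type: 'word', text: 'world', startTime: 0.5, endTime: 1 },
]

console.log(formatTimeline(sampleWordTimeline))
```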

Speech-to-transcript alignment

`align(input, transcript, options)`

Aligns input audio with the given transcript.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
transcript: the transcript to align to
options: alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	transcript: string
	language: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}
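
One possible use of an aligned wordTimeline is looking up which word is being spoken at a given playback time. The sketch below does this with a binary search. It assumes entries carry `text`, `startTime` and `endTime` (in seconds), are sorted by start time and do not overlap; none of this is stated on this page, and `findEntryAtTime` is a hypothetical helper.

```typescript
// Assumed minimal shape of a timed entry (times in seconds).
interface TimedEntry {
	text: string
	startTime: number
	endTime: number
}

// Returns the entry whose time span contains timeSeconds, or undefined if none does.
// Assumes entries are sorted by start time and do not overlap.
function findEntryAtTime<T extends TimedEntry>(timeline: T[], timeSeconds: number): T | undefined {
	let low = 0
	let high = timeline.length - 1

	while (low <= high) {
		const middle = (low + high) >> 1
		const entry = timeline[middle]

		if (entry === undefined) {
			return undefined
		}

		if (timeSeconds < entry.startTime) {
			// Target is before this entry
			high = middle - 1
		} else if (timeSeconds >= entry.endTime) {
			// Target is at or after the end of this entry
			low = middle + 1
		} else {
			return entry
		}
	}

	return undefined
}
```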

Speech-to-text translation

`translateSpeech(input, options)`

Translates speech audio directly to a transcript in a different language (only English is currently supported as the target language).

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: speech translation options object

Returns (via promise):

{
	transcript: string
	timeline: Timeline
	wordTimeline?: Timeline

	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

Text-to-text translation

`translateText(input, options)`

Translates text to text.

input: string
options: text translation options object

Returns (via promise):

{
	text: string
	translatedText: string

	translationPairs: TranslationPair[]

	sourceLanguage: string
	targetLanguage: string
}

translationPairs is an array of objects corresponding to individual segments of the text and their translations.

Speech-to-translated-transcript alignment

`alignTranslation(input, translatedTranscript, options)`

Aligns input audio with the given translated transcript.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
translatedTranscript: the translated transcript to align to
options: translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	translatedTranscript: string
	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

`alignTranscriptAndTranslation(input, transcript, translatedTranscript, options)`

Aligns input audio to both the native language transcript and a translated one.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
transcript: the transcript to align to, in the native speech language
translatedTranscript: the translated transcript to align to
options: transcript and translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	translatedTimeline: Timeline
	translatedWordTimeline: Timeline

	transcript: string
	translatedTranscript: string

	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

`alignTimelineTranslation(inputTimeline, translatedTranscript, options)`

Aligns the given timeline with its translated transcript.

inputTimeline: input timeline in the native language
translatedTranscript: the translated transcript to align to
options: timeline translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	sourceLanguage?: string
	targetLanguage: string

	rawAudio?: RawAudio
}

Language detection

`detectSpeechLanguage(input, options)`

Detects language of spoken audio.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: speech language detection options object

Returns (via promise):

{
	detectedLanguage: string
	detectedLanguageName: string
	detectedLanguageProbabilities: LanguageDetectionResults
}

`detectTextLanguage(input, options)`

Detects language of text.

input: input text as string
options: text language detection options object

Returns (via promise):

{
	detectedLanguage: string
	detectedLanguageName: string
	detectedLanguageProbabilities: LanguageDetectionResults
}
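
Both detection methods return detectedLanguageProbabilities alongside the single best guess. The sketch below picks the most likely candidates from such a list. It assumes that LanguageDetectionResults is an array of entries with `language`, `languageName` and `probability` fields, which this page does not confirm; `topCandidates` and its default thresholds are hypothetical.

```typescript
// Assumed entry shape for LanguageDetectionResults (not confirmed on this page).
interface LanguageCandidate {
	language: string
	languageName: string
	probability: number
}

// Returns up to `limit` candidates with probability >= minProbability, most likely first.
function topCandidates(results: LanguageCandidate[], minProbability = 0.1, limit = 3): LanguageCandidate[] {
	return results
		.filter(candidate => candidate.probability >= minProbability)
		// filter() returned a new array, so sorting does not modify the input
		.sort((a, b) => b.probability - a.probability)
		.slice(0, limit)
}

// Made-up sample data
const sampleProbabilities: LanguageCandidate[] = [
	{ language: 'de', languageName: 'German', probability: 0.05 },
	{ language: 'fr', languageName: 'French', probability: 0.2 },
	{ language: 'en', languageName: 'English', probability: 0.7 },
]

console.log(topCandidates(sampleProbabilities).map(candidate => candidate.language))
```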

Voice activity detection

`detectVoiceActivity(input, options)`

Detects voice activity in audio (non-real-time).

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: voice activity detection options object

Returns (via promise):

{
	timeline: Timeline
}

Speech denoising

`denoise(input, options)`

Tries to reduce background noise in spoken audio.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: denoising options object

Returns (via promise):

{
	denoisedAudio: RawAudio
}

Source separation

`isolate(input, options)`

Attempts to isolate an individual audio stem, like human voice, or one or more musical instruments (depending on model training), from the given waveform.

input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
options: source separation options object

Returns (via promise):

{
	inputRawAudio: RawAudio
	isolatedRawAudio: RawAudio
	backgroundRawAudio: RawAudio
}
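
Because all three results are RawAudio objects, their levels can be compared directly, for example to check how much energy ended up in the isolated stem versus the background. The sketch below computes an RMS level in decibels relative to full scale. `getRmsDecibels` is a hypothetical helper, not part of the package; only the RawAudio shape comes from this page.

```typescript
// Shape taken from the RawAudio structure described earlier on this page.
interface RawAudio {
	sampleRate: number
	channels: Float32Array[]
}

// Root-mean-square level over all channels, in decibels relative to full scale (1.0).
// Returns -Infinity for silent or empty audio.
function getRmsDecibels(audio: RawAudio): number {
	let sumOfSquares = 0
	let sampleCount = 0

	for (const channel of audio.channels) {
		for (let i = 0; i < channel.length; i++) {
			const sample = channel[i] ?? 0
			sumOfSquares += sample * sample
		}

		sampleCount += channel.length
	}

	if (sampleCount === 0 || sumOfSquares === 0) {
		return Number.NEGATIVE_INFINITY
	}

	// 10 * log10(mean square) is equivalent to 20 * log10(RMS)
	return 10 * Math.log10(sumOfSquares / sampleCount)
}
```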

Subtitles

`timelineToSubtitles(timeline, options)`

Converts a timeline to subtitles.

timeline: timeline object
options: subtitles configuration object

Returns:

Subtitle file content, as a string.

`subtitlesToTimeline(subtitles)`

Converts subtitles to a timeline.

subtitles: subtitle file content (srt or vtt), as a string

Note: This function simply converts each individual cue to a segment entry in a timeline. Since subtitle cues may contain parts of sentences or phrases, this may not produce very useful results for your needs. However, you can use it as a means to parse a subtitle file (srt or vtt), and apply your own processing later.

Returns:

Timeline object.
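
As one example of the kind of post-processing mentioned in the note, the sketch below merges consecutive cue segments until the accumulated text ends with sentence-final punctuation. It is a naive heuristic aimed at English-style punctuation, not the library's own method. The entry shape (`type`, `text`, `startTime`, `endTime`) is an assumption, and `mergeCuesIntoSentences` is a hypothetical helper.

```typescript
// Assumed minimal shape of a segment entry (times in seconds).
interface SegmentEntry {
	type: string
	text: string
	startTime: number
	endTime: number
}

// Merges consecutive cues until the accumulated text ends with '.', '!' or '?'
// (optionally followed by closing quotes or brackets).
function mergeCuesIntoSentences(cues: SegmentEntry[]): SegmentEntry[] {
	const sentences: SegmentEntry[] = []
	let current: SegmentEntry | undefined

	for (const cue of cues) {
		if (current === undefined) {
			// Start a new sentence from a copy of this cue
			current = { ...cue, text: cue.text.trim() }
		} else {
			// Extend the sentence in progress
			current.text = `${current.text} ${cue.text.trim()}`
			current.endTime = cue.endTime
		}

		if (/[.!?]["')\]]*$/.test(current.text)) {
			sentences.push(current)
			current = undefined
		}
	}

	// Keep any trailing text that never reached sentence-final punctuation
	if (current !== undefined) {
		sentences.push(current)
	}

	return sentences
}
```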

Global options

`setGlobalOption(key, value)`

Sets a global option.

See the options reference for more details about the available global options.

`getGlobalOption(key)`

Gets a global option.

Returns:

The value associated with the given key.

TODO

Expose more methods that may be useful for developers, like phonemization, etc.
Expose audio playback used in CLI, possibly with timeline synchronization support.
