Node.js API reference

Note: the API is not fully stable yet and may change with any new version. Many methods, types and internal data structures are not yet exposed.

Importing as a Node.js module

To import the echogarden package as a Node.js module:

Install as a dependency in your project:

npm install echogarden

Import with:

import * as Echogarden from 'echogarden'

All methods, properties and arguments have TypeScript type information. Consult it for more detailed and up-to-date information that may not be covered here.

Related pages

Text-to-speech

synthesize(input, options, onSegment, onSentence)

Synthesizes the given input.

  • input: text to synthesize, either a string or a string[]. When given an array of strings, its elements are treated as predefined segments (useful if you want more control over how segments are split, or if your input has a special format that requires a custom splitting method).
  • options: synthesis options object
  • onSegment: a callback that is called whenever a segment has been synthesized (optional)
  • onSentence: a callback that is called whenever a sentence has been synthesized (optional)
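
As an illustration of predefined segments, here is a minimal sketch of a custom splitting method that treats each blank-line-separated paragraph as one segment. The helper name splitOnBlankLines is hypothetical and not part of the Echogarden API:

```typescript
// Hypothetical helper: splits text into segments at blank lines,
// trimming each segment and dropping empty ones.
function splitOnBlankLines(text: string): string[] {
	return text
		.split(/\r?\n\s*\r?\n/)
		.map(part => part.trim())
		.filter(part => part.length > 0)
}
```

The resulting array could then be passed as the input argument, for example Echogarden.synthesize(splitOnBlankLines(text), options).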

Returns (via promise):

{
	audio: RawAudio | Buffer
	timeline: Timeline
	language: string
}

audio may be either:

  • RawAudio object, which is a structure containing the sample rate and raw 32-bit float channels:
{
	sampleRate: number
	channels: Float32Array[]
}
  • A Uint8Array containing the audio in encoded form, when a particular codec is specified in the outputAudioFormat.codec option.
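
Because audio can take either form, calling code may need to tell the two apart. The following is a minimal sketch based only on the two shapes described above. The RawAudio type is redeclared locally, and isRawAudio and getDurationSeconds are hypothetical helper names, not part of the Echogarden API:

```typescript
// Local mirror of the RawAudio structure described above
interface RawAudio {
	sampleRate: number
	channels: Float32Array[]
}

// Hypothetical helper: narrows the `audio` result to RawAudio.
// A Node.js Buffer is a Uint8Array subclass, so it is also treated as encoded audio.
function isRawAudio(audio: RawAudio | Uint8Array): audio is RawAudio {
	return !(audio instanceof Uint8Array) && Array.isArray(audio.channels)
}

// Hypothetical helper: duration in seconds, derived from the
// sample count of the first channel and the sample rate.
function getDurationSeconds(rawAudio: RawAudio): number {
	const sampleCount = rawAudio.channels[0]?.length ?? 0
	return sampleCount / rawAudio.sampleRate
}
```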

Segment and sentence event callbacks

You can optionally pass two async callbacks to synthesize, onSegment and onSentence.

For example:

async function onSegment(data: SynthesisSegmentEventData) {
	console.log(data.transcript)
}

const { audio } = await Echogarden.synthesize("Hello World!", { engine: 'espeak' }, onSegment)

SynthesisSegmentEventData is an object with the structure:

{
	index: number              // Index of part
	total: number              // Total number of parts

	audio: RawAudio | Buffer   // Audio for part

	timeline: Timeline         // Timeline for part
	transcript: string         // Transcript for part
	language: string           // Language for part

	peakDecibelsSoFar: number  // Peak decibels measured for all synthesized audio, so far
}
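
A callback can also accumulate data across segments rather than just log it. The sketch below only uses the transcript field listed above. The factory name createTranscriptCollector and the local SegmentEventFields type are hypothetical and used for illustration only:

```typescript
// Local mirror of the fields used here; the full structure is listed above
interface SegmentEventFields {
	index: number
	total: number
	transcript: string
}

// Hypothetical factory: returns an async callback suitable for passing
// as `onSegment`, together with the array it appends transcripts to.
function createTranscriptCollector() {
	const transcripts: string[] = []

	async function onSegment(data: SegmentEventFields): Promise<void> {
		transcripts.push(data.transcript)
	}

	return { onSegment, transcripts }
}
```

The returned onSegment function could be passed as the third argument to synthesize, and transcripts read once the returned promise resolves.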

requestVoiceList(options)

Requests a list of voices for a particular engine.

  • options: voice list request options object

Returns (via promise):

{
	voiceList: SynthesisVoice[]
	bestMatchingVoice: SynthesisVoice
}

Speech-to-text

recognize(input, options)

Applies speech recognition to the input.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: recognition options object

Returns (via promise):

{
	transcript: string

	timeline: Timeline
	wordTimeline: Timeline

	language: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}
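
The returned wordTimeline can be post-processed like any array. The sketch below picks out the words spoken within a time range. The Timeline structure is not specified on this page, so the entry shape used here (type, text, startTime, endTime, with times in seconds) is an assumption that should be checked against the package's TypeScript types. The helper name wordsBetween is hypothetical:

```typescript
// Assumed shape of a timeline entry (not specified on this page)
interface TimelineEntry {
	type: string
	text: string
	startTime: number // assumed to be in seconds
	endTime: number   // assumed to be in seconds
}

// Hypothetical helper: returns the text of all entries that lie
// entirely within the range [fromTime, toTime].
function wordsBetween(wordTimeline: TimelineEntry[], fromTime: number, toTime: number): string[] {
	return wordTimeline
		.filter(entry => entry.startTime >= fromTime && entry.endTime <= toTime)
		.map(entry => entry.text)
}
```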

Speech-to-transcript alignment

align(input, transcript, options)

Aligns input audio with the given transcript.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • transcript: the transcript to align to
  • options: alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	transcript: string
	language: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

Speech-to-text translation

translateSpeech(input, options)

Translates speech audio directly to a transcript in a different language (only English is currently supported as the target language).

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: speech translation options object

Returns (via promise):

{
	transcript: string
	timeline: Timeline
	wordTimeline?: Timeline

	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

Text-to-text translation

translateText(input, options)

Translates text to text.

  • input: string
  • options: text translation options object

Returns (via promise):

{
	text: string
	translatedText: string

	translationPairs: TranslationPair[]

	sourceLanguage: string
	targetLanguage: string
}

translationPairs is an array of objects corresponding to individual segments of the text and their translations.

Speech-to-translated-transcript alignment

alignTranslation(input, translatedTranscript, options)

Aligns input audio with the given translated transcript.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • translatedTranscript: the translated transcript to align to
  • options: translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	translatedTranscript: string
	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

alignTranscriptAndTranslation(input, transcript, translatedTranscript, options)

Aligns input audio to both the native language transcript and a translated one.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • transcript: the transcript to align to, in the native speech language
  • translatedTranscript: the translated transcript to align to
  • options: transcript and translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	translatedTimeline: Timeline
	translatedWordTimeline: Timeline

	transcript: string
	translatedTranscript: string

	sourceLanguage: string
	targetLanguage: string

	inputRawAudio: RawAudio
	isolatedRawAudio?: RawAudio
	backgroundRawAudio?: RawAudio
}

alignTimelineTranslation(inputTimeline, translatedTranscript, options)

Aligns given timeline with its translated transcript.

  • inputTimeline: input timeline in the native language
  • translatedTranscript: the translated transcript to align to
  • options: timeline translation alignment options object

Returns (via promise):

{
	timeline: Timeline
	wordTimeline: Timeline

	sourceLanguage?: string
	targetLanguage: string

	rawAudio?: RawAudio
}

Language detection

detectSpeechLanguage(input, options)

Detects language of spoken audio.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: speech language detection options object

Returns (via promise):

{
	detectedLanguage: string
	detectedLanguageName: string
	detectedLanguageProbabilities: LanguageDetectionResults
}

detectTextLanguage(input, options)

Detects language of text.

  • input: input text as string
  • options: text language detection options object

Returns (via promise):

{
	detectedLanguage: string
	detectedLanguageName: string
	detectedLanguageProbabilities: LanguageDetectionResults
}
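
Both detection methods return a set of per-language probabilities alongside the single best match. The sketch below shortlists the likely candidates. The LanguageDetectionResults structure is not specified on this page, so the entry shape used here (an array of objects with language, languageName and probability) is an assumption that should be checked against the package's TypeScript types. The helper name likelyLanguages is hypothetical:

```typescript
// Assumed shape of one entry in LanguageDetectionResults (not specified on this page)
interface LanguageProbability {
	language: string
	languageName: string
	probability: number
}

// Hypothetical helper: returns the candidates at or above a probability
// threshold, most likely first. The input array is left unmodified.
function likelyLanguages(results: LanguageProbability[], minProbability = 0.1): LanguageProbability[] {
	return results
		.filter(entry => entry.probability >= minProbability)
		.sort((a, b) => b.probability - a.probability)
}
```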

Voice activity detection

detectVoiceActivity(input, options)

Detects voice activity in audio (non-real-time).

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: voice activity detection options object

Returns (via promise):

{
	timeline: Timeline
}

Speech denoising

denoise(input, options)

Tries to reduce background noise in spoken audio.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: denoising options object

Returns (via promise):

{
	denoisedAudio: RawAudio
}

Source separation

isolate(input, options)

Attempts to isolate an individual audio stem, like human voice, or one or more musical instruments (depending on model training), from the given waveform.

  • input: can be an audio file path (string), encoded audio (Buffer or Uint8Array) or a raw audio object (RawAudio)
  • options: source separation options object

Returns (via promise):

{
	inputRawAudio: RawAudio
	isolatedRawAudio: RawAudio
	backgroundRawAudio: RawAudio
}

Subtitles

timelineToSubtitles(timeline, options)

Converts a timeline to subtitles.

  • timeline: timeline object
  • options: subtitles configuration object

Returns:

Subtitle file content, as a string.

subtitlesToTimeline(subtitles)

Converts subtitles to a timeline.

  • subtitles: subtitle file content (srt or vtt), as a string

Note: This function simply converts each individual cue to a segment entry in a timeline. Since subtitle cues may contain parts of sentences or phrases, this may not produce very useful results for your needs. However, you can use it as a means to parse a subtitle file (srt or vtt), and apply your own processing later.

Returns:

Timeline object.

Global options

setGlobalOption(key, value)

Sets a global option.

See the options reference for more details about the available global options.

getGlobalOption(key)

Gets a global option.

Returns:

The value associated with the given key.

TODO

  • Expose more methods that may be useful for developers, like phonemization, etc.
  • Expose audio playback used in CLI, possibly with timeline synchronization support.