Releases: echogarden-project/echogarden

v2.0.3

13 Nov 12:06

Fixes

  • Update espeak-ng-emscripten to 0.3.2, which fixes incorrect detection of the global process object when resolving the eSpeak-ng data path

Full Changelog: v2.0.2...v2.0.3

v2.0.2

13 Nov 11:44

Fixes

  • Update espeak-ng-emscripten to 0.3.1, which fixes a path resolution issue when loading eSpeak-ng's data file on non-Windows systems

Full Changelog: v2.0.1...v2.0.2

v2.0.1

13 Nov 06:00

Fixes

  • Add workaround to prevent punycode-related warnings caused by the gaxios package (overrides version of whatwg-url to 14.0.0)

Internal

  • Remove several unused external packages
  • Replace html-escaper package with an internal implementation
  • Simplify UTF-32 decoder

Full Changelog: v2.0.0...v2.0.1

v2.0.0

10 Nov 06:47

New features

  • CLI: audio playback now uses a newly developed Audio I/O package, which is based on 3 separate N-API addons (Windows, macOS and Linux). Each addon targets the native OS audio interface for its platform (MME for Windows, Core Audio for macOS and ALSA for Linux). Please report any audio playback issues you encounter, for example: no sound, distorted audio, crashes or other issues. It's likely they can be fixed relatively easily
  • CLI: new option --player to set the audio player. Use --player=sox to switch back to the old player, if needed
  • Audio playback: the new player now supports keyboard navigation with left arrow (1 second back), right arrow (1 second forward) and space bar (pause or resume)
  • Denoising: new denoising engine nsnet2 based on NSNet2 Noise Suppression models
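
The keyboard navigation described above can be pictured as a small state update per key press. The following is only an illustrative sketch: PlaybackState, NavigationKey and applyKey are hypothetical names, not part of the echogarden player, and clamping the position to the audio bounds is an assumption.

```typescript
// Hypothetical playback state (not the actual player internals)
interface PlaybackState {
  positionSeconds: number
  durationSeconds: number
  paused: boolean
}

type NavigationKey = 'left' | 'right' | 'space'

// Returns a new state for a key press.
// Assumption: seeking is clamped to the range [0, durationSeconds].
function applyKey(state: PlaybackState, key: NavigationKey): PlaybackState {
  switch (key) {
    case 'left': // 1 second back
      return { ...state, positionSeconds: Math.max(0, state.positionSeconds - 1) }
    case 'right': // 1 second forward
      return { ...state, positionSeconds: Math.min(state.durationSeconds, state.positionSeconds + 1) }
    case 'space': // pause or resume
      return { ...state, paused: !state.paused }
  }
}
```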

Enhancements

  • DTW: now allows setting a relative window duration (calculated based on the total audio duration) by passing a percentage value like dtw.windowDuration=15%. On the API, this value should be passed as a string like '15%'
  • DTW: improve log messages
  • Synthesis: show total processing time
  • PCM format conversions: simplify operations for better runtime efficiency
  • onnxruntime-node: updated to version 20.0.0
  • speex-resampler-wasm: updated to latest code and enabled SIMD in the Emscripten build, which improves sample rate conversion speed
  • rubberband-wasm: updated Rubberband to version 4.0.0
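
To illustrate how a relative DTW window duration such as '15%' could be turned into an absolute value, here is a minimal sketch. The helper resolveWindowDuration is hypothetical and is not echogarden's implementation; it also assumes that numeric values and the total audio duration are expressed in seconds.

```typescript
// Hypothetical helper: resolves a window duration given either as an absolute
// number (assumed to be seconds) or as a percentage string like '15%'.
function resolveWindowDuration(value: number | string, totalAudioDuration: number): number {
  if (typeof value === 'number') {
    return value
  }

  const trimmed = value.trim()

  // Accept only strings of the form '<number>%'
  if (!/^\d+(\.\d+)?%$/.test(trimmed)) {
    throw new Error(`Invalid window duration: '${value}'`)
  }

  // parseFloat reads the leading number and ignores the trailing '%'
  const percentage = parseFloat(trimmed)

  return (percentage * totalAudioDuration) / 100
}
```

With this sketch, a value of '15%' for a 200 second recording resolves to a 30 second window, while a plain number is returned unchanged.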

Fixes

  • Enable reading and writing files larger than 2 GiB
  • Work around a Node.js issue with handling Buffer objects larger than 4 GiB, by abandoning buffers and using Uint8Arrays instead
  • Fix minor issue with 24-bit sample conversion
  • Espeak: don't set voice if it is the same as the last voice set
  • CLI: better parsing of union-typed options
  • Fix some missing log messages

Behavioral changes

  • Noise reduction (rnnoise and nsnet2): resample denoised audio back to original sample rate
  • Source separation (mdx-net): resample output audio back to original sample rate
  • Denoising: use quality 0 when converting to the processing sample rate
  • API: now returns Uint8Arrays for audio buffers, instead of Node.js Buffer objects

Internal

  • Removed all internal usage of Node.js buffers, streams, and string processing methods, and replaced with portable JavaScript types and APIs like Uint8Array and TextEncoder/TextDecoder
  • Separated all file system operations to a dedicated module
  • All WASM packages now updated and recompiled to use ESM modules
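
As a rough illustration of what replacing Node.js Buffer operations with portable types can look like, the sketch below concatenates byte chunks and round-trips UTF-8 text using only Uint8Array, TextEncoder and TextDecoder. The function name concatBytes is hypothetical and not taken from the echogarden codebase.

```typescript
// Portable stand-in for Buffer.concat: joins byte chunks into one Uint8Array
function concatBytes(chunks: Uint8Array[]): Uint8Array {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const result = new Uint8Array(totalLength)

  let offset = 0

  for (const chunk of chunks) {
    result.set(chunk, offset)
    offset += chunk.length
  }

  return result
}

// Portable stand-ins for Buffer.from(text, 'utf8') and buffer.toString('utf8')
const encodedText = new TextEncoder().encode('héllo')

// Split the bytes into two chunks, rejoin them, then decode
const rejoined = concatBytes([encodedText.subarray(0, 2), encodedText.subarray(2)])
const decodedText = new TextDecoder('utf-8').decode(rejoined)
```

Code that consumes audio buffers from the API and still needs a Node.js Buffer can wrap a Uint8Array with Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength), which creates a view over the same memory instead of copying it.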

Full Changelog: v1.8.7...v2.0.0

v1.8.7

17 Oct 13:45

Fixes

Full Changelog: v1.8.6...v1.8.7

v1.8.6

17 Oct 10:23

Fixes

  • espeak: ensure words are trimmed before converting them to fragments. Resolves issue with subtitle conversion failing to find a word in the text due to its surrounding whitespace

Full Changelog: v1.8.5...v1.8.6

v1.8.5

17 Oct 07:59

Fixes

  • Update to a newer build of espeak-ng-emscripten, with some code removed from espeakng_glue.cpp to prevent potential memory leaks. The new build also has the ALLOW_MEMORY_GROWTH option enabled, to make various kinds of out-of-memory errors less likely
  • espeak: don't set voice if it is the same as the last set voice. Avoid espeak-ng internal memory leak / fill issue with new voice structures being allocated but not correctly released

Full Changelog: v1.8.4...v1.8.5

v1.8.4

16 Oct 16:33

Enhancements

  • Performance improvements to Whisper's internal token alignment
  • Performance improvements to DTW alignment
  • Whisper: set default timestamp accuracy to high for the tiny and base models, and medium for the larger models
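
The default selection rule in the last item can be sketched as follows. This is an illustration only: defaultTimestampAccuracy is a hypothetical helper, and treating the '.en' variants the same as their multilingual counterparts is an assumption.

```typescript
type TimestampAccuracy = 'medium' | 'high'

// Hypothetical helper mirroring the rule above:
// 'high' for the tiny and base models, 'medium' for larger models.
function defaultTimestampAccuracy(modelName: string): TimestampAccuracy {
  // Assumption: 'tiny.en' and 'base.en' follow the same rule as 'tiny' and 'base'
  const family = modelName.split('.')[0]

  return family === 'tiny' || family === 'base' ? 'high' : 'medium'
}
```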

Full Changelog: v1.8.3...v1.8.4

v1.8.3

16 Oct 09:12

Fixes

  • Always use reduced attention head subset for the large-v3-turbo model (using all attention heads with this model doesn't seem to work at all)
  • Change default timestamp accuracy for whisper alignment to high (reasoning: whisper alignment works well with the default tiny and tiny.en models, and for those models, using all attention heads for token alignment isn't that expensive, so the extra computation is worth it, given the increase in accuracy)

Full Changelog: v1.8.2...v1.8.3

v1.8.2

15 Oct 21:51

Features

  • whisper: new option timestampAccuracy with possible values medium or high. medium uses a reduced subset of attention heads for alignment, which makes it fast to compute. high uses all attention heads for alignment, and is thus more accurate at the word level, but slower for larger models. Defaults to medium
  • whisper.cpp: new options temperature, temperatureIncrement, enableFlashAttention. Using flash attention can significantly improve performance in some cases. Note: enabling flash attention will automatically disable the enableDTW option since the two don't seem to work together
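
The interaction between enableFlashAttention and enableDTW noted above could be modeled like this. It is a minimal sketch with a hypothetical interface and function name; only the two option names come from the release note.

```typescript
// Hypothetical subset of whisper.cpp options, for illustration only
interface WhisperCppOptionsSketch {
  enableDTW: boolean
  enableFlashAttention: boolean
}

// Returns the effective options: enabling flash attention turns DTW off,
// since the two don't seem to work together.
function resolveWhisperCppOptions(options: WhisperCppOptionsSketch): WhisperCppOptionsSketch {
  if (options.enableFlashAttention && options.enableDTW) {
    return { ...options, enableDTW: false }
  }

  return options
}
```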

Fixes

  • whisper.cpp: derive correct model name for large-v3-turbo
  • whisper and whisper.cpp: raise an error when the model is set to large-v3-turbo and a translation task is requested (large-v3-turbo doesn't support translation tasks)
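
A check of the kind described in the last fix might look like the sketch below. The function name, the task type and the error message are hypothetical; only the model name and the restriction itself come from the release note.

```typescript
type WhisperTask = 'transcribe' | 'translate'

// Hypothetical validation: large-v3-turbo doesn't support translation tasks
function assertTaskSupported(modelName: string, task: WhisperTask): void {
  if (modelName === 'large-v3-turbo' && task === 'translate') {
    throw new Error(`The model '${modelName}' doesn't support translation tasks`)
  }
}
```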

Full Changelog: v1.8.1...v1.8.2