Releases: echogarden-project/echogarden
v2.0.3
v2.0.2
Fixes
- Update espeak-ng-emscripten to 0.3.1, which fixes a path resolution issue when loading eSpeak-ng's data file on non-Windows systems
Full Changelog: v2.0.1...v2.0.2
v2.0.1
Fixes
- Add workaround to prevent punycode-related warnings caused by the gaxios package (overrides version of whatwg-url to 14.0.0)
Internal
- Remove several unused external packages
- Replace html-escaper package with an internal implementation
- Simplify UTF-32 decoder
Full Changelog: v2.0.0...v2.0.1
v2.0.0
New features
- CLI: audio playback now using a newly developed Audio I/O package, which is based on 3 separate N-API addons (Windows, macOS and Linux). Each addon targets the native OS audio interface for the platform (MME for Windows, Core Audio for macOS and ALSA for Linux). Please report any audio playback issue you encounter! For example: no sound, distorted audio, crashes or other issues. It's likely they can be fixed relatively easily
- CLI: new option --player to set audio player. Use --player=sox to switch back to the old player, if needed
- Audio playback: when using the new player, now supports keyboard navigation with left arrow (1 second back), right arrow (1 second forward) and space bar (pause or resume)
- Denoising: new denoising engine nsnet2 based on NSNet2 Noise Suppression models
Enhancements
- DTW: now allows setting a relative window duration (calculated based on the total audio duration) by passing a percentage value like dtw.windowDuration=15%. On the API, this value should be passed as a string like '15%'
- DTW: improve log messages
- Synthesis: show total processing time
- PCM format conversions: simplify operations for more efficient runtime
- onnxruntime-node: updated to version 20.0.0
- speex-resampler-wasm: updated to latest code and enable SIMD in Emscripten build. Improves sample rate conversion speed
- rubberband-wasm: updated Rubberband to version 4.0.0
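The percentage form of dtw.windowDuration described above can be illustrated with a small sketch. This is not Echogarden's actual implementation: the helper name resolveWindowDuration is hypothetical, and the sketch assumes that a plain number means an absolute duration in seconds while a string like '15%' means a fraction of the total audio duration.

```typescript
// Hypothetical helper, not part of the Echogarden API.
// Assumption: a number is an absolute duration in seconds,
// a string like '15%' is relative to the total audio duration.
function resolveWindowDuration(value: number | string, totalDurationSeconds: number): number {
	if (typeof value === 'number') {
		return value
	}

	// Accept an integer or decimal number followed by a percent sign
	const match = /^(\d+(?:\.\d+)?)%$/.exec(value.trim())

	if (!match) {
		throw new Error(`Expected a percentage string like '15%', got '${value}'`)
	}

	const percent = Number(match[1])

	// Multiply before dividing to keep round percentages exact
	return (totalDurationSeconds * percent) / 100
}
```

With these assumptions, a 200 second recording and a value of '15%' would give a 30 second window.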
Fixes
- Enable reading and writing files larger than 2 GiB
- Work around Node.js issue with handling Buffer objects larger than 4 GiB, by abandoning buffers and using Uint8Arrays instead
- Fix minor issue with 24-bit sample conversion
- Espeak: don't set voice if it is the same as the last voice set
- CLI: better parsing of union-typed options
- Fix some missing log messages
Behavioral changes:
- Noise reduction (rnnoise and nsnet): resample denoised audio back to original sample rate
- Source separation (mdx-net): resample output audio back to original sample rate
- Denoising: use quality 0 when converting to the processing sample rate
- API: now returns Uint8Arrays for audio buffers, instead of Node.js Buffer objects
Internal
- Removed all internal usage of Node.js buffers, streams, and string processing methods, and replaced with portable JavaScript types and APIs like Uint8Array and TextEncoder/TextDecoder
- Separated all file system operations to a dedicated module
- All WASM packages now updated and recompiled to use ESM modules
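As a rough illustration of the move from Node.js Buffer string methods to portable APIs mentioned above, the sketch below encodes and decodes UTF-8 text using only TextEncoder, TextDecoder and Uint8Array. The function names encodeUtf8 and decodeUtf8 are hypothetical and are not taken from the Echogarden codebase.

```typescript
// Hypothetical helpers showing a portable replacement for
// Buffer.from(text, 'utf8') and buffer.toString('utf8').
const utf8Encoder = new TextEncoder()
const utf8Decoder = new TextDecoder('utf-8')

// Encode a string to UTF-8 bytes
function encodeUtf8(text: string): Uint8Array {
	return utf8Encoder.encode(text)
}

// Decode UTF-8 bytes back to a string
function decodeUtf8(bytes: Uint8Array): string {
	return utf8Decoder.decode(bytes)
}
```

Both classes are standard web platform APIs and are available as globals in current Node.js versions and in browsers, so the same code should run in either environment.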
Full Changelog: v1.8.7...v2.0.0
v1.8.7
Fixes
- Use workaround to avoid Node.js bug with Buffer.concat producing invalid outputs when the concatenated size is equal to or greater than 4 GiB. Fixes issues with long audio inputs (about 3.5 hours or more for 44.1 kHz stereo) being corrupted and causing failures
- Use Node.js fs.writeFile instead of gracefulFS.writeFile, to avoid gracefulFS throwing an error when trying to write files with size equal to or greater than 2 GiB
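The release note does not say how the Buffer.concat workaround is implemented. One common approach, sketched here as an assumption rather than as the project's actual code, is to concatenate chunks manually into a single preallocated Uint8Array. The function name concatUint8Arrays is hypothetical.

```typescript
// Hypothetical sketch of manual concatenation that avoids Buffer.concat.
function concatUint8Arrays(chunks: Uint8Array[]): Uint8Array {
	// First pass: compute the total length
	let totalLength = 0

	for (const chunk of chunks) {
		totalLength += chunk.length
	}

	// Allocate the output once, then copy each chunk at its offset
	const result = new Uint8Array(totalLength)
	let offset = 0

	for (const chunk of chunks) {
		result.set(chunk, offset)
		offset += chunk.length
	}

	return result
}
```

Whether an array of 4 GiB or more can actually be allocated depends on the JavaScript engine and runtime version, so this sketch only shows the copying pattern, not a guarantee about size limits.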
Full Changelog: v1.8.6...v1.8.7
v1.8.6
Fixes
- espeak: ensure words are trimmed before converting them to fragments. Resolves issue with subtitle conversion failing to find a word in the text due to its surrounding whitespace
Full Changelog: v1.8.5...v1.8.6
v1.8.5
Fixes
- Update to newer build of espeak-ng-emscripten, with some removed code in espeakng_glue.cpp to prevent potential memory leaks. New build also has ALLOW_MEMORY_GROWTH option enabled, to ensure various kinds of out-of-memory errors are less likely to happen
- espeak: don't set voice if it is the same as the last set voice. Avoids espeak-ng internal memory leak / fill issue with new voice structures being allocated but not correctly released
Full Changelog: v1.8.4...v1.8.5
v1.8.4
Enhancements
- Performance improvements to Whisper's internal token alignment
- Performance improvements to dtw alignment
- Whisper: set default timestamp accuracy to high for the tiny and base models, and medium for the larger models
Full Changelog: v1.8.3...v1.8.4
v1.8.3
Fixes
- Always use reduced attention head subset for the large-v3-turbo model (using all attention heads with this model doesn't seem to work at all)
- Change default timestamp accuracy for whisper alignment to high (reasoning: whisper alignment works well with the default tiny and tiny.en models, and for those models, using all attention heads for token alignment isn't that expensive, so the extra computation is worth it, given the increase in accuracy)
Full Changelog: v1.8.2...v1.8.3
v1.8.2
Features
- whisper: new option timestampAccuracy with possible values medium or high. medium uses a reduced subset of attention heads for alignment, which makes it fast to compute. high uses all attention heads for alignment, and is thus more accurate at the word level, but slower for larger models. Defaults to medium
- whisper.cpp: new options temperature, temperatureIncrement, enableFlashAttention. Using flash attention can significantly improve performance in some cases. Note: enabling flash attention will automatically disable the enableDTW option since the two don't seem to work together
Fixes
- whisper.cpp: derive correct model name for large-v3-turbo
- whisper and whisper.cpp: error when model is set to large-v3-turbo and a translation task is requested (large-v3-turbo doesn't support translation tasks)
Full Changelog: v1.8.1...v1.8.2