Releases: remsky/Kokoro-FastAPI
v0.2.2
Fixes
- espeak not engaging reliably on the CPU image as a fallback
- audio quality improved by adjusting compression settings; fixed a bug with web UI format selection
- advanced normalization settings added by @fireblade2534
What's Changed
- Add Helm chart by @zucher in #157 #162
- fixed a bunch of stuff by @fireblade2534 in #152
- added settings based override of default lang_code by @Krurst in #155
- docs update by @eltociear in #156
New Contributors
- @zucher made their first contribution in #157
- @Krurst made their first contribution in #155
- @eltociear made their first contribution in #156
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- adjustment to improve compatibility with espeak-loader dependency on misaki #127
- added v1/models dummy endpoint for compatibility #144
- fixed issue with duplicate captions, swapping to streaming the audio + a tempfile download at completion for caption files #139
- fixed some problems in the build system and model download system by @fireblade2534 in #131
Full Changelog: v0.2.0...v0.2.1
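The v1/models dummy endpoint noted above exists for clients that list models before sending requests. A minimal sketch of such a client follows. It assumes a server reachable at a base URL such as http://localhost:8880 and an OpenAI-style response body with a "data" list of objects carrying an "id" field; neither the URL nor the payload shape is stated in these notes, and models_url, parse_model_ids, and list_models are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def models_url(base_url: str) -> str:
    """Build the model-listing URL; the base URL itself is an assumption."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", []) if "id" in entry]


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch and parse the model list from a running server."""
    with urlopen(models_url(base_url), timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the payload assumption easy to adjust if the real response differs.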
v0.2.0
- Complete Model Overhaul:
- Upgraded to Kokoro v1.0 model architecture, deprecated V0.19 support
- Integration with hexgrad/kokoro and hexgrad/misaki packages
- Pre-installed all multi-language support from Misaki:
- English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
- Note: This will likely be controlled via env variable in upcoming versions
- All voice packs included for supported languages, along with the original versions
- Enhanced Audio Generation Features:
- Per-word timestamped caption generation
- Phoneme generation, Phoneme-Based Audio Generation (510 token cap)
- Web UI Improvements:
- Weighted voice mixing
- Text file upload support
- Improved text editor, user interface changes
What's Changed
- Combine Voices endpoint now returns a .pt file; otherwise, voice combinations are generated on the fly during generation
- Bumping PyTorch version to 2.6.0, CUDA 12.4
- Adjustments to Docker workflows + Incorporating Docker Bake
Full Changelog: v0.1.4...v0.2.0
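Weighted voice mixing, listed under the Web UI improvements above, can be pictured as a weighted average of voice vectors. The sketch below uses plain Python lists in place of the tensors stored in .pt voice files and normalizes the weights so they sum to one. Both the normalization and the mix_voices helper are illustrative assumptions, not the project's confirmed implementation, and the voice names in any call are placeholders.

```python
def mix_voices(voices: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Return the weighted average of equally sized voice vectors.

    Assumption: weights are normalized by their sum before blending.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Use the first requested voice to fix the expected vector length.
    length = len(voices[next(iter(weights))])
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a mismatched length")
        share = weight / total  # this voice's normalized contribution
        for i, value in enumerate(vector):
            mixed[i] += share * value
    return mixed
```

With weights of 3 and 1 for two voices, the result is a 75% / 25% blend of their vectors.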
v0.1.4
- Changes to simplify streaming/async inference pathways still somewhat in progress.
- WebUI added as a lighter-weight alternative to the Gradio UI
- More configuration variables are exposed, including temporary file management settings
- Added new debug endpoints for system and storage information (threads, sessions, etc)
- Significant restructuring towards concurrency, decoupling inference workflows, more flexibility
What's Changed
- Update README.md with new local endpoint usage example by @jteijema in #50
- Update UI access with environment URL and PORT by @jteijema in #51
- Fixed python tests by @fireblade2534 in #69
- Try to add AAC audio format w/ updated test by @richardr1126 in #74
- Fixed thread leak because of excessive E-speak backends by @fireblade2534 in #87
- Fix truncated playback issue in streaming WAV responses by @JoshRosen in #94
- Fixes auto downloading models by @fireblade2534 in #99
- V0.1.4 by @remsky in #102
- V0.1.4 - CI updates by @remsky in #104
New Contributors
- @jteijema made their first contribution in #50
- @richardr1126 made their first contribution in #74
- @JoshRosen made their first contribution in #94
Full Changelog: v0.1.0...v0.1.4
v0.1.0
What's Changed
- Potentially Breaking Changes
- Swapped to uv dependency management from pip
- Baked model files and voicepacks directly into gpu + cpu images
- latest-slim tags could use some community testing, but will be optimizing and checking on deployability
- Location of dockerfiles + docker compose has been moved into the docker directory. Be sure to check the paths when launching
- UI Changes:
- Multi-select and merging of voices has been enabled.
- An environment flag was added to disable local saving/filepath operations. By default, it should still save locally
- Made the waveform a dynamic blue color
- API Changes
- Simplified audio normalization, more stable (likely won't notice a difference as the end user)
- Streaming now respects broken connections, will stop processing on the next chunk
- Minor/Moderate GPU memory handling cleanup and safeties added (clearing intermediate tensors, adding pressure warning, etc)
- CI/CD live on Github Actions
- Pytest now runs all API tests on every pull request. You can modify them to align with new functionality and add more as needed, but try not to lose any coverage; it makes my life a bit easier
- PyTorch mocks mostly removed; automated tests run on the CPU version.
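The streaming behavior noted under API Changes (stop processing on the next chunk once the connection is broken) can be sketched with an async generator that checks for a disconnect before producing each chunk. This is a simplified illustration, not the project's code: stream_chunks and its is_disconnected parameter are hypothetical names, and in a FastAPI/Starlette app a callable such as request.is_disconnected could plausibly fill that role.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable


async def stream_chunks(
    chunks: Iterable[bytes],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[bytes]:
    """Yield audio chunks, checking for a dropped client before each one."""
    for chunk in chunks:
        if await is_disconnected():
            # Client went away: stop here so no further chunks are pulled.
            return
        yield chunk
```

If chunks is a lazy generator that synthesizes audio on demand, returning early means the remaining chunks are never produced.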
This has been a great model to work with. Looking forward to when the new 0.24 version is released by https://huggingface.co/hexgrad/Kokoro-82M.
Be sure to check their page out for updates on model development, and keep in mind they're always looking for more data.
Full Changelog: v0.0.5...v0.1.0
v0.0.5post1
What's Changed
- fix: Add missing healthcheck dependency (curl) by @Galunid in #32
- Minor docker tagging and configuration changes
- Gradio & gpu memory management bug fix
- Bonus voice pack af_irulan attached: drag and drop into your api/voices folder
Full Changelog: v0.0.5...v0.0.5post1
v0.1.0-pre
- Initial swap of dependency management to uv to simplify testing and deployments
- Dropping model-fetcher container & baking models directly into docker images
- Standardizing tagging to allow for consistent usage of latest tag across architectures
- Minor structural changes towards accommodating incoming custom Voice Mixer module
Full Changelog: v0.0.5...v0.1.0-pre
v0.0.5
- Stabilized issues with images tagging and structures from v0.0.4
- Added automatic master to develop branch synchronization
- Improved release tagging and structures
- Initial CI/CD setup
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's Changed
- Update README.md by @fireblade2534 in #14
- Fix url parsing for urls without https, http, or www by @fireblade2534 in #12
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- Feat/streaming by @remsky in #9
- Make urls readable by @fireblade2534 in #10
New Contributors
- @fireblade2534 made their first contribution in #10
Full Changelog: v0.0.2...v0.0.3