Releases · remsky/Kokoro-FastAPI

13 Feb 09:49

github-actions

v0.2.2

cfae7db

v0.2.2 Latest

Fixes

espeak not engaging reliably as a fallback on the CPU image
audio quality improved by adjusting compression settings; fixed a bug with web UI format selection
advanced normalization settings added by @fireblade2534

What's Changed

Add Helm chart by @zucher in #157 #162
fixed a bunch of stuff by @fireblade2534 in #152
added settings based override of default lang_code by @Krurst in #155
docs update by @eltociear in #156

New Contributors

@zucher made their first contribution in #157
@Krurst made their first contribution in #155
@eltociear made their first contribution in #156

Full Changelog: v0.2.1...v0.2.2

Contributors

zucher, eltociear, and 2 other contributors

Assets 2

10 Feb 05:59

github-actions

v0.2.1

cc4d5ac

v0.2.1

What's Changed

adjustment to improve compatibility with espeak-loader dependency on misaki #127
added v1/models dummy endpoint for compatibility #144
fixed issue with duplicate captions by switching to streamed audio plus a tempfile download at completion for caption files #139
fixed some problems in the build system and model download system by @fireblade2534 in #131
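
A minimal sketch of what a "dummy" models listing for client compatibility might return, assuming it mirrors the common OpenAI-style list shape (`object: "list"` with a `data` array of model entries). The function name `list_models_response`, the model ids, and the `owned_by` value are hypothetical; the release notes do not state what the actual endpoint returns.

```python
import time
from typing import Any, Dict, List


def list_models_response(model_ids: List[str], owned_by: str = "kokoro") -> Dict[str, Any]:
    """Build an OpenAI-style model list payload (illustrative, not the project's code)."""
    created = int(time.time())  # placeholder timestamp shared by every entry
    return {
        "object": "list",
        "data": [
            {"id": model_id, "object": "model", "created": created, "owned_by": owned_by}
            for model_id in model_ids
        ],
    }


if __name__ == "__main__":
    # Hypothetical model ids, chosen only for illustration.
    print(list_models_response(["tts-1", "kokoro"]))
```

A static payload like this is often enough for clients that probe a models listing before sending speech requests.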

Full Changelog: v0.2.0...v0.2.1

Contributors

fireblade2534

Assets 2

07 Feb 11:23

remsky

v0.2.0

bfdb5c0

v0.2.0

Complete Model Overhaul:
- Upgraded to Kokoro v1.0 model architecture, deprecated V0.19 support
- Integration with hexgrad/kokoro and hexgrad/misaki packages
- Pre-installed all multi-language support from Misaki:
  - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
  - Note: This will likely be controlled via an env variable in upcoming versions
- All voice packs included for supported languages, along with the original versions
Enhanced Audio Generation Features:
- Per-word timestamped caption generation
- Phoneme generation, Phoneme-Based Audio Generation (510 token cap)
Web UI Improvements:
- Weighted voice mixing
- Text file upload support
- Improved text editor, user interface changes
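
The weighted voice mixing item above can be pictured as a normalized weighted average of voice vectors. This is a sketch under that assumption only: the function name `mix_voices`, the voice names, and the use of plain Python lists in place of tensors are all hypothetical, and the project's actual mixing logic may differ.

```python
from typing import Dict, List


def mix_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Blend voice vectors by a normalized weighted average (illustrative only)."""
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Use the first selected voice to fix the expected vector length.
    length = len(voices[next(iter(weights))])
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a different length")
        share = weight / total  # normalize so the shares sum to 1
        for i, value in enumerate(vector):
            mixed[i] += value * share
    return mixed


if __name__ == "__main__":
    # Hypothetical two-dimensional "voices" for demonstration.
    demo = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
    print(mix_voices(demo, {"voice_a": 3, "voice_b": 1}))
```

Normalizing the weights means a request for 3 parts of one voice and 1 part of another behaves the same as 0.75 and 0.25.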

What's Changed

Combine Voices endpoint now returns a .pt file; otherwise, combinations are generated on the fly at generation time
Bumping PyTorch version to 2.6.0, CUDA 12.4
Adjustments to Docker workflows + Incorporating Docker Bake

Full Changelog: v0.1.4...v0.2.0

Contributors

JoshRosen, eschmidbauer, and 5 other contributors

Assets 2

31 Jan 09:06

github-actions

v0.1.4

8156b29

v0.1.4

Changes to simplify streaming/async inference pathways still somewhat in progress.
WebUI added as a lighter-weight alternative to the Gradio UI
More configuration variables are exposed, including temporary file management settings
Added new debug endpoints for system and storage information (threads, sessions, etc)
Significant restructuring towards concurrency, decoupling inference workflows, more flexibility

What's Changed

Update README.md with new local endpoint usage example by @jteijema in #50
Update UI access with environment URL and PORT by @jteijema in #51
Fixed python tests by @fireblade2534 in #69
Try to add AAC audio format w/ updated test by @richardr1126 in #74
Fixed thread leak because of excessive E-speak backends by @fireblade2534 in #87
Fix truncated playback issue in streaming WAV responses by @JoshRosen in #94
Fixes auto downloading models by @fireblade2534 in #99
V0.1.4 by @remsky in #102
V0.1.4 - CI updates by @remsky in #104

New Contributors

@jteijema made their first contribution in #50
@richardr1126 made their first contribution in #74
@JoshRosen made their first contribution in #94

Full Changelog: v0.1.0...v0.1.4

Contributors

JoshRosen, remsky, and 3 other contributors

Assets 4

14 Jan 15:25

github-actions

v0.1.0

880fa7a

v0.1.0

What's Changed

Potentially Breaking Changes
- Swapped to uv dependency management from pip
- Baked model files and voicepacks directly into gpu + cpu images
- latest-slim tags could use some community testing; optimization and deployability checks are ongoing
- Location of dockerfiles + docker compose has been moved into the docker directory. Be sure to check the paths when launching

UI Changes:
- Multi-select and merging of voices has been enabled.
- An environment flag was set to disable local saving/filepath operations. By default it should still be saving locally
- Made the waveform a dynamic blue color
API Changes:
- Simplified audio normalization, more stable (likely won't notice a difference as the end user)
- Streaming now respects broken connections, will stop processing on the next chunk
- Minor/Moderate GPU memory handling cleanup and safeties added (clearing intermediate tensors, adding pressure warning, etc)
CI/CD live on GitHub Actions
- Pytest now runs all API tests on every pull request. You can modify them to align with new functionality and add more as needed, but try not to lose any coverage; it makes my life a bit easier
- PyTorch mocks mostly removed; automated tests now run on the CPU version.
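
The "streaming now respects broken connections" item above can be sketched as a generator that checks for a disconnect before each chunk. Every name here (`stream_chunks`, `synthesize`, `is_disconnected`) is hypothetical and stands in for whatever the server actually uses; the point is only that work stops at the next chunk boundary, not mid-chunk.

```python
from typing import Callable, Iterable, Iterator


def stream_chunks(
    text_chunks: Iterable[str],
    synthesize: Callable[[str], bytes],
    is_disconnected: Callable[[], bool],
) -> Iterator[bytes]:
    """Yield audio per text chunk, stopping once the client has gone away."""
    for chunk in text_chunks:
        # Check before the expensive synthesis step so no further
        # chunks are processed after a broken connection is noticed.
        if is_disconnected():
            break
        yield synthesize(chunk)


if __name__ == "__main__":
    sent = []

    def fake_synthesize(text: str) -> bytes:
        sent.append(text)
        return text.encode("utf-8")

    # Pretend the client disconnects after two chunks have been produced.
    for audio in stream_chunks(["one", "two", "three"], fake_synthesize, lambda: len(sent) >= 2):
        print(audio)
```

In the demo, the third chunk is never synthesized because the disconnect is detected at the chunk boundary.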

This has been a great model to work with. Looking forward to when the new 0.24 version is released by https://huggingface.co/hexgrad/Kokoro-82M.

Be sure to check their page out for updates on model development, and keep in mind they're always looking for more data.

New Contributors

@Galunid made their first contribution in #32

Full Changelog: v0.0.5...v0.1.0

Contributors

Galunid

Assets 20

13 Jan 06:46

github-actions

v0.0.5post1

1e45a31

v0.0.5post1

What's Changed

fix: Add missing healthcheck dependency (curl) by @Galunid in #32
Minor docker tagging and configuration changes
Gradio & gpu memory management bug fix
Bonus voice pack attached (af_irulan): drag and drop into your api/voices folder

New Contributors

@Galunid made their first contribution in #32

Full Changelog: v0.0.5...v0.0.5post1

Contributors

Galunid

Assets 3

12 Jan 13:47

github-actions

v0.1.0-pre

d2522bc

v0.1.0-pre Pre-release

Initial swap of dependency management to uv to simplify testing and deployments
Dropping model-fetcher container & baking models directly into docker images
Standardizing tagging to allow for consistent usage of latest tag across architectures
Minor structural changes towards accommodating incoming custom Voice Mixer module

Full Changelog: v0.0.5...v0.1.0-pre

Assets 2

11 Jan 05:20

github-actions

v0.0.5

22c52fd

v0.0.5

Stabilized issues with image tagging and structures from v0.0.4
Added automatic master to develop branch synchronization
Improved release tagging and structures
Initial CI/CD setup

Full Changelog: v0.0.4...v0.0.5

Assets 2

09 Jan 20:24

github-actions

v0.0.4

f6e3afa

v0.0.4

What's Changed

Update README.md by @fireblade2534 in #14
Fix url parsing for urls without https, http, or www by @fireblade2534 in #12
Added phoneme-centric endpoints by @remsky in #17

Full Changelog: v0.0.3...v0.0.4

Contributors

remsky and fireblade2534

Assets 2

07 Jan 11:50

github-actions

v0.0.3

d7e8a5c

v0.0.3

What's Changed

Feat/streaming by @remsky in #9
Make urls readable by @fireblade2534 in #10

New Contributors

@fireblade2534 made their first contribution in #10

Full Changelog: v0.0.2...v0.0.3

Contributors

remsky and fireblade2534

Assets 2
