Releases: remsky/Kokoro-FastAPI

v0.2.2

13 Feb 09:49

Fixes

  • Fixed espeak not engaging reliably as a fallback on the CPU image
  • Improved audio quality by adjusting compression settings; fixed a bug with Web UI format selection
  • Added advanced normalization settings (@fireblade2534)

Full Changelog: v0.2.1...v0.2.2

v0.2.1

10 Feb 05:59

What's Changed

  • Adjusted to improve compatibility with the espeak-loader dependency on misaki #127
  • Added a dummy v1/models endpoint for compatibility #144
  • Fixed an issue with duplicate captions by swapping to a stream for audio plus a tempfile download at completion for caption files #139
  • Fixed some problems in the build system and model download system by @fireblade2534 in #131
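
The dummy v1/models endpoint exists so that clients expecting a model listing keep working. A minimal sketch of how a client might query it with only the standard library follows. The base URL `http://localhost:8880`, the helper names, and the OpenAI-style `{"data": [{"id": ...}]}` response shape are assumptions for illustration; these notes do not confirm them.

```python
import json
from urllib.request import urlopen

# Assumption: server address; adjust to your own deployment.
BASE_URL = "http://localhost:8880"


def models_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the models endpoint."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_models(base_url: str = BASE_URL) -> list[str]:
    """Fetch and parse the model list (requires a running server)."""
    with urlopen(models_url(base_url), timeout=10) as response:
        return parse_model_ids(json.load(response))
```

Calling `list_models()` needs a running server; the two helper functions can be checked offline.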

Full Changelog: v0.2.0...v0.2.1

v0.2.0

07 Feb 11:23
  • Complete Model Overhaul:
    • Upgraded to the Kokoro v1.0 model architecture; v0.19 support is deprecated
    • Integration with hexgrad/kokoro and hexgrad/misaki packages
    • Pre-installed all multi-language support from Misaki:
      • English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
      • Note: This will likely be controlled via an environment variable in upcoming versions
    • All voice packs included for supported languages, along with the original versions
  • Enhanced Audio Generation Features:
    • Per-word timestamped caption generation
    • Phoneme generation, Phoneme-Based Audio Generation (510 token cap)
  • Web UI Improvements:
    • Weighted voice mixing
    • Text file upload support
    • Improved text editor, user interface changes
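
Because phoneme-based generation is capped at 510 tokens, longer phoneme strings have to be split before they are sent. The sketch below shows one way a client could do that. It assumes one character counts as one token, which may not match the real tokenizer, and `split_phonemes` is a hypothetical helper, not part of the project.

```python
MAX_TOKENS = 510  # cap stated in the release notes


def split_phonemes(phonemes: str, limit: int = MAX_TOKENS) -> list[str]:
    """Split a phoneme string into chunks of at most `limit` characters.

    Assumption: one character counts as one token. Breaks at spaces where
    possible; a single word longer than `limit` is hard-split.
    """
    chunks: list[str] = []
    current = ""
    for word in phonemes.split():
        # Hard-split any word that cannot fit in a chunk by itself.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own request, with the resulting audio concatenated by the caller.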

What's Changed

  • Combine Voices endpoint now returns a .pt file; voice combinations are otherwise generated on the fly
  • Bumping PyTorch version to 2.6.0, CUDA 12.4
  • Adjustments to Docker workflows + Incorporating Docker Bake
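
Weighted voice mixing and the Combine Voices endpoint can be pictured as a normalized weighted average of voice vectors. The sketch below illustrates that idea with plain lists standing in for tensors. The function `mix_voices`, the voice names, and the toy vectors are hypothetical, and the project's actual combination logic may differ.

```python
def mix_voices(voices: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Return the weighted average of equally sized voice vectors.

    Weights are normalized so that they sum to 1.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(voices[next(iter(weights))])
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a different size")
        for i, value in enumerate(vector):
            mixed[i] += (weight / total) * value
    return mixed


# Toy two-dimensional "voices"; real voice packs are much larger tensors.
voices = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
print(mix_voices(voices, {"voice_a": 3, "voice_b": 1}))  # [0.75, 0.25]
```

Normalizing the weights means only their ratio matters, so 3:1 and 0.75:0.25 describe the same mix.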

Full Changelog: v0.1.4...v0.2.0

v0.1.4

31 Jan 09:06
8156b29
  • Changes to simplify the streaming/async inference pathways (still somewhat in progress)
  • WebUI added as a lighter-weight alternative to the Gradio UI
  • More configuration variables are exposed, including temporary file management settings
  • Added new debug endpoints for system and storage information (threads, sessions, etc)
  • Significant restructuring towards concurrency, decoupling inference workflows, more flexibility

Full Changelog: v0.1.0...v0.1.4

v0.1.0

14 Jan 15:25

What's Changed


  • Potentially Breaking Changes
    • Swapped to uv dependency management from pip
    • Baked model files and voicepacks directly into gpu + cpu images
    • The latest-slim tags could use some community testing, but I will be optimizing them and checking on deployability
    • The Dockerfiles and Docker Compose files have been moved into the docker directory. Be sure to check the paths when launching

  • UI Changes:

    • Multi-select and merging of voices has been enabled.
    • An environment flag was added to disable local saving/filepath operations. By default, files should still be saved locally
    • Made the waveform a dynamic blue color
  • API Changes

    • Simplified audio normalization, more stable (likely won't notice a difference as the end user)
    • Streaming now respects broken connections, will stop processing on the next chunk
    • Minor/Moderate GPU memory handling cleanup and safeties added (clearing intermediate tensors, adding pressure warning, etc)
  • CI/CD live on GitHub Actions

    • Pytest now runs all API tests on every pull request. You can modify the tests to align with new functionality and add more as needed, but try not to lose any coverage; it makes my life a bit easier
    • PyTorch mocks mostly removed; automated tests run on the CPU version
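
The streaming change above (stop processing on the next chunk once the connection is broken) can be sketched with a plain async generator. This is an illustration under stated assumptions, not the project's implementation: `stream_chunks`, `synthesize`, and `is_disconnected` are hypothetical names, with `is_disconnected` standing in for whatever disconnect check the web framework provides.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable


async def stream_chunks(
    texts: Iterable[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[bytes]:
    """Yield audio chunk by chunk, stopping once the client is gone."""
    for text in texts:
        # Check before each chunk so no further inference is started
        # for a client that has already dropped the connection.
        if await is_disconnected():
            break
        yield await synthesize(text)


async def demo() -> list[bytes]:
    sent: list[bytes] = []

    async def fake_synthesize(text: str) -> bytes:  # stand-in for inference
        return text.encode()

    async def fake_is_disconnected() -> bool:  # "client" leaves after 2 chunks
        return len(sent) >= 2

    async for chunk in stream_chunks(["a", "b", "c"], fake_synthesize, fake_is_disconnected):
        sent.append(chunk)
    return sent


print(asyncio.run(demo()))  # [b'a', b'b']
```

Checking once per chunk keeps the overhead small while bounding wasted work to at most one chunk of inference.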

This has been a great model to work with. Looking forward to when the new 0.24 version is released by https://huggingface.co/hexgrad/Kokoro-82M.

Be sure to check out their page for updates on model development, and keep in mind they're always looking for more data.

Full Changelog: v0.0.5...v0.1.0

v0.0.5post1

13 Jan 06:46

What's Changed

  • fix: Add missing healthcheck dependency (curl) by @Galunid in #32

  • Minor docker tagging and configuration changes

  • Gradio & GPU memory management bug fix

  • Bonus voice pack attached (af_irulan): drag and drop it into your api/voices folder

Full Changelog: v0.0.5...v0.0.5post1

v0.1.0-pre

12 Jan 13:47
Pre-release
  • Initial swap of dependency management to uv to simplify testing and deployments
  • Dropping model-fetcher container & baking models directly into docker images
  • Standardizing tagging to allow for consistent usage of latest tag across architectures
  • Minor structural changes towards accommodating incoming custom Voice Mixer module

Full Changelog: v0.0.5...v0.1.0-pre

v0.0.5

11 Jan 05:20
  • Stabilized issues with image tagging and structures from v0.0.4
  • Added automatic master to develop branch synchronization
  • Improved release tagging and structures
  • Initial CI/CD setup

Full Changelog: v0.0.4...v0.0.5

v0.0.4

09 Jan 20:24

Full Changelog: v0.0.3...v0.0.4

v0.0.3

07 Jan 11:50

Full Changelog: v0.0.2...v0.0.3