Update klite.embd #719

Closed
wants to merge 4 commits into from

Conversation

illtellyoulater

Added support for AllTalk, a more recent and better-performing XTTSv2 implementation that supports streaming mode, narration mode, DeepSpeed, multiple inference endpoints and many more features - https://github.com/erew123/alltalk_tts

Implemented new code for:

  • retrieving available voices using AllTalk's available voices API endpoint: "/api/voices" (a rough sketch of this call follows below)
  • sending TTS generation requests using AllTalk's TTS generation endpoint: "/api/tts-generate"
  • new default settings for the XTTS base URL (using the AllTalk base URL by default), the XTTS default voice language, and the XTTS default streaming-mode setting

Compatibility with the legacy XTTS mode (and legacy XTTS code) has been kept throughout all changed parts. The only missing part needed to enable full support for both XTTS implementations is in the request-payload sending block, where a simple condition should be added to select the correct payload based on the selected XTTS implementation (but we currently lack a UI setting for that as well, so...)

Other than that, the only other changes in this commit are a couple of variable renames for consistency and some minor CSS typo fixes.
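
For reference, a minimal browser-side sketch of the voices lookup mentioned above. This is not the actual klite.embd code; the port 7851 and the shape of the JSON response are assumptions for illustration only:

```javascript
// Hypothetical sketch of querying AllTalk's voices endpoint from the browser.
// The default port 7851 and the { "voices": [...] } response shape are assumptions.
const alltalk_base_url = "http://localhost:7851";

async function fetch_alltalk_voices() {
    const resp = await fetch(alltalk_base_url + "/api/voices");
    if (!resp.ok) {
        throw new Error("AllTalk /api/voices request failed: " + resp.status);
    }
    const data = await resp.json();
    return data.voices || []; // e.g. ["female_01.wav", ...] (assumed shape)
}
```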

@LostRuins
Owner

Hi, thanks for this PR. But are all the fields necessary? I feel like it's almost an entirely different endpoint rather than an XTTS drop-in, especially since it's not even expecting a JSON payload? Let's follow up at erew123/alltalk_tts#88

use `autoplay=false` in AllTalk TTS request payload code, otherwise the generated audio will be played by AllTalk server-side, instead of being sent back to the browser
Author

@illtellyoulater left a comment


Setting the autoplay parameter to false in the AllTalk TTS generation request.

With this change the audio is now sent back to the browser and played by it, rather than being played server-side by the XTTS endpoint.
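
For illustration, a hedged sketch of what such a generation request might look like. The endpoint takes a form-encoded payload rather than JSON (as noted earlier in the thread), and every field name other than `autoplay` is an assumption made for this example, not AllTalk's confirmed schema:

```javascript
// Hypothetical sketch of a TTS generation request with server-side autoplay disabled.
const alltalk_base_url = "http://localhost:7851"; // assumed default, as in the earlier sketch

async function alltalk_generate(text, voice) {
    const payload = new URLSearchParams({
        text_input: text,              // assumed name of the text field
        character_voice_gen: voice,    // assumed name of the voice field
        language: "en",                // assumed name of the language field
        output_file_name: "klite_tts", // assumed name of the output filename field
        autoplay: "false",             // false: return the audio to the browser instead of playing it server-side
    });
    const resp = await fetch(alltalk_base_url + "/api/tts-generate", {
        method: "POST",
        body: payload, // sent as application/x-www-form-urlencoded, not JSON
    });
    return resp.json(); // assumed to include a URL or path to the generated audio
}
```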

the filename supplied for the generated TTS audio must *not* include an extension, dashes, etc.
Author

@illtellyoulater left a comment


fixing the default audio filename used in the TTS request (it must not include an extension, dashes, etc.)

fixed `streaming` (needs to be set to `true` for a non-streaming request to work... @daswer123, why?)
Author

@illtellyoulater left a comment


fixed streaming (needs to be set to true for a non-streaming request to work... @daswer123, why?)

@daswer123

Hi, I am the author of the project https://github.com/daswer123/xtts-api-server. As I understand it, you got mixed up and meant @erew123? :)

@erew123

erew123 commented Mar 3, 2024

Hi @daswer123, thanks for the heads up, Danil! Hope you are keeping well! :)

@illtellyoulater Is this a question for me? I'm suspecting it may be more of a question for LostRuins. I've only had a quick glance at the code you've written/changed, but I suspect that "streaming" in this case may be something to do with how Kobold is handling the audio return to play in the webpage.

@LostRuins
Owner

I am refactoring this PR; I will probably split the implementation between AllTalk and XTTS as they are too different, rather than try to fit both APIs together. The user will pick which one they wish to use.

@LostRuins
Owner

Hi @illtellyoulater, I have added tentative AllTalk support as a separate endpoint based on this PR.

As I didn't manage to actually get AllTalk running on Colab, this has not been properly tested. It's now added as a separate option from XTTS-API-Server, which functions the same as before.

Could you select the AllTalk option in https://lite.koboldai.net to do a quick test, and see if it works fine for you? Thanks!

[screenshot: the new AllTalk option in the Kobold Lite TTS settings]

btw, in future, all kobold lite development happens at the lite repo at https://github.com/LostRuins/lite.koboldai.net so PRs should be directed there.

cc: @erew123

@LostRuins
Owner

AllTalk implementation is in, please test

@LostRuins closed this Mar 4, 2024
@erew123

erew123 commented Mar 7, 2024

Hi @illtellyoulater

I'm back from travelling (for now). Did you manage to test this, @illtellyoulater, or is there anything you need my help with?

Thanks

@LostRuins
Owner

LostRuins commented Mar 7, 2024

I did not manage to test this; however, it is merged based on the fields in the PR. If someone could test it, that would be good.

If I can get it running on Colab I'd be happy to test it.

@erew123

erew123 commented Mar 7, 2024

Hi @LostRuins, hope you are well!

I've downloaded a local copy of Kobold and given it a test in both the settings page and the main chat interface; both seem to be working.

Just for reference, the message highlighted in grey (as below) is fine. That's normal when using the streaming method and setting the model to streaming only.

[screenshot: the grey status message referenced above]

So as a base configuration it seems absolutely fine for the streaming mode.

If somewhere down the line people wanted to use AllTalk's narrator function, that's a different API call and a few extra things to check (depending on what Kobold may filter from the text it sends over).

All in all though, it works and I can't see an issue!

I'll have another shot at getting the Colab working at some point!

All the best to you both @LostRuins @illtellyoulater

Thanks

@LostRuins
Owner

Oh that is awesome! Glad to know the integration worked perfectly :) thanks for testing

@LostRuins
Owner

Does the audio file get decoded and played correctly?

@erew123

erew123 commented Mar 7, 2024

As it's streaming, it's not actually generating a wav file as such, just an audio wav blob that it's firing over as quickly as possible.

Taking a quick look at how it's interacting, it seems that Kobold is playing that back within the browser and handling the playback perfectly. It seems good to me! :)
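
For context, this is roughly how a returned audio blob can be played back in a browser. It is a generic sketch using standard web APIs, not the actual Kobold Lite code:

```javascript
// Generic sketch: play an audio blob returned by a fetch() response in the browser.
async function play_wav_response(resp) {
    const blob = await resp.blob();                  // raw audio bytes from the TTS endpoint
    const url = URL.createObjectURL(blob);           // temporary object URL for the blob
    const audio = new Audio(url);
    audio.onended = () => URL.revokeObjectURL(url);  // release the object URL after playback
    await audio.play();                              // may require a prior user interaction in some browsers
}
```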

@illtellyoulater
Author

Got caught up with something else, sorry, but I'm glad to see the progress! Will give it a try as soon as possible! 👍 Good job guys!
