[BUG] Sentence Splitting with API #436
Which underlying TTS engine are you experiencing the problem with? (F5-TTS, PIPER, XTTS, etc.) Someone reported a similar issue in #410. Thanks.
I was using a WAV with a finetuned XTTS model. It didn't really seem to matter how much text I sent over; it was doing it no matter what I was sending or receiving. I think it happened with the base XTTS model as well, but I'd have to double-check that.
@GoudaCouda I've just checked this by sending this block of text to the OpenAI endpoint
You are welcome to test this yourself. Your issue is not AllTalk or the OpenAI endpoint. If you send over the example block of text above, it will go over to the OpenAI endpoint in 1x single TTS generation request. If you are using some software, in your case Open Web UI, that pre-splits the text, it works out differently.
If you send that as 1x TTS generation request, it will take, let's say, 5 seconds and you get 1x audio file back to play. If, however, you pre-split it as you are doing, it gets sent like this:
Sorry, hit send before finishing that.
So by pre-splitting it, you are breaking it into multiple TTS requests and multiple playback requests, rather than one flowing request. This is not a fault of the OpenAI endpoint or the TTS generation, but more how you are sending over the request. I suggest you either send it all as one, OR you can ask the people at "Open Web UI" to look into a different cache management behaviour for how Open Web UI handles sending/generating multiple requests and buffering them up in their software. Hope that helps. Thanks.
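To make the difference concrete, here is a minimal sketch of the two sending patterns against an OpenAI-compatible `/v1/audio/speech` endpoint. The URL, port, payload keys and voice name are assumptions for illustration (based on the OpenAI speech API shape), not confirmed AllTalk defaults, so adjust them to your own server settings.

```python
# Sketch: one combined TTS request vs pre-split per-sentence requests.
import requests

BASE_URL = "http://localhost:7851/v1/audio/speech"  # assumed AllTalk address/port

def tts_request(text: str, filename: str) -> None:
    """Send one TTS generation request and save the returned audio."""
    response = requests.post(
        BASE_URL,
        json={"model": "tts-1", "input": text, "voice": "female_01.wav"},  # assumed payload
        timeout=120,
    )
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)

paragraph = (
    "This is the first sentence. This is the second sentence. "
    "This is the third sentence."
)

# Option A: one flowing request -- one generation, one audio file back.
tts_request(paragraph, "whole_paragraph.wav")

# Option B: pre-split (what Open Web UI does with sentence splitting enabled) --
# each sentence becomes its own request, so the client waits for each generation
# and each playback separately.
for i, sentence in enumerate(paragraph.split(". ")):
    tts_request(sentence, f"sentence_{i}.wav")
```

With Option B, any per-request overhead and queueing on the client side is paid once per sentence instead of once per paragraph, which matches the pauses described above.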
Just to be 100% clear, AllTalk has no concept of what the software making a TTS generation request is doing. It just generates the TTS it is requested to generate and sends it back. So if you send multiple TTS generation requests, AllTalk will generate multiple WAV files and send them back as quickly as it can. AllTalk doesn't know your software sliced up a paragraph; it just sends the generated audio back for the TTS request that was made, which in your case, with pre-splitting your sentences, means Open Web UI is sending multiple TTS generation requests and getting back multiple generated audio files.
There seems to be an issue with the API when using sentence splitting. Open Web UI has an option where it will split the sentences and send them by either punctuation or paragraphs. When this is active, the AllTalk API sends audio back very slowly. It doesn't seem like it can take multiple strings at a time. It reads the first split, then waits, and then I hear the next in about 5 seconds.
This does not seem to be an issue with DeepSpeed, as I tested with and without it. This function also tends to cause the audio to glitch out; I just hear glitchy sounds.