Containerized servings.py #17

Merged
merged 19 commits into from
Feb 22, 2024
l4b4r4b4b4
Contributor

@l4b4r4b4b4 l4b4r4b4b4 commented Feb 8, 2024

Added Dockerfile & docker-compose.yml for containerized deployment.

Had to make a slight adjustment to servings.py changing host ip from 127.0.0.1 to 0.0.0.0.
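
A note on why that change matters in a container: a server bound to 127.0.0.1 only accepts connections arriving on the loopback interface of its own network namespace, so a port published by Docker cannot reach it, whereas 0.0.0.0 listens on every interface. The sketch below uses only the standard library and does not reproduce servings.py; `can_connect` is a hypothetical helper that shows a wildcard bind still answering on loopback.

```python
import socket


def can_connect(bind_host: str, connect_host: str) -> bool:
    """Bind a throwaway TCP listener on bind_host and try to reach it via connect_host.

    Hypothetical helper for illustration only; it is not part of the project.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(1.0)
            try:
                client.connect((connect_host, port))
                return True
            except OSError:
                return False


# A wildcard bind (0.0.0.0) is reachable on loopback as well as on any other
# interface. A 127.0.0.1 bind is reachable on loopback only, which is why the
# container's published port could not reach the server before the change.
print(can_connect("0.0.0.0", "127.0.0.1"))
print(can_connect("127.0.0.1", "127.0.0.1"))
```

The reverse case (a loopback-only listener being unreachable from another interface) depends on the machine's network setup, so it is described in the comment rather than executed.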

Hope this helps others with quick and convenient deployment and testing.

Addresses #48

@vatsalaggarwal
Member

vatsalaggarwal commented Feb 8, 2024

Thanks for this, we'll have a look in a bit...

noticed that you were trying to add references... quick note on them:
i) we don't support cross-lingual cloning yet, so I think the german reference is unlikely to work
ii) we don't support references shorter than 30 seconds, and stuff that is unclean/has background noise etc is very unlikely to work!

@l4b4r4b4b4
Contributor Author

The references were just for testing what comes out if I throw in a German reference. Outcome: Funny 😉

Looking forward to actual fine-tuning capabilities. Are those LoRAs?

@vatsalaggarwal
Member

vatsalaggarwal commented Feb 8, 2024

> The references were just for testing what comes out if I throw in a German reference. Outcome: Funny 😉

lol expected

> Looking forward to actual fine-tuning capabilities. Are those LoRAs?

I am not sure if finetuning will be able to make that 20s sample with background noise work :P ... have you tried others that were longer than 30s and were clean that you're hoping to finetune on but couldn't get zero-shot to work?

LoRAs - not sure yet... we're open to folks adding that, and I can provide some guidance!

@l4b4r4b4b4
Contributor Author

If it were transformers, I would be able to help quickly with LoRA PEFT. But as I see, you have your own very custom GPT implementation.

@vatsalaggarwal
Member

vatsalaggarwal commented Feb 9, 2024

I am happy to guide you through those rough edges if it helps; otherwise we'll have to wait till we are able to do this! :(

Member

@sidroopdaska sidroopdaska left a comment

really appreciate you putting this together. a few changes requested

Resolved review threads (all outdated): Dockerfile, README.md, .gitignore, fam/llm/serving.py (2), docker-compose.yml
Signed-off-by: Lucas Hänke de Cansino <lhc@next-boss.eu>
@coder543

coder543 commented Feb 15, 2024

I do not understand why this API is using a very non-standard "X-Payload" header to contain the request body, and despite the link to /docs, there is no real documentation there. The FastAPI implementation does not explicitly say that TTSRequest is the expected request type, likely because pulling it out of a header isn't a common approach, so the generated docs assume there are no parameters at all.

For anyone else trying to figure this out, here is a curl command that serves as a basic example with the way the API is currently structured:

curl -X POST http://localhost:8869/tts \
     -H "X-Payload: {\"text\": \"Hello, this is a test!\", \"guidance\": 3.0, \"top_p\": 0.95, \"speaker_ref_path\": \"assets/ava.flac\"}" \
     -o output.mp3
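
For anyone scripting this rather than using curl, here is a rough Python equivalent using only the standard library. It assumes the endpoint, port, and field names from the curl command above; `build_tts_request` and `synthesize` are hypothetical helper names, not part of the project. Building the header with `json.dumps` avoids the hand-escaped quotes.

```python
import json
import urllib.request


def build_tts_request(text, speaker_ref_path, guidance=3.0, top_p=0.95,
                      url="http://localhost:8869/tts"):
    """Build a POST request that carries the TTS parameters in the X-Payload header."""
    payload = {
        "text": text,
        "guidance": guidance,
        "top_p": top_p,
        "speaker_ref_path": speaker_ref_path,
    }
    request = urllib.request.Request(url, method="POST")
    # json.dumps escapes non-ASCII characters by default, so the header stays ASCII-safe.
    request.add_header("X-Payload", json.dumps(payload))
    return request


def synthesize(request, out_path="output.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the server running, `synthesize(build_tts_request("Hello, this is a test!", "assets/ava.flac"))` should mirror the curl call.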

EDIT:

I have also discovered this:

    # NOTE: supports max. 220 characters atm.
    # Long form synthesis coming soon...
    MAX_CHARS = 220

so... that's fun. It's currently hardcoded to only support up to 220 characters of input. I can't see where the 220 character limit is actually being enforced, but I can't think of any use case where 220 characters is reliably large enough to use.

I am excited to see more options in the open source TTS space, so hopefully that limitation is lifted soon.

@l4b4r4b4b4
Contributor Author

Yeah, I was wondering about X-Payload as well and thought of changing it, but just kept it as is in the end.

For the length, I think the current implementation expects you to basically feed the API a text in chunks, with a max char length of 220...
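
A minimal sketch of that client-side chunking, assuming the 220-character limit applies to each request's text. `chunk_text` and `_split_long` are hypothetical helpers, not part of the project; the sketch prefers sentence boundaries, falls back to word boundaries, and hard-splits only words longer than the limit.

```python
import re

MAX_CHARS = 220  # limit quoted from the source comment above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open chunk
            else:
                if current:
                    chunks.append(current)
                current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


def _split_long(sentence: str, max_chars: int) -> list[str]:
    """Break an over-long sentence at word boundaries; hard-split oversized words."""
    if len(sentence) <= max_chars:
        return [sentence] if sentence else []
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

Each chunk would then go out as its own request, with the audio segments concatenated afterwards; whether the joins sound natural is a separate question this sketch does not address.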

Resolved review threads: Dockerfile (5, of which 4 outdated), docker-compose.yml (2, of which 1 outdated)
Signed-off-by: Lucas Hänke de Cansino <lhc@next-boss.eu>
@sidroopdaska
Member

sidroopdaska commented Feb 21, 2024

> I am excited to see more options in the open source TTS space, so hopefully that limitation is lifted soon.

@coder543, we will lift this limit soon. Synthesising arbitrary lengths of text is on our roadmap after we ship optimisations to reduce inference latency & fine-tuning support. What are you building with TTS today?

@sidroopdaska
Member

> I do not understand why this API is using a very non-standard "X-Payload" header to contain the request body

Will fix this shortly

@coder543

@sidroopdaska With a good enough synthesized voice, I would enjoy being able to paste in an article and have it read it to me sometimes. So, I was just playing around with that kind of thing.

Resolved review thread (outdated): docker-compose.yml
@sidroopdaska sidroopdaska merged commit 33cd288 into metavoiceio:main Feb 22, 2024