I'm looking to build on your research. I understand this isn't within the scope of your project; I'm just curious and wanted the creators' thoughts. I want to retrain, repurpose, and experiment with this for expressive TTS instead of generic TTS. I'm somewhat new to working with these models.
OBJECTIVES ->
retrain on a more dynamic dataset
synthetic dataset -> speech/text [w/ special utterances] {real speech / lo-fi speech from 'BARK'}, speech w/ synthetic audio environments generated by 'Tango' / text [I have a rather large dataset] (a rough mixing sketch follows this list)
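To make that second bullet concrete, here is a minimal sketch of how I imagine pairing the two sources: a clean utterance (e.g. a BARK clip) mixed with a synthetic background environment (e.g. a Tango clip) at a chosen SNR. The file names, the mix_at_snr helper, and the assumption of mono clips at a shared sample rate are all hypothetical placeholders, not anything from this repo.

```python
# Hypothetical sketch, not this repo's pipeline: build one training pair by
# mixing a clean utterance (e.g. a BARK clip) with a synthetic background
# environment (e.g. a Tango clip) at a chosen SNR.
# Assumes mono clips on disk that share a sample rate; resampling is omitted.
import numpy as np
import soundfile as sf

def mix_at_snr(speech: np.ndarray, env: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the environment so the speech sits snr_db above it, then sum."""
    # Loop or trim the environment to match the speech length.
    if len(env) < len(speech):
        env = np.tile(env, int(np.ceil(len(speech) / len(env))))
    env = env[: len(speech)]

    speech_power = np.mean(speech ** 2) + 1e-12
    env_power = np.mean(env ** 2) + 1e-12
    # Gain that places the environment snr_db below the speech.
    gain = np.sqrt(speech_power / (env_power * 10.0 ** (snr_db / 10.0)))
    mix = speech + gain * env

    # Normalize only if the sum would clip.
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

speech, sr = sf.read("speech.wav")  # e.g. a BARK utterance (placeholder path)
env, _ = sf.read("env.wav")         # e.g. a Tango environment (placeholder path)
sf.write("mixed.wav", mix_at_snr(speech, env, snr_db=10.0), sr)
```

Sweeping the SNR per example would also give the model a range of environment prominences rather than one fixed mix level.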
EXPECTATIONS ->
a highly expressive hybrid TTS [TTS with semantically conditioned background environments]
QUESTIONS ->
What are your thoughts on approaching voice cloning with this style of architecture? I figure I should approach it like inpainting? (A rough sketch of what I mean is below.)
If that works, wouldn't it also clone any artifacts contained in the reference speech audio?
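To clarify what I mean by "like inpainting", here is an illustrative sketch under the assumption of a token-based acoustic model: keep the reference speaker's codec tokens fixed and ask the model to infill only a masked target span. MASK_ID, build_inpainting_input, and the tensor shapes are made-up names for illustration, not this project's API.

```python
# Illustrative only, assuming a token-based acoustic model; MASK_ID and
# build_inpainting_input are made-up names, not this project's API.
# Idea: keep the reference speaker's tokens fixed and let the model
# infill only the masked target span, i.e. treat cloning as inpainting.
import torch

MASK_ID = 1024  # hypothetical mask token outside the codec vocabulary

def build_inpainting_input(ref_tokens: torch.Tensor, target_len: int):
    """Return (tokens, infill_mask): reference kept as-is, target masked."""
    masked_target = torch.full((target_len,), MASK_ID, dtype=ref_tokens.dtype)
    tokens = torch.cat([ref_tokens, masked_target])
    infill_mask = torch.cat([
        torch.zeros(len(ref_tokens), dtype=torch.bool),  # condition region
        torch.ones(target_len, dtype=torch.bool),        # region to generate
    ])
    return tokens, infill_mask

ref = torch.randint(0, 1024, (300,))  # stand-in for encoded reference speech
tokens, infill_mask = build_inpainting_input(ref, target_len=500)
```

My second question follows from this framing: whatever is in the reference clip (reverb, background hum, codec artifacts) becomes part of the conditioning, so I'd expect the model to reproduce it in the generated span.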
CLOSING THOUGHTS ->
I'm open to sharing my results with you privately. I appreciate your contribution to the community.