Feat: Style text: make emotions and style similar to the style text by mixing bert #240
Merged: Stardust-minus merged 21 commits into fishaudio:dev-bert-style from litagin02:style-text on Dec 16, 2023
feat: update fastapi.py. Add more error log information
fix: update fastapi.py. Adapt to 2.2 reference
* Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Delete attentions_onnx.py * Delete models_onnx.py * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update __init__.py * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Interesting, we will test it.
Stardust-minus merged commit 3eb0630 into fishaudio:dev-bert-style on Dec 16, 2023 (1 of 2 checks passed)
Stardust-minus added a commit that referenced this pull request on Dec 16, 2023:
…y mixing bert (#240) (#241) * fix:(oldVersion210) Load on demand Emotion model * feat: update fastapi.py. Add more error log information * Switch pyopenjtalk to pyopenjtalk-prebuilt * fix: update fastapi.py. Adapt to 2.2 reference * Update resample.py * Fix the ONNX export bug (#237) * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Delete attentions_onnx.py * Delete models_onnx.py * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update __init__.py * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- * Fix onnx * Format export * Feat: style-text and bert mixing (JA only) * Ensure the same tensor shape * Update * update gradio version * Fix * Style text for chinese and english (ver 2.2) * Style text for chinese and english (ver 2.1) * Style text in FastAPI * Translate style text desc in chinese --------- Co-authored-by: litagin02 <139731664+litagin02@users.noreply.github.com> Co-authored-by: Sora <654163754@qq.com> Co-authored-by: Sihan Wang <wangsihan1995@gmail.com> Co-authored-by: Ναρουσέ·μ·γιουμεμί·Χινακάννα <40709280+NaruseMioShirakana@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Stardust-minus
added a commit
that referenced
this pull request
Dec 19, 2023
* Fix inputs of duration discriminator * Add LSTM * Update models.py * Update tensorboard scalar * Noise injection for minimizing modality gap * Update infer.py * support bf16 run * del unused_para flag * support bf16 config * add grad clip * fix(logger and grad):add dur grad,fix grad clip * Update webui_preprocess.py * Fix English G2P * fix(bert_gen):add pass * Pass SDP to DD * Update webui_preprocess.py * Update config.json * Update webui.py * Update chinese_bert.py * Upload webui for deploy * Update webui.py * torch.save as pt not npy * Update config.json * add freeze emo vq * Update webui_preprocess.py * Fix tone_sandhi.py * Comment up grad clip * Fix in-place addition * Add SLM discriminator * Add DDP for WD * Feat: Style text: make emotions and style similar to the style text by mixing bert (#240) (#241) * fix:(oldVersion210) Load on demand Emotion model * feat: update fastapi.py. Add more error log information * Switch pyopenjtalk to pyopenjtalk-prebuilt * fix: update fastapi.py. Adapt to 2.2 reference * Update resample.py * Fix the ONNX export bug (#237) * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Delete attentions_onnx.py * Delete models_onnx.py * Add files via upload * Add files via upload * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update __init__.py * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- * Fix onnx * Format export * Feat: style-text and bert mixing (JA only) * Ensure the same tensor shape * Update * update gradio version * Fix * Style text for chinese and english (ver 2.2) * Style text for chinese and english (ver 2.1) * Style text in FastAPI * Translate style text desc in chinese
--------- Co-authored-by: litagin02 <139731664+litagin02@users.noreply.github.com> Co-authored-by: Sora <654163754@qq.com> Co-authored-by: Sihan Wang <wangsihan1995@gmail.com> Co-authored-by: Ναρουσέ·μ·γιουμεμί·Χινακάννα <40709280+NaruseMioShirakana@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove CLAP * Revert "Remove CLAP" This reverts commit 62fd59b. Revert * Remove CLAP * bf16 audo grad cilp * Update webui and infer utils * Update webui.py * Update webui.py * Update webui-preprocess.py * Update webui_preprocess.py --------- Co-authored-by: Sihan Wang <wangsihan1995@gmail.com> Co-authored-by: OedoSoldier <31711261+OedoSoldier@users.noreply.github.com> Co-authored-by: litagin02 <139731664+litagin02@users.noreply.github.com> Co-authored-by: Sora <654163754@qq.com> Co-authored-by: Ναρουσέ·μ·γιουμεμί·Χινακάννα <40709280+NaruseMioShirakana@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This adds a new emotion and style reference, "style text".
It mixes the BERT features of the original text with the mean of the BERT features of the style text, so the resulting voice reads the original text but with the nuance, emotion, and tone with which the style text would be read.
For example, the original text "This is a test of style text." with the style text "Oh my god, I'm very sad, disappointed..." yields a reading of the original text with a very sad emotion.
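The mixing described above can be sketched roughly as follows. This is only an illustrative sketch, not the code from this PR: the function name `mix_bert_features`, the `style_weight` parameter, and the use of NumPy arrays in place of real BERT outputs are all assumptions made for the example. It assumes the original text gives one feature vector per token and the style text is reduced to a single mean vector that is blended into every token.

```python
import numpy as np


def mix_bert_features(original, style, style_weight=0.7):
    """Blend per-token features of the original text with the mean style feature.

    original: array of shape (n_tokens, dim) for the text to be read.
    style:    array of shape (m_tokens, dim) for the style text.
    style_weight: hypothetical blend factor in [0, 1]; 0 keeps the original as-is.
    """
    original = np.asarray(original, dtype=np.float32)
    style = np.asarray(style, dtype=np.float32)
    if not 0.0 <= style_weight <= 1.0:
        raise ValueError("style_weight must be between 0 and 1")
    if original.shape[1] != style.shape[1]:
        raise ValueError("original and style must share the feature dimension")

    # Collapse the style text to one vector by averaging over its tokens.
    style_mean = style.mean(axis=0)  # shape: (dim,)

    # Broadcast the mean style vector over every token of the original text.
    return (1.0 - style_weight) * original + style_weight * style_mean


# Toy stand-ins for BERT outputs (3 tokens and 2 tokens, feature size 2).
original = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
style = np.array([[2.0, 2.0], [4.0, 0.0]])
mixed = mix_bert_features(original, style, style_weight=0.5)
print(mixed)
```

Because only the mean of the style features is used, the output keeps the token count of the original text, so the style text can be any length.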
I'm Japanese, so I mainly tested with my own Japanese models, and this emotion reference seems to work well (whereas the text prompt and audio prompt are not very effective for models fine-tuned from the pretrained 2.2 models).
I also tested Chinese and English using the pretrained 2.2 model, and the results look good.
I added this feature for the 2.1 and 2.2 models and didn't touch older models (older models simply ignore the style text).
(Judging from my experiments, BERT seems to carry a lot of information about emotion and reading style.)