Update 1.2 sft checkpoints, inference.ipynb and READMEs
leng-yue committed Jul 18, 2024
1 parent cee143d commit 1d942c8
Showing 20 changed files with 185 additions and 133 deletions.
4 changes: 2 additions & 2 deletions API_FLAGS.txt
@@ -1,6 +1,6 @@
# --infer
# --api
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.2" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--decoder-config-name firefly_gan_vq
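
Since this change points both flags at `checkpoints/fish-speech-1.2-sft`, a machine that only has the older `checkpoints/fish-speech-1.2` directory would presumably fail when the models are loaded. A minimal pre-flight sketch, assuming the directory layout shown in the flags above; `check_checkpoints` is a hypothetical helper and not part of the repository:

```bash
# Hypothetical helper (not part of fish-speech): confirm that the checkpoint
# directory and the decoder weights file exist before starting a server.
check_checkpoints() {
    ckpt_dir="$1"
    decoder="$ckpt_dir/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
    if [ ! -d "$ckpt_dir" ]; then
        echo "missing directory: $ckpt_dir" >&2
        return 1
    fi
    if [ ! -f "$decoder" ]; then
        echo "missing decoder weights: $decoder" >&2
        return 1
    fi
    echo "ok: $ckpt_dir"
}
```

It could gate the launch, for example by running `check_checkpoints "checkpoints/fish-speech-1.2-sft"` first and only starting the API when it succeeds.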
42 changes: 14 additions & 28 deletions README.md
@@ -1,14 +1,9 @@
# Fish Speech

# Warning: We are updating code to fish-speech 1.2, the last stable branch is [1.1.2](https://github.com/fishaudio/fish-speech/tree/v1.1.2)

<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/lengyue233/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/lengyue233/fish-speech?style=flat-square&logo=docker"/>
</a>
@@ -17,42 +12,33 @@
</a>
</div>

This codebase and all models are released under CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.
[Chinese README](README.zh.md)

此代码库及模型根据 CC-BY-NC-SA-4.0 许可证发布。请参阅 [LICENSE](LICENSE) 了解更多细节.
This codebase and all models are released under CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.

## Disclaimer / 免责声明
## Disclaimer

We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.
We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.

## Online Demo

[Fish Audio](https://fish.audio)

## Quick Start
## Quick Start for Local Inference

[inference.ipynb](https://nbviewer.org/github/AnyaCoder/fish-speech/blob/main/inference.ipynb)
[inference.ipynb](/inference.ipynb)

## Videos

#### Demo Video: https://www.bilibili.com/video/BV1wz421B71D
#### V1.2 Demo Video: https://www.bilibili.com/video/BV1wz421B71D

#### Tech slides Video: https://www.bilibili.com/video/BV1zJ4m1K7cj

## Documents / 文档
## Documents

- [English](https://speech.fish.audio/en/)
- [中文](https://speech.fish.audio/)
- [日本語](https://speech.fish.audio/ja/)

## Samples / 例子

- [English](https://speech.fish.audio/en/samples/)
- [中文](https://speech.fish.audio/samples/)
- [日本語](https://speech.fish.audio/ja/samples/)

## Credits / 鸣谢
## Credits

- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
@@ -61,19 +47,19 @@ We do not hold any responsibility for any illegal usage of the codebase. Please
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

## Sponsor / 赞助
## Sponsor

<div>
<a href="https://6block.com/">
<img src="https://avatars.githubusercontent.com/u/60573493" width="100" height="100" alt="6Block Avatar"/>
</a>
<br>
<a href="https://6block.com/">数据处理服务器由 6Block 提供 (Data Processing sponsor by 6Block)</a>
<a href="https://6block.com/">Data Processing sponsor by 6Block</a>
</div>
<div>
<a href="http://fs.firefly.matce.cn/">
<img src="https://dice-forum.s3.ap-northeast-1.amazonaws.com/2024-05-10/1715299538-382065-04170e083d92c5e0eeff534d6e7704ee.jpg" width="158" height="80" alt="6Block Avatar"/>
<a href="https://www.lepton.ai/">
<img src="https://www.lepton.ai/favicons/apple-touch-icon.png" width="100" height="100" alt="Lepton Avatar"/>
</a>
<br>
<a href="http://fs.firefly.matce.cn/">在线推理Demo服务器由淮北艾阿网络科技有限公司提供 (Online inference sponsor)</a>
<a href="https://www.lepton.ai/">Fish Audio is served on Lepton.AI</a>
</div>
74 changes: 74 additions & 0 deletions README.zh.md
@@ -0,0 +1,74 @@
# Fish Speech

<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/lengyue233/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/lengyue233/fish-speech?style=flat-square&logo=docker"/>
</a>
<a target="_blank" href="https://github.com/fishaudio/fish-speech/actions/workflows/build-windows-package.yml">
<img alt="Action" src="https://img.shields.io/github/actions/workflow/status/fishaudio/fish-speech/build-windows-package.yml?style=flat-square&label=Build%20Windows%20Package&logo=github"/>
</a>
</div>

此代码库及模型根据 CC-BY-NC-SA-4.0 许可证发布。请参阅 [LICENSE](LICENSE) 了解更多细节.

## 免责声明

我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.

## 在线 DEMO

[Fish Audio](https://fish.audio)

## 快速开始本地推理

[inference.ipynb](/inference.ipynb)

## 视频

#### 1.2 介绍: https://www.bilibili.com/video/BV1wz421B71D

#### 1.1 技术介绍: https://www.bilibili.com/video/BV1zJ4m1K7cj

## 文档

- [English](https://speech.fish.audio/en/)
- [中文](https://speech.fish.audio/)
- [日本語](https://speech.fish.audio/ja/)

## 例子

- [English](https://speech.fish.audio/en/samples/)
- [中文](https://speech.fish.audio/samples/)
- [日本語](https://speech.fish.audio/ja/samples/)

## 鸣谢

- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

## 赞助

<div>
<a href="https://6block.com/">
<img src="https://avatars.githubusercontent.com/u/60573493" width="100" height="100" alt="6Block Avatar"/>
</a>
<br>
<a href="https://6block.com/">数据处理服务器由 6Block 提供</a>
</div>
<div>
<a href="https://www.lepton.ai/">
<img src="https://www.lepton.ai/favicons/apple-touch-icon.png" width="100" height="100" alt="Lepton Avatar"/>
</a>
<br>
<a href="https://www.lepton.ai/">Fish Audio 在线推理与 Lepton 合作</a>
</div>
10 changes: 5 additions & 5 deletions docs/en/finetune.md
@@ -36,7 +36,7 @@ You need to convert your dataset into the above format and place it under `data`
Make sure you have downloaded the VQGAN weights. If not, run the following command:

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

You can then run the following command to extract semantic tokens:
@@ -45,7 +45,7 @@ You can then run the following command to extract semantic tokens:
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

!!! note
@@ -89,7 +89,7 @@ After the command finishes executing, you should see the `quantized-dataset-ft.p
Similarly, make sure you have downloaded the `LLAMA` weights. If not, run the following command:

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

Finally, you can start the fine-tuning by running the following command:
@@ -117,9 +117,9 @@ After training, you need to convert the LoRA weights to regular weights before p
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.2 \
--base-weight checkpoints/fish-speech-1.2-sft \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.2-yth-lora/
--output checkpoints/fish-speech-1.2-sft-yth-lora/
```
!!! note
You may also try other checkpoints. We suggest using the earliest checkpoint that meets your requirements, as they often perform better on out-of-distribution (OOD) data.
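
The note above suggests comparing several checkpoints. Because the step number in the file name shown in this hunk (`step_000000010.ckpt`) is zero-padded, a plain sort lists the files earliest first. A small sketch; `list_checkpoints` is a hypothetical helper and assumes every checkpoint follows that naming pattern:

```bash
# Hypothetical helper: print step_*.ckpt files in a directory, earliest first.
# Assumes zero-padded step numbers, as in step_000000010.ckpt.
list_checkpoints() {
    ckpt_dir="$1"
    for f in "$ckpt_dir"/step_*.ckpt; do
        # Skip the unexpanded pattern when no checkpoint exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | sort
}
```

Calling `list_checkpoints "results/$project/checkpoints"` would then give candidate values for `--lora-weight`, starting with the earliest step.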
16 changes: 8 additions & 8 deletions docs/en/inference.md
@@ -15,7 +15,7 @@ Inference support command line, HTTP API and web UI.
Download the required `vqgan` and `llama` models from our Hugging Face repository.

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

### 1. Generate prompt from voice:
@@ -26,7 +26,7 @@ huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

You should get a `fake.npy` file.
@@ -38,7 +38,7 @@ python tools/llama/generate.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.2" \
--checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--num-samples 2 \
--compile
```
@@ -59,7 +59,7 @@ This command will create a `codes_N` file in the working directory, where N is a
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

## HTTP API Inference
@@ -69,8 +69,8 @@ We provide a HTTP API for inference. You can use the following command to start
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.2" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

@@ -142,8 +142,8 @@ You can start the WebUI using the following command:

```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.2" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
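
Scripts written against the earlier docs may still reference `checkpoints/fish-speech-1.2`. A rough sketch for locating them; `find_stale_paths` is a hypothetical helper, and its pattern deliberately skips any path where `1.2` is followed by a hyphen, so it also misses names such as `fish-speech-1.2-yth-lora`:

```bash
# Hypothetical helper: list files under a directory that still mention the
# pre-SFT checkpoint path (fish-speech-1.2 not followed by a hyphen).
find_stale_paths() {
    # -r: recurse, -l: print matching file names only, -E: extended regex.
    grep -rlE 'checkpoints/fish-speech-1\.2([^-]|$)' "$1"
}
```

Running `find_stale_paths .` from a project root prints one file name per line, and prints nothing when every reference already uses the `-sft` directory.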

10 changes: 5 additions & 5 deletions docs/ja/finetune.md
@@ -36,7 +36,7 @@
VQGANの重みをダウンロードしたことを確認してください。まだダウンロードしていない場合は、次のコマンドを実行してください。

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

次に、次のコマンドを実行してセマンティックトークンを抽出できます。
@@ -45,7 +45,7 @@ huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

!!! note
@@ -89,7 +89,7 @@ python tools/llama/build_dataset.py \
同様に、`LLAMA`の重みをダウンロードしたことを確認してください。まだダウンロードしていない場合は、次のコマンドを実行してください。

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

最後に、次のコマンドを実行して微調整を開始できます。
@@ -117,9 +117,9 @@ python fish_speech/train.py --config-name text2semantic_finetune \
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.2 \
--base-weight checkpoints/fish-speech-1.2-sft \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.2-yth-lora/
--output checkpoints/fish-speech-1.2-sft-yth-lora/
```
!!! note
他のチェックポイントを試すこともできます。要件を満たす最も早いチェックポイントを使用することをお勧めします。これらは通常、分布外(OOD)データでより良いパフォーマンスを発揮します。
16 changes: 8 additions & 8 deletions docs/ja/inference.md
@@ -15,7 +15,7 @@
必要な`vqgan`および`llama`モデルを Hugging Face リポジトリからダウンロードします。

```bash
huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2
huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
```

### 1. 音声からプロンプトを生成する:
@@ -26,7 +26,7 @@ huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

`fake.npy`ファイルが生成されるはずです。
@@ -38,7 +38,7 @@ python tools/llama/generate.py \
--text "変換したいテキスト" \
--prompt-text "参照テキスト" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.2" \
--checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--num-samples 2 \
--compile
```
@@ -63,7 +63,7 @@ python tools/llama/generate.py \
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
```

## HTTP API 推論
@@ -73,8 +73,8 @@ python tools/vqgan/inference.py \
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.2" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

@@ -150,8 +150,8 @@ python -m tools.post_api \

```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.2" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
