Bilingual support (#389)

* Add quick start ipynb
* Remove redundant output
* Fix docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* [Feature] Add Fast Whisper
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Different suffix
* Different audio format
* Fix README.md for ja docs
* Fix ZH docs
* Fix docs images & WebUI
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix spelling
* Bilingual support~
* Fix indent
* Fix ZH doc
* how to use start.bat
* Fix info pos

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Leng Yue <lengyue@lengyue.me>
fishaudio · Jul 18, 2024 · 04b6c10
1 parent 1d942c8

Showing 7 changed files with 42 additions and 54 deletions.
diff --git a/docs/en/index.md b/docs/en/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-
-This codebase is released under the `BSD-3-Clause` license, and all models are released under the CC-BY-NC-SA-4.0 license.
+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
+    This codebase is released under the `BSD-3-Clause` license, and all models are released under the CC-BY-NC-SA-4.0 license.
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">

diff --git a/docs/en/inference.md b/docs/en/inference.md
@@ -54,7 +54,7 @@ This command will create a `codes_N` file in the working directory, where N is a
 
 ### 3. Generate vocals from semantic tokens:
 
-#### VQGAN Decoder (not recommended)
+#### VQGAN Decoder
 
 ```bash
 python tools/vqgan/inference.py \

diff --git a/docs/ja/index.md b/docs/ja/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-私たちは、コードベースの違法な使用について一切の責任を負いません。お住まいの地域の DMCA（デジタルミレニアム著作権法）およびその他の関連法については、現地の法律を参照してください。
-
-このコードベースは `BSD-3-Clause` ライセンスの下でリリースされており、すべてのモデルは CC-BY-NC-SA-4.0 ライセンスの下でリリースされています。
+    私たちは、コードベースの違法な使用について一切の責任を負いません。お住まいの地域の DMCA（デジタルミレニアム著作権法）およびその他の関連法については、現地の法律を参照してください。 <br/>
+    このコードベースは `BSD-3-Clause` ライセンスの下でリリースされており、すべてのモデルは CC-BY-NC-SA-4.0 ライセンスの下でリリースされています。
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">

diff --git a/docs/ja/inference.md b/docs/ja/inference.md
@@ -50,15 +50,11 @@ python tools/llama/generate.py \
     それに対応して、加速を使用しない場合は、`--compile`パラメータをコメントアウトできます。
 
 !!! info
-<<<<<<< HEAD
-    bf16をサポートしていないGPUの場合、`--half`パラメータを使用する必要があるかもしれません。
-=======
     bf16 をサポートしていない GPU の場合、`--half`パラメータを使用する必要があるかもしれません。
->>>>>>> upstream/main
 
 ### 3. セマンティックトークンから音声を生成する：
 
-#### VQGAN デコーダー（推奨されません）
+#### VQGAN デコーダー
 
 ```bash
 python tools/vqgan/inference.py \

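The Japanese inference page above says `--compile` can be commented out when acceleration is not wanted, and that GPUs without bf16 support may need `--half`. A minimal sketch of how a launcher could toggle those two flags follows. The helper `build_generate_command` and its `base_args` parameter (standing in for the arguments the docs elide) are hypothetical; only the script path `tools/llama/generate.py` and the two flags come from the docs.

```python
import shlex


def build_generate_command(base_args, use_compile=True, gpu_supports_bf16=True):
    """Assemble argv for tools/llama/generate.py (hypothetical helper).

    base_args: the remaining arguments from the docs (elided there),
        supplied by the caller as a list of strings.
    use_compile: append --compile to fuse CUDA kernels for faster inference.
    gpu_supports_bf16: when False, append --half, as the docs suggest for
        GPUs without bf16 support.
    """
    cmd = ["python", "tools/llama/generate.py", *base_args]
    if use_compile:
        cmd.append("--compile")
    if not gpu_supports_bf16:
        cmd.append("--half")
    return cmd


if __name__ == "__main__":
    # Older GPU without bf16, kernel fusion left on, no extra args for brevity.
    print(shlex.join(build_generate_command([], gpu_supports_bf16=False)))
    # prints: python tools/llama/generate.py --compile --half
```

If PyTorch is installed, `torch.cuda.is_bf16_supported()` is one way to fill in `gpu_supports_bf16` instead of hard-coding it.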
diff --git a/docs/zh/index.md b/docs/zh/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.
-
-此代码库根据 `BSD-3-Clause` 许可证发布, 所有模型根据 CC-BY-NC-SA-4.0 许可证发布.
+    我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规. <br/>
+    此代码库根据 `BSD-3-Clause` 许可证发布, 所有模型根据 CC-BY-NC-SA-4.0 许可证发布.
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">
@@ -51,37 +50,27 @@ Windows 非专业用户可考虑以下为免 Linux 环境的基础运行方法
         - [Visual Studio 下载](https://visualstudio.microsoft.com/zh-hans/downloads/)
         - 安装好Visual Studio Installer之后，下载Visual Studio Community 2022
         - 如下图点击`修改`按钮，找到`使用C++的桌面开发`项，勾选下载
-<p align="center">
-   <img src="https://s2.loli.net/2024/07/15/pWdlYXNAMIzb8Lq.png" width="60%">
-</p>
-4. 双击 `start.bat`，进入 Fish-Speech 训练推理配置 WebUI 页面。
-    - (可选) 想直接进入推理页面？编辑项目根目录下的
-    -  进入网页后：
-
-<p align="center">
-  <img src="https://s2.loli.net/2024/05/06/gw2L39Qj4mClJSG.png" width="75%">
-</p>
-
-   -  简单说一下各部分区域构成，如下图所示，方便按图索骥：
-
-<p align="center">
-  <img src="https://s2.loli.net/2024/05/06/NvfsgyRZCSk72MG.png" width="75%">
-</p>
+    4. 下载安装 [CUDA Toolkit 12](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64)
+4. 双击 `start.bat` 打开训练推理WebUI管理界面. 如有需要，可照下列提示修改`API_FLAGS`.
+
+!!! info "可选"
 
-   -  **1** banner（横幅）：进入网页后从左到右逐渐显示"Welcome to Fish-Speech"字样。以后可能变动。
-   -  **2** 功能区: 在这里，你将决定数据集文件的来源，文本标签的修改，训练参数的调整、推理页面的设置。
-   -  **3** 文件信息展示区：一般不可更改。指引你如何找到自己的预处理后的数据文件、训练后的模型文件所在路径。
-   -  **4** 版本/作者信息。可以多多支持一下作者。
-   -  **5** 欢迎更好的动效~
+    想启动 推理 WebUI 界面？编辑项目根目录下的 `API_FLAGS.txt`, 前三行修改成如下格式:
+    ```
+    --infer
+    # --api
+    # --listen ...
+    ...
+    ```
 
 !!! info "可选"
 
     想启动 API 服务器？编辑项目根目录下的 `API_FLAGS.txt`, 前三行修改成如下格式:
     ```
     # --infer
-        --api
-        --listen ...
-        ...
+    --api
+    --listen ...
+    ...
     ```
 
 !!! info "可选"

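The Chinese Windows guide above switches between the inference WebUI (`--infer`) and the API server (`--api`, `--listen ...`) by commenting lines in `API_FLAGS.txt` in the project root. The sketch below illustrates that convention by treating `#`-prefixed and blank lines as disabled. It is an assumption-based illustration, not the parsing logic actually used by `start.bat`; `parse_api_flags` and `read_api_flags` are hypothetical names.

```python
import shlex
from pathlib import Path


def parse_api_flags(text):
    """Return the active tokens from API_FLAGS.txt-style content.

    Blank lines and lines starting with "#" are treated as disabled.
    """
    tokens = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out or empty line
        tokens.extend(shlex.split(line))  # "--listen ADDR" -> two tokens
    return tokens


def read_api_flags(path="API_FLAGS.txt"):
    """Read and parse the file, which the docs place in the project root."""
    return parse_api_flags(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    infer_only = "--infer\n# --api\n# --listen ...\n"
    print(parse_api_flags(infer_only))  # ['--infer']
```

`shlex.split` uses POSIX quoting by default, so a line containing Windows paths with backslashes would need `shlex.split(line, posix=False)` to keep them intact.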
diff --git a/docs/zh/inference.md b/docs/zh/inference.md
@@ -58,10 +58,7 @@ python tools/llama/generate.py \
 !!! info
     对于不支持 bf16 的 GPU, 你可能需要使用 `--half` 参数.
 
-<<<<<<< HEAD
-=======
 ### 3. 从语义 token 生成人声:
->>>>>>> upstream/main
 
 #### VQGAN 解码
 
@@ -81,11 +78,12 @@ python -m tools.api \
     --llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
     --decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
     --decoder-config-name firefly_gan_vq
+```
+如果你想要加速推理，可以加上`--compile`参数。
 
-如果你想要加速推理，可以加上--compile参数。
-
-# 推荐中国大陆用户运行以下命令来启动 HTTP 服务:
-HF_ENDPOINT=https://hf-mirror.com python -m ...
+推荐中国大陆用户运行以下命令来启动 HTTP 服务:
+```bash
+HF_ENDPOINT=https://hf-mirror.com python -m ...(同上)
 ```
 
 随后, 你可以在 `http://127.0.0.1:8080/` 中查看并测试 API.

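The Chinese inference page recommends prefixing the `tools.api` command with `HF_ENDPOINT=https://hf-mirror.com` for users in mainland China, and notes that `--compile` speeds up inference. A minimal sketch of doing the same from Python follows, assuming the process is started from the repository root. The checkpoint arguments are copied from the docs; `build_api_launch` and `launch_api` are hypothetical helpers, and any other server arguments remain elided.

```python
import os
import subprocess
import sys

# Arguments shown in the docs for `python -m tools.api`.
API_ARGS = [
    "--llama-checkpoint-path", "checkpoints/fish-speech-1.2-sft",
    "--decoder-checkpoint-path",
    "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth",
    "--decoder-config-name", "firefly_gan_vq",
]


def build_api_launch(use_mirror=True, use_compile=False):
    """Return (argv, env) for starting the API server (hypothetical helper)."""
    env = os.environ.copy()  # keep PATH, CUDA variables, etc.
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT to choose the download host.
        env["HF_ENDPOINT"] = "https://hf-mirror.com"
    # sys.executable is the current interpreter, equivalent to `python`.
    cmd = [sys.executable, "-m", "tools.api", *API_ARGS]
    if use_compile:
        cmd.append("--compile")  # optional speed-up mentioned in the docs
    return cmd, env


def launch_api(**kwargs):
    """Start the server; run from the repository root so tools.api imports."""
    cmd, env = build_api_launch(**kwargs)
    return subprocess.run(cmd, env=env, check=True)
```

Calling `launch_api(use_compile=True)` blocks while the server runs; the docs then point to `http://127.0.0.1:8080/` for testing the API.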
diff --git a/inference.ipynb b/inference.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### For Windows"
+    "### For Windows Users / win用户"
    ]
   },
   {
@@ -31,7 +31,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### For Linux"
+    "### For Linux Users / Linux 用户"
    ]
   },
   {
@@ -96,9 +96,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1. Encode reference audio\n",
+    "### 1. Encode reference audio: / 从语音生成 prompt: \n",
     "\n",
-    "You should get a `fake.npy` by doing this."
+    "You should get a `fake.npy` file.\n",
+    "\n",
+    "你应该能得到一个 `fake.npy` 文件."
    ]
   },
   {
@@ -127,10 +129,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 2. Generate semantic tokens from text:\n",
-    "> This command will create codes_N files in the working directory, where N is an integer starting from 0.\n",
+    "### 2. Generate semantic tokens from text: / 从文本生成语义 token:\n",
+    "\n",
+    "> This command will create a codes_N file in the working directory, where N is an integer starting from 0.\n",
+    "\n",
+    "> You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~300 tokens/second).\n",
+    "\n",
+    "> 该命令会在工作目录下创建 codes_N 文件, 其中 N 是从 0 开始的整数.\n",
     "\n",
-    "> You can use --compile to fuse CUDA kernels for faster inference."
+    "> 您可以使用 `--compile` 来融合 cuda 内核以实现更快的推理 (~30 tokens/秒 -> ~300 tokens/秒)"
    ]
   },
   {
@@ -156,7 +163,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 3. Generate speech from semantic tokens:"
+    "### 3. Generate speech from semantic tokens: / 从语义 token 生成人声:"
    ]
   },
   {
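The notebook text above says step 2 creates a `codes_N` file in the working directory, with N counting up from 0. After several runs, a plain alphabetical sort would put `codes_10` before `codes_2`, so a small helper that orders by N can make it easier to pick the file for step 3. This is a hypothetical sketch: the docs only fix the `codes_N` stem, so any single extension is accepted here as an assumption.

```python
import re
from pathlib import Path

# "codes_<N>" with an optional single extension (assumed, not documented).
CODES_RE = re.compile(r"^codes_(\d+)(\.[^.]+)?$")


def list_codes_files(directory="."):
    """Return codes_N files in `directory`, ordered by N numerically."""
    found = []
    for path in Path(directory).iterdir():
        m = CODES_RE.match(path.name)
        if m and path.is_file():
            found.append((int(m.group(1)), path))
    found.sort(key=lambda item: item[0])  # numeric, so codes_10 follows codes_9
    return [path for _, path in found]


def latest_codes_file(directory="."):
    """Return the codes_N file with the largest N, or None if there is none."""
    files = list_codes_files(directory)
    return files[-1] if files else None
```

Sorting on the parsed integer rather than the file name is the design choice that matters here.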