[Continuously updated] Some gripes that may need attention 🤔

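This issue tracks things that are not suitable to change right away but that should, where convenient, be fixed in the V3 update. It exists so they are not forgotten 🤔
If you have gripes of your own, feel free to add them here 🤔
This includes, but is not limited to, some well-known legacy messes (such as RoPE) and changes that touch the state dict (where weight compatibility must be preserved).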

### 1.RoPE Cache 

https://github.com/openvpi/DiffSinger/blob/main/modules/commons/rotary_embedding_torch.py#L179

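During ONNX export, the length of the input shape used at trace time determines `seq_len` here (it is [1](https://github.com/openvpi/DiffSinger/blob/main/deployment/exporters/acoustic_exporter.py#L171) for acoustic and [5](https://github.com/openvpi/DiffSinger/blob/main/deployment/exporters/variance_exporter.py#L192) for variance).

As a result, the data recorded in the ONNX node `/fs2/encoder/layers.0/op/self_attn/Slice` is incorrect: in the acoustic model it is all zeros, and in the variance model it holds only the first five entries, matching the input length used at trace time.

So far this does not appear to have caused any real problems.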
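A minimal plain-Python sketch of why the recorded data might look this way, assuming the cache holds rotary angles `position * inv_freq` for positions `0..seq_len-1` and is frozen into the graph as a constant at trace time. `rope_angles` and `FrozenCache` are hypothetical names used only for illustration; they are not DiffSinger or rotary_embedding_torch APIs.

```python
def rope_angles(seq_len, dim, theta=10000.0):
    """Rotary angles for positions 0..seq_len-1 (standard RoPE frequencies)."""
    inv_freq = [theta ** (-(2 * i) / dim) for i in range(dim // 2)]
    return [[pos * f for f in inv_freq] for pos in range(seq_len)]


class FrozenCache:
    """Toy stand-in for a cache captured as a constant during tracing (assumption)."""

    def __init__(self, trace_len, dim):
        # Built once from the dummy input length; never rebuilt afterwards.
        self.table = rope_angles(trace_len, dim)

    def slice(self, seq_len):
        # Mirrors a Slice over the frozen constant: it cannot grow past trace_len.
        return self.table[:seq_len]


acoustic = FrozenCache(trace_len=1, dim=8)  # traced with length 1
variance = FrozenCache(trace_len=5, dim=8)  # traced with length 5

print(acoustic.slice(100))       # only position 0, whose angles are all 0.0
print(len(variance.slice(100)))  # 5 rows, regardless of the requested length
```

With a trace length of 1 the only cached position is 0, so every angle is zero, which is consistent with the all-zero data seen in the acoustic graph.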

### 2.MultiheadSelfAttentionWithRoPE initialization

- [X] FIXED (https://github.com/openvpi/DiffSinger/pull/250)

https://github.com/pytorch/pytorch/blob/v2.6.0/torch/nn/modules/activation.py#L1116

PyTorch's official `nn.MultiheadAttention` implementation applies explicit parameter initialization. In general, an attention implementation should follow the reference implementation 🤔

MultiheadSelfAttentionWithRoPE omits these steps at [#L162](https://github.com/openvpi/DiffSinger/blob/main/modules/commons/common_layers.py#L162) and [#L165](https://github.com/openvpi/DiffSinger/blob/main/modules/commons/common_layers.py#L165).

This can be fixed by using `XavierUniformInitLinear` from `common_layers`, or by applying the initialization directly in the `__init__` of `MultiheadSelfAttentionWithRoPE`.
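For reference, a plain-Python sketch of what Xavier uniform initialization does to a projection weight: samples are drawn from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)). The helper name `xavier_uniform_weight`, the 256-channel size, and the zero bias are assumptions for illustration; in PyTorch the corresponding call is `torch.nn.init.xavier_uniform_`.

```python
import math
import random


def xavier_uniform_weight(fan_out, fan_in, gain=1.0, seed=0):
    """Return (weight, bound) with weight[fan_out][fan_in] drawn from U(-bound, bound)."""
    rng = random.Random(seed)
    bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
    weight = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
              for _ in range(fan_out)]
    return weight, bound


# A 256 -> 256 projection, as a q/k/v/out projection in attention might be.
weight, bound = xavier_uniform_weight(256, 256)
bias = [0.0] * 256  # assumption: biases start at zero

print(round(bound, 4))
```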

### 3.Naming of the kaiming normal Conv1d

- [X] FIXED (https://github.com/openvpi/DiffSinger/pull/250)

The earlier Wavenet and the later LYNXNet both use a Conv1d initialized with kaiming normal.

https://github.com/openvpi/DiffSinger/blob/muon_lynxnet2/modules/commons/common_layers.py#L128

The naming here may be a bit careless 🤔

### 4.LYNXNet2 condition cache

- [X] FIXED (https://github.com/openvpi/DiffSinger/pull/259)

https://github.com/openvpi/DiffSinger/blob/muon_lynxnet2/modules/backbones/lynxnet2.py#L44

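LYNXNet2 uses `nn.Linear` to project the condition to the injection shape. In ONNX this is decomposed into MatMul and Add, which is incompatible with the current `graph_extract_conditioner_projections` method (it supports Gemm and Conv).

Although LYNXNet2 has only a single standalone injection layer after the change in injection scheme, that layer is still computed once on every sampling step, so it should be changed to `Conv1d`.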
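A `Conv1d` with kernel size 1 computes the same per-frame affine map as a `Linear` layer over the channel dimension, which is what makes the replacement possible. A minimal plain-Python sketch of that equivalence follows; the helper names `linear`, `conv1d_k1`, and `transpose` and the toy shapes are illustrative assumptions, not DiffSinger code.

```python
def linear(x_tc, weight, bias):
    """x_tc: [T][C_in], weight: [C_out][C_in] -> [T][C_out]."""
    return [[sum(w * v for w, v in zip(row, frame)) + b
             for row, b in zip(weight, bias)]
            for frame in x_tc]


def conv1d_k1(x_ct, weight, bias):
    """x_ct: [C_in][T], weight: [C_out][C_in] (size-1 kernel axis dropped) -> [C_out][T]."""
    steps = len(x_ct[0])
    return [[sum(weight[o][i] * x_ct[i][t] for i in range(len(x_ct))) + bias[o]
             for t in range(steps)]
            for o in range(len(weight))]


def transpose(m):
    return [list(col) for col in zip(*m)]


weight = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]  # C_out=2, C_in=3
bias = [0.5, -1.0]
cond_tc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # T=2 frames

out_linear = linear(cond_tc, weight, bias)                         # [T][C_out]
out_conv = transpose(conv1d_k1(transpose(cond_tc), weight, bias))  # back to [T][C_out]
print(out_linear == out_conv)
```

Note that in PyTorch a `Conv1d` weight has shape `[C_out, C_in, 1]` rather than `[C_out, C_in]`, so a change like this also affects the state dict.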

### 5.retake scaling behavior

- [X] FIXED (https://github.com/openvpi/DiffSinger/pull/270)

https://github.com/openvpi/DiffSinger/blob/muon_lynxnet2/modules/toplevel.py#L329

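The behavior here looks questionable.

The scaling should be applied to v_input rather than to the whole emb. As written, the bias of emb is scaled along with it.

Given that the difference is probably small and that the retake mechanism may be reworked anyway, this might not need changing.

If this lands in V3, the correct approach should be implemented directly.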
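A scalar sketch of the two orderings, assuming the embedding is an affine map `emb(x) = w * x + b`. The values of `w`, `b`, `v_input`, and `scaling` are made up for illustration.

```python
w, b = 2.0, 1.0          # toy embedding weight and bias
v_input, scaling = 3.0, 0.5

scaled_output = (w * v_input + b) * scaling   # scales the whole emb, bias included
scaled_input = w * (v_input * scaling) + b    # scales only v_input, bias untouched

print(scaled_output)                 # 3.5
print(scaled_input)                  # 4.0
print(scaled_input - scaled_output)  # 0.5 == b * (1 - scaling)
```

The two results differ by `b * (1 - scaling)`, so they coincide only when the bias is zero or the scaling is 1.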

### 6.mel extraction method

😇
