Please ask your question
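In the plot, the vertical axis is the L2 norm of the parameter gradients and the horizontal axis is the training step. The input passes through layer0 to layer9 in order, then through a classifier to produce the output, from which the loss is computed. No parameters are set to be shared.
As the figure shows, gradients are large near the input and small near the output. I suspected my LayerList usage was wrong, but I get the same result whether I follow the documentation or the pattern in transformer.py.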
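For readers who want to reproduce the plotted quantity, here is a minimal, framework-free sketch of a per-layer gradient L2 norm. It assumes the norm is taken jointly over all parameters of one layer (the issue does not say whether it is per parameter or per layer), and the name `grad_l2_norm` and the sample values are hypothetical.

```python
import math


def grad_l2_norm(grads):
    """L2 norm over several flattened gradient vectors (lists of floats)."""
    return math.sqrt(sum(g * g for vec in grads for g in vec))


# Hypothetical gradients for one layer: a flattened weight and a bias.
layer_grads = [[3.0, 0.0], [4.0]]
norm = grad_l2_norm(layer_grads)
print(norm)
```

Logging this value for every layer at each training step would give one curve per layer, as in the figure described above.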
I am currently following the pattern from the LayerList documentation:
class MyLayer(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.linears = paddle.nn.LayerList(
            [paddle.nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # LayerList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x
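One detail of the documentation example is worth checking if its `forward` is copied verbatim: the expression `self.linears[i // 2](x) + l(x)` calls some layers more than once. The sketch below only counts how often each index is invoked; it does not use Paddle and makes no claim about the actual model in this issue, whose forward pass is not shown.

```python
from collections import Counter

num_layers = 10  # matches range(10) in the documentation example
calls = Counter()
for i in range(num_layers):
    calls[i // 2] += 1  # corresponds to self.linears[i // 2](x)
    calls[i] += 1       # corresponds to l(x), i.e. self.linears[i](x)

calls_per_layer = [calls[k] for k in range(num_layers)]
print(calls_per_layer)  # [3, 3, 3, 3, 3, 1, 1, 1, 1, 1]
```

In that example, layers 0 to 4 are each applied three times and layers 5 to 9 once, so their gradients accumulate from several call sites. Whether this contributes to the pattern reported here is unconfirmed; it would only apply if the real model reuses layers in the same way.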
However, the nn.layer.transformer file is written as follows:
def __init__(self, encoder_layer, num_layers, norm=None):
    super().__init__()
    self.layers = LayerList(
        [
            (
                encoder_layer
                if i == 0
                else type(encoder_layer)(**encoder_layer._config)
            )
            for i in range(num_layers)
        ]
    )
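The snippet above keeps the passed-in layer for index 0 and constructs a fresh instance from the stored `_config` for every other index, so each entry is a separate object with its own parameters. The following plain-Python sketch illustrates that pattern and contrasts it with list repetition, which would reuse one object. `ToyLayer` and `build_layers` are hypothetical stand-ins, not Paddle APIs.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer that records its constructor args."""

    def __init__(self, d_model, nhead):
        self._config = {"d_model": d_model, "nhead": nhead}
        self.weight = [0.0] * d_model  # placeholder for real parameters


def build_layers(layer, num_layers):
    # Reuse the given instance for slot 0, construct fresh ones for the rest.
    return [
        layer if i == 0 else type(layer)(**layer._config)
        for i in range(num_layers)
    ]


first = ToyLayer(d_model=4, nhead=2)
layers = build_layers(first, 3)   # three distinct objects
shared = [first] * 3              # the same object three times

distinct_ids = len({id(layer) for layer in layers})
shared_ids = len({id(layer) for layer in shared})
print(distinct_ids, shared_ids)
```

The list comprehension in the LayerList documentation example likewise calls the `Linear` constructor once per iteration, so on this point the two styles should behave the same: neither shares parameters between entries.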