Transformer block gradient anomaly: are gradients abnormally amplified during backpropagation? Same as #69174 #69310

Open
@johnyanccer

Description

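Please ask your question

Transformer block gradient anomaly, same as #69174.
I tested in both the 2.6 and 3.0 environments. There is a public project on AI Studio:
https://aistudio.baidu.com/projectdetail/8382058

I originally suspected that my LayerList usage was wrong, but the results did not change after I adjusted it. In the plot, the vertical axis is the L2 norm of the parameter gradients and the horizontal axis is the training step. The input data passes through encoder0 to encoder9 in order, then through the classifier to produce the output, on which the loss is computed. No parameters are shared.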
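For reference, the per-block gradient L2 norm described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from the AI Studio project: the helper name `grad_l2_norm_per_block`, the dotted parameter names, and the toy gradient values are all assumptions made for the example. In Paddle, the (name, gradient) pairs would presumably come from `model.named_parameters()` with each `p.grad` converted via `.numpy()` after `loss.backward()`.

```python
from collections import defaultdict

import numpy as np


def grad_l2_norm_per_block(named_grads):
    """Return {block_name: L2 norm over all gradients in that block}.

    named_grads: iterable of (parameter_name, gradient_array) pairs.
    The block name is assumed to be the first dotted component of the
    parameter name, e.g. "encoder3.attn.weight" -> "encoder3".
    """
    sq_sums = defaultdict(float)
    for name, grad in named_grads:
        if grad is None:  # parameter received no gradient this step
            continue
        block = name.split(".", 1)[0]
        g = np.asarray(grad, dtype=np.float64)
        sq_sums[block] += float(np.sum(np.square(g)))
    # L2 norm of the concatenated gradients = sqrt of the summed squares.
    return {block: float(np.sqrt(s)) for block, s in sq_sums.items()}


# Toy gradients standing in for one training step (made-up values).
example = [
    ("encoder0.linear.weight", np.array([[3.0, 0.0], [0.0, 4.0]])),
    ("encoder0.linear.bias", np.array([0.0, 0.0])),
    ("classifier.weight", np.array([1.0, 2.0, 2.0])),
]
norms = grad_l2_norm_per_block(example)
print(norms)
```

Logging these values once per training step for encoder0 through encoder9 and the classifier gives one curve per block, which is the kind of plot shown below.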

[Image: grad]

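As the plot shows, the gradients are large near the input and small near the output. They also drop rapidly to 0, then grow again, then shrink again.

Changing the initialization parameters of the Transformer block changes the gradient magnitude, but the gradients are still amplified in the backward direction and still drop rapidly to 0.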
