【Hackathon 5th No.12】Add AdaptiveLogSoftmaxWithLoss API to Paddle #770

Merged
merged 4 commits into from
Dec 11, 2023

@Patrick-Star125 Patrick-Star125 commented Dec 2, 2023


paddle-bot bot commented Dec 2, 2023

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.


The computation of adaptive_log_softmax_with_loss proceeds in the following steps:

$\text{head_output} = \text{linear}(\text{input}, \text{head_weight}, \text{head_bias})$
Contributor

The formula formatting seems to be off.

Contributor Author

Replaced it with an image.

$\text{output} += \text{take_along_axis}(\text{head_logprob}, \text{gather_inds.unsqueeze(1)}, \text{axis}=1).\text{squeeze()}$

$\text{loss} = -\text{output.mean()}$
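
The two quoted steps (gathering one log-probability per sample, then averaging) can be illustrated with a minimal plain-Python sketch. The function name `gather_and_loss` and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not normalized log-probabilities and this is not the code from the PR.

```python
def gather_and_loss(head_logprob, gather_inds, output):
    """Plain-Python analogue of the two quoted steps (illustrative only)."""
    # take_along_axis(head_logprob, gather_inds.unsqueeze(1), axis=1).squeeze():
    # pick one entry per row, at the column given by gather_inds for that row.
    picked = [row[idx] for row, idx in zip(head_logprob, gather_inds)]
    # output += picked
    output = [o + p for o, p in zip(output, picked)]
    # loss = -output.mean()
    loss = -sum(output) / len(output)
    return output, loss


# Hypothetical values: 2 samples, 3 head entries each.
head_logprob = [[-0.5, -1.0, -2.0], [-1.5, -0.25, -3.0]]
gather_inds = [2, 1]
output, loss = gather_and_loss(head_logprob, gather_inds, [0.0, -0.75])
print(output, loss)
```

Passing a non-zero starting `output` mirrors the `+=` in the formula: samples whose label falls in a tail cluster already carry that cluster's contribution before the head term is added.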

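## 3. Significance
In natural language processing, when the vocabulary is very large, the embedding accounts for most of a model's parameters.
For example, in a machine translation task with a vocabulary of about 2^17 entries and an embedding dimension of 1024, this yields more than 100 million parameters (2^17 × 1024 ≈ 1.34 × 10^8),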
Contributor

Is this statement about sharing accurate?

Contributor Author

Removed.


The computation of adaptive_log_softmax_with_loss proceeds in the following steps:

![image](https://github.com/PaddlePaddle/community/assets/69072522/3d43f3e9-deb0-4d52-96be-2cd85a104b90)
Contributor

The image still seems to have a problem: that `=1` should be `axis=1`, right? Also, please explain what each layer is doing.


Layer-class API: `paddle.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False, name=None)`, which provides two main methods:
- forward(self, input, label): used for training; returns `output` and `loss`
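
To make the roles of `cutoffs` and `div_value` in the signature concrete, here is a minimal sketch of how they could partition the classes into a head and tail clusters. It assumes the layer follows the layout of PyTorch's `AdaptiveLogSoftmaxWithLoss`, which the proposal is modelled on; the helper name `cluster_layout` and the example sizes are hypothetical.

```python
def cluster_layout(in_features, n_classes, cutoffs, div_value=4.0):
    """Return (head_size, [(tail_hidden_dim, tail_n_classes), ...]).

    Assumption: same layout as PyTorch's adaptive softmax, not verified
    against the Paddle implementation.
    """
    bounds = list(cutoffs) + [n_classes]
    shortlist_size = bounds[0]          # most frequent classes live in the head
    n_clusters = len(bounds) - 1        # one tail cluster per remaining interval
    head_size = shortlist_size + n_clusters  # head also predicts the cluster id
    tails = []
    for i in range(n_clusters):
        # Each successive cluster gets a smaller projection dimension.
        hidden = int(in_features // (div_value ** (i + 1)))
        size = bounds[i + 1] - bounds[i]
        tails.append((hidden, size))
    return head_size, tails


layout = cluster_layout(in_features=16, n_classes=20, cutoffs=[5, 10])
print(layout)  # (7, [(4, 5), (1, 10)])
```

Under this assumption, rarer classes sit in later clusters with smaller hidden dimensions, which is where the parameter and compute savings come from.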
Contributor

The formatting here also seems off.

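# 6. Testing and Acceptance Considerations
The test cases to consider are:

- Numerical correctness
- Numerical correctness (CPU, GPU, dynamic graph, static graph)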
Contributor

How do you plan to verify this correctness?

Contributor Author

@Patrick-Star125 Patrick-Star125 Dec 8, 2023

Same as torch: verify against a computationally equivalent implementation. NumPy lacks some of the required APIs, and this API's function logic is fairly involved, so a full reimplementation would be tedious.
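
As a rough illustration of what a computational-equivalence check might look like, the sketch below builds the full log-probability vector from head and tail logits in plain Python and checks that the probabilities sum to 1. The function names, the logits, and the cluster sizes are all hypothetical; this is not the test code from the PR, and it skips the linear projections by starting from logits directly.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def adaptive_log_prob(head_logits, tail_logits, shortlist_size):
    """Full log-probabilities for one sample (illustrative reference).

    head_logits: shortlist_size class logits followed by one logit per cluster.
    tail_logits: one list of class logits per tail cluster.
    """
    head_lp = log_softmax(head_logits)
    full = list(head_lp[:shortlist_size])
    for k, logits in enumerate(tail_logits):
        cluster_lp = head_lp[shortlist_size + k]  # log P(cluster k)
        # log P(class) = log P(cluster) + log P(class | cluster)
        full.extend(cluster_lp + lp for lp in log_softmax(logits))
    return full


# Hypothetical sample: 3 shortlist classes, 2 tail clusters of sizes 2 and 3.
head_logits = [0.3, -1.2, 0.8, 0.1, -0.5]
tail_logits = [[0.2, -0.4], [1.0, 0.0, -1.0]]
full_lp = adaptive_log_prob(head_logits, tail_logits, shortlist_size=3)

total = sum(math.exp(lp) for lp in full_lp)  # should be 1 up to rounding
label = 4                                    # second class of the first cluster
loss = -full_lp[label]
```

A real test would compare the layer's `output` and `loss` against such a reference within a tolerance, on each device and in both graph modes.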

@Patrick-Star125
Contributor Author

Done

Contributor

@GGBond8488 GGBond8488 left a comment

LGTM

@luotao1 luotao1 merged commit 8058019 into PaddlePaddle:master Dec 11, 2023
1 check passed