[Bug]: ERNIE 3.0 series vocabulary contains a duplicate token #6839

yqt opened this issue Aug 28, 2023 · 1 comment
Labels
bug Something isn't working triage

yqt commented Aug 28, 2023

Software environment

- paddlepaddle: 2.5.1
- paddlepaddle-gpu: none
- paddlenlp: 2.6.0

Duplicate issues

  • I have searched the existing issues

Bug description

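Token IDs 12084 and 18005 map to the same token, the dollar sign `$`. The vocab loader populates a map by plain assignment without checking for duplicate tokens, so token_id=12084 ends up with no corresponding token.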
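The failure mode can be sketched as follows. This is a minimal illustration, assuming the vocabulary is loaded into a plain token-to-id dict by assignment and the id-to-token table is derived from that dict. `load_vocab` is a hypothetical helper written for this example, not PaddleNLP's actual loader, and the toy vocabulary only stands in for the real `vocab.txt`.

```python
def load_vocab(lines):
    """Hypothetical loader: map each token to its 0-based line index."""
    token_to_id = {}
    for idx, token in enumerate(lines):
        # Plain assignment: a repeated token silently overwrites its earlier id.
        token_to_id[token] = idx
    return token_to_id


# Toy vocabulary with "$" at ids 1 and 3, standing in for ids 12084 and 18005.
toy_vocab = ["[PAD]", "$", "a", "$"]
token_to_id = load_vocab(toy_vocab)

# Reverse table built from the dict: the overwritten id has no entry.
id_to_token = {idx: tok for tok, idx in token_to_id.items()}

print(token_to_id["$"])   # 3
print(1 in id_to_token)   # False
```

In this sketch the later occurrence wins, so the earlier id (1 here, 12084 in the report) is the one left without a token.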

Related issue: https://github.com/PaddlePaddle/PaddleNLP/issues/6429

Steps to reproduce & code

vocab.txt

line 12085: $
line 18006: $
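One way to check a vocabulary file for repeated entries like the two lines above is sketched below. The helper names and the `vocab.txt` path are assumptions for illustration; the script uses only the Python standard library and reports 1-based line numbers, so a token on line N has token_id N-1.

```python
from collections import defaultdict


def find_duplicate_tokens(lines):
    """Return {token: [1-based line numbers]} for tokens that occur more than once."""
    positions = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # Strip only the trailing newline so whitespace tokens stay distinct.
        positions[line.rstrip("\n")].append(line_no)
    return {tok: nums for tok, nums in positions.items() if len(nums) > 1}


def find_duplicates_in_file(path):
    """Scan a vocab file (one token per line, assumed UTF-8) for duplicates."""
    with open(path, encoding="utf-8") as f:
        return find_duplicate_tokens(f)


# Toy input standing in for the real file.
sample = ["[PAD]\n", "$\n", "a\n", "$\n"]
print(find_duplicate_tokens(sample))  # {'$': [2, 4]}
```

To run it against a downloaded vocabulary, call `find_duplicates_in_file("vocab.txt")` with the local path to the file.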

@yqt yqt added the bug Something isn't working label Aug 28, 2023
@w5688414
Contributor

Thank you for the feedback. This is a known issue.
