Fix the usage of nltk bug #1515

Merged
merged 3 commits into from
Dec 30, 2021
joey12300
Contributor

PR types

Bug fixes

PR changes

APIs

Description

  1. Specify the correct version of nltk in README.md
  2. Fix the nltk punkt download

@@ -1641,6 +1641,8 @@ def _tokenize(self, text, is_sentencepiece=True):
        text = convert_to_unicode(text)
        text = " ".join(text.split())  # collapse runs of whitespace
        nltk = try_import('nltk')
Member

Why is try_import called repeatedly inside a critical function? Imports like this should all be done once, during initialization.

Contributor Author

The nltk import has been moved into the __init__ function.
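For readers following the review, the pattern being asked for can be sketched roughly as below. This is only an illustration, not the code merged in this PR: `import_or_explain` is a hypothetical stand-in for `try_import`, `SketchTokenizer` is an invented class, and the use of `sent_tokenize` in the hot path is an assumption about how the imported module is consumed.

```python
import importlib


def import_or_explain(module_name):
    """Hypothetical stand-in for try_import: import a module or raise a readable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Failed to import '{module_name}'. Please install it first."
        ) from exc


class SketchTokenizer:
    """Illustrative class only; not the tokenizer changed in this PR."""

    def __init__(self, backend_name="nltk"):
        # Resolve the optional dependency a single time, at construction,
        # so a missing package fails early and loudly.
        self._backend = import_or_explain(backend_name)

    def _tokenize(self, text):
        text = " ".join(text.split())  # collapse runs of whitespace
        # The hot path only reads the cached module; no import happens here.
        # Assumption: the backend exposes sent_tokenize, as nltk does.
        return self._backend.sent_tokenize(text)
```

Resolving the dependency in the constructor means each call to `_tokenize` pays only for an attribute lookup, and a missing package is reported when the tokenizer is built rather than on the first tokenization.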

@ZeyuChen
Member

nltk can be very slow and unresponsive when it downloads models. Has this been evaluated? @joey12300

@joey12300
Contributor Author

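> nltk can be very slow and unresponsive when it downloads models. Has this been evaluated? @joey12300

With a proxy enabled, the download takes only a few seconds. Without the proxy it takes five or six minutes, and no progress bar is printed, so it looks as if the process has hung. I will pull this command out and document it separately in the README.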
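One way to make the slow download less surprising is to check for the resource first and print a notice before fetching it. The sketch below is a minimal illustration under stated assumptions, not the command added to the README by this PR: `ensure_punkt` is a hypothetical helper, the resource path `tokenizers/punkt` and package name `punkt` are assumed from the PR description, and the module is passed in as an argument so that the helper itself does not import nltk.

```python
def ensure_punkt(nltk_module, resource="tokenizers/punkt", package="punkt"):
    """Download the punkt data only if it is not already available locally.

    Returns True if a download was performed, False if the data was present.
    Hypothetical helper; resource and package names are assumptions.
    """
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk_module.data.find(resource)
        return False
    except LookupError:
        # Warn up front, since the download may print no progress for minutes.
        print(
            f"Downloading nltk resource '{package}'. "
            "This may take several minutes on a slow connection."
        )
        # Assumption: download returns a truthy value on success.
        if not nltk_module.download(package):
            raise RuntimeError(f"Failed to download nltk resource '{package}'.")
        return True


# Typical use, assuming nltk is installed:
#     import nltk
#     ensure_punkt(nltk)
```

Running this once ahead of time, for example as a documented setup step, keeps the network access out of the tokenization path.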

@ZeyuChen
Member

OK, let's try to get this merged today.

Collaborator

@wawltor wawltor left a comment

LGTM

@wawltor wawltor merged commit 128a7e5 into PaddlePaddle:develop Dec 30, 2021