[Bug Fix]update tokenizer utils #3204

wj-Mcat · 2022-09-06T04:41:10Z

PR types

Bug fixes

PR changes

APIs

Description

Attempts to fix #3195.

ZHUI · 2022-09-06T08:17:08Z

examples/benchmark/clue/mrc/run_c3.py

@@ -292,6 +292,8 @@ def _truncate_seq_tuple(tokens_a, tokens_b, tokens_c, max_length):
    train_ds, dev_ds, test_ds = load_dataset(
        "clue", "c3", split=["train", "validation", "test"])

+    train_ds, dev_ds, test_ds = train_ds[:200], dev_ds[:200], test_ds[:200]


Mm-hm, okay, I hadn't noticed this.

wj-Mcat · 2022-09-06T08:47:38Z

The unit tests had actually already passed; the new commit will probably take close to two more hours to run.

update tokenizer utils

0956d15

wj-Mcat changed the title ~~update tokenizer utils~~ [Bug Fix]update tokenizer utils Sep 6, 2022

wj-Mcat added 2 commits September 6, 2022 13:00

update example

5352162

Merge branch 'develop' into fix-split-into-word

e154094

wj-Mcat mentioned this pull request Sep 6, 2022

[BUG] Tokenizer is_split_into_words parameter option does not behave as expected. #3195

Closed

ZHUI reviewed Sep 6, 2022

remove debug code

ce43c8e

wj-Mcat added 5 commits September 6, 2022 16:47

Merge branch 'develop' into fix-split-into-word

3e1b2ed

test=document_fix

bbbe412

Merge branch 'fix-split-into-word' of github.com:wj-Mcat/PaddleNLP in…

8d89cc2

…to fix-split-into-word

test=document_fix

f21d3e8

test=document_fix

baf2b98

guoshengCS approved these changes Sep 6, 2022

guoshengCS merged commit 9d9b00b into PaddlePaddle:develop Sep 6, 2022
