Incorrect behavior in the subword tokenizer #504

Closed
@hunterhector

Description

Describe the bug
The subword tokenizer does not lowercase the input text, even when an uncased BERT model is used.

Expected behavior

  1. The tokenizer should handle case according to the pre-trained model: lowercase the text for uncased models and preserve case for cased ones.
  2. It should also perform basic tokenization before subword splitting, following texar-pytorch.
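
To make the expected behavior concrete, here is a minimal, self-contained sketch of the two steps: basic tokenization with optional lowercasing, followed by greedy WordPiece splitting. It is not the project's or texar-pytorch's actual code. The function names (`basic_tokenize`, `wordpiece`, `subword_tokenize`), the `do_lower_case` flag, and the toy vocabulary are all assumptions made for illustration, and the punctuation handling is simpler than what BERT tokenizers do.

```python
import unicodedata


def basic_tokenize(text, do_lower_case=True):
    """Split on whitespace and punctuation; optionally lowercase and strip accents.

    Hypothetical helper: a simplified stand-in for BERT-style basic tokenization.
    """
    if do_lower_case:
        text = text.lower()
        # Drop combining marks so that "é" becomes "e".
        text = "".join(
            ch for ch in unicodedata.normalize("NFD", text)
            if unicodedata.category(ch) != "Mn"
        )
    tokens, current = [], []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        if ch.isspace() or is_punct:
            # A space or punctuation mark ends the word being built.
            if current:
                tokens.append("".join(current))
                current = []
            if is_punct:
                tokens.append(ch)  # punctuation becomes its own token
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one token into subword pieces."""
    pieces, start = [], 0
    while start < len(token):
        end, piece = len(token), None
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole token is unknown
        pieces.append(piece)
        start = end
    return pieces


def subword_tokenize(text, vocab, do_lower_case):
    """Basic tokenization first, then WordPiece on each resulting token."""
    return [
        piece
        for token in basic_tokenize(text, do_lower_case)
        for piece in wordpiece(token, vocab)
    ]


# Toy uncased vocabulary (assumption, for illustration only).
vocab = {"hello", "world", "##s", "!"}
print(subword_tokenize("Hello Worlds!", vocab, do_lower_case=True))
print(subword_tokenize("Hello Worlds!", vocab, do_lower_case=False))
```

With the uncased toy vocabulary, skipping the lowercasing step makes the capitalized words fall through to `[UNK]`, which is the kind of failure this issue describes. In practice the flag would be derived from the pre-trained model name rather than passed by hand.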

Labels: bug