Incorrect behavior in the subword tokenizer #504

**Describe the bug**
The subword tokenizer does not lowercase the input text, even when an uncased BERT model is used.

**Expected behavior**
1. It should handle casing according to the pre-trained model: lowercase the text for uncased models and preserve case for cased ones.
2. It should also perform basic tokenization before subword splitting, following [texar-pytorch](https://texar-pytorch.readthedocs.io/en/latest/_modules/texar/torch/data/tokenizers/bert_tokenizer.html).
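
To illustrate the expected behavior, here is a minimal, self-contained sketch of a basic tokenization step (whitespace and punctuation splitting, with optional lowercasing and accent stripping) followed by greedy WordPiece splitting. It is not this project's implementation or the texar-pytorch code. The function names, the `do_lower_case` flag, and `toy_vocab` are assumptions made for the example, and the punctuation check is simplified compared with BERT's.

```python
import unicodedata


def basic_tokenize(text, do_lower_case=True):
    """Split on whitespace and punctuation; optionally lowercase and strip accents."""
    tokens = []
    for word in text.strip().split():
        if do_lower_case:
            word = word.lower()
            # Drop combining marks after NFD normalization (accent stripping).
            word = "".join(
                ch for ch in unicodedata.normalize("NFD", word)
                if unicodedata.category(ch) != "Mn"
            )
        current = ""
        for ch in word:
            # Simplified: only Unicode punctuation categories split a word.
            if unicodedata.category(ch).startswith("P"):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def wordpiece_tokenize(token, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single token into subwords."""
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        piece = None
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix matched: the whole token maps to the unknown token.
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab, do_lower_case=True):
    """Run basic tokenization, then WordPiece on each resulting token."""
    output = []
    for token in basic_tokenize(text, do_lower_case):
        output.extend(wordpiece_tokenize(token, vocab))
    return output


# Hypothetical all-lowercase vocabulary, standing in for an uncased BERT vocab.
toy_vocab = {"hello", "world", "token", "##izer", "!"}

uncased = tokenize("Hello Tokenizer!", toy_vocab, do_lower_case=True)
cased = tokenize("Hello Tokenizer!", toy_vocab, do_lower_case=False)
```

With the lowercase toy vocabulary, skipping the lowercasing step sends the capitalized words to `[UNK]`, which is the kind of mismatch described in this issue. In practice the flag would be set from the pre-trained model's configuration rather than passed by hand.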


**Environment (please complete the following information):**
 - OS: [e.g. iOS]
 - Version [e.g. 22]
 - Python and Package versions: [e.g. Python version, Pytorch version]


