I was also hoping to get the original text for the annotated files. Is that available? I would use it alongside the tokens from the NER dataset to build a tokenizer model.
There is no Universal Dependencies dataset for Bengali, which is what we use to build most of our tokenizers. My impression is that Bengali is generally tokenized on whitespace, with punctuation characters split off as separate tokens, but it would still be useful to build such a dataset.
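
To make the whitespace-plus-punctuation rule concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any existing tokenizer works: the function name `simple_tokenize` is hypothetical, and "punctuation" is assumed to mean any character in a Unicode `P*` category (which includes the danda `।`). It checks Unicode categories rather than using the regex class `\w`, because Bengali vowel signs are combining marks that `\w` does not match in Python.

```python
from __future__ import annotations

import unicodedata


def is_punct(ch: str) -> bool:
    # Assumption: a character is punctuation if its Unicode category starts with "P".
    return unicodedata.category(ch).startswith("P")


def simple_tokenize(text: str) -> list[str]:
    """Hypothetical baseline: split on whitespace, then split off each punctuation character."""
    tokens: list[str] = []
    for chunk in text.split():
        current = ""
        for ch in chunk:
            if is_punct(ch):
                # Flush the word collected so far, then emit the punctuation mark on its own.
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(simple_tokenize("আমি ভাত খাই।"))
```

A rule like this also splits hyphenated words and abbreviations at every punctuation mark, which is one reason the original text aligned with the annotated tokens would still be valuable.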
Thanks!