Original text for the annotated files?

I was also hoping to get the original text for the annotated files. Is that available? I would use it alongside the tokens from the NER dataset to build a tokenizer model.
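If the raw text does become available, one way to combine it with the NER tokens is to align each token back to its character offsets in the original string. The helpers below are hypothetical, not code from this repository or from any tokenizer toolkit. They assume the tokens appear in order, are non-empty, and are separated only by whitespace (or nothing) in the raw text. The 0/1 "token ends here" labeling in `token_end_labels` is also an assumption: it is just one simple way to turn the alignment into character-level training data.

```python
from typing import List, Tuple


def align_tokens(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each token within text.

    Hypothetical helper. Assumes non-empty tokens that appear in order,
    with only whitespace (or nothing) between them in the raw text.
    Raises ValueError when a token cannot be matched.
    """
    spans = []
    pos = 0
    for tok in tokens:
        if not tok:
            raise ValueError("empty token")
        # Skip any whitespace between the previous token and this one.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if not text.startswith(tok, pos):
            raise ValueError(f"token {tok!r} not found at offset {pos}")
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def token_end_labels(text: str, spans: List[Tuple[int, int]]) -> List[int]:
    """Label each character 1 if it is the last character of a token, else 0."""
    labels = [0] * len(text)
    for _, end in spans:
        labels[end - 1] = 1
    return labels
```

For example, `align_tokens("Hello , world", ["Hello", ",", "world"])` returns `[(0, 5), (6, 7), (8, 13)]`. A token that does not match the text at the expected position raises `ValueError`, which would flag places where the NER tokens differ from the original text.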

There is no Universal Dependencies dataset for Bengali, and UD data is what we use to build most of our tokenizers. My impression is that Bengali is generally tokenized on whitespace, with punctuation characters split off as separate tokens, but it would still be useful to build such a dataset.
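A minimal sketch of that whitespace-plus-punctuation baseline, in Python. It treats any character whose Unicode general category starts with "P" as punctuation, which includes ASCII marks and the danda `।` (U+0964) used as a full stop in Bengali. The function names are hypothetical, and this is only the heuristic described above, not a trained tokenizer.

```python
import unicodedata
from typing import List


def is_punct(ch: str) -> bool:
    # Unicode general categories beginning with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po)
    # are punctuation; the danda U+0964 is category Po.
    return unicodedata.category(ch).startswith("P")


def baseline_tokenize(text: str) -> List[str]:
    """Split on whitespace, then split every punctuation character off as its own token."""
    tokens = []
    for chunk in text.split():  # str.split() with no argument splits on runs of whitespace
        current = []
        for ch in chunk:
            if is_punct(ch):
                if current:
                    tokens.append("".join(current))
                    current = []
                tokens.append(ch)
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
    return tokens
```

With this sketch, `baseline_tokenize("আমি ভাত খাই।")` gives `["আমি", "ভাত", "খাই", "।"]`. Because every punctuation character is split off, decimals such as `3.5` and abbreviations containing periods get broken apart, which is one reason annotated data would still help.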

Thanks!

Original text for the annotated files? #3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development