
Non-English tokenizers #464

Open

Description

@yf-hk

Describe the solution you'd like
For CJK languages such as Chinese, words are not separated by spaces, so there is usually a need for a tokenizer that splits sentences into word stems, for example this one: https://github.com/yanyiwu/cppjieba
Is this currently doable in Pisa? If not, are there any plans to add this feature in the future?
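For reference, a minimal sketch of the kind of tokenization I mean, using cppjieba's own API (the dictionary paths below are placeholders for the dict files shipped with cppjieba):

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "cppjieba/Jieba.hpp"

int main() {
    // Placeholder paths: point these at the dict/ directory shipped with cppjieba.
    cppjieba::Jieba jieba("dict/jieba.dict.utf8",
                          "dict/hmm_model.utf8",
                          "dict/user.dict.utf8",
                          "dict/idf.utf8",
                          "dict/stop_words.utf8");

    std::string sentence = "我来到北京清华大学";
    std::vector<std::string> words;

    // Split the sentence into words; the third argument enables the HMM model
    // for out-of-vocabulary words.
    jieba.Cut(sentence, words, true);

    for (const auto& w : words) {
        std::cout << w << "\n";  // e.g. 我 / 来到 / 北京 / 清华大学
    }
    return 0;
}
```

Something like this would need to run at both indexing time and query time so that terms match.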

Metadata

Labels

enhancement (New feature or request) · help wanted (Extra attention is needed) · question (Further information is requested)
