
Tokenize

Wannaphong Phatthiyaphaibun edited this page Aug 7, 2021 · 4 revisions

LaoNLP supports two kinds of tokenization:

  • Word segmentation
  • Sentence tokenization

Word segmentation

word_tokenize(text)

Splits text into words and returns them as a list of strings.

Example

from laonlp.tokenize import word_tokenize
txt = "ພາສາລາວໃນປັດຈຸບັນ."
print(word_tokenize(txt)) # ['ພາສາລາວ', 'ໃນ', 'ປັດຈຸບັນ', '.']
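This page does not describe how word_tokenize finds word boundaries. One common approach for unspaced scripts such as Lao is dictionary-based greedy longest matching. The sketch below illustrates that technique only and is not LaoNLP's implementation. The function greedy_segment and the tiny word list LAO_WORDS are hypothetical, chosen just large enough to reproduce the example above.

```python
# Hypothetical sketch: greedy longest-matching word segmentation.
# This is NOT LaoNLP's implementation; LAO_WORDS is a tiny illustrative
# word list, not LaoNLP's real lexicon.
from typing import List, Set


def greedy_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Split text by always taking the longest dictionary word that starts
    at the current position. Characters not covered by any dictionary
    word are emitted one at a time."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking until a word is found.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in dictionary:
                match = text[i:end]
                break
        if match is None:
            # Fallback: emit a single character (e.g. punctuation).
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical mini-dictionary covering the example sentence.
LAO_WORDS = {"ພາສາ", "ລາວ", "ພາສາລາວ", "ໃນ", "ປັດຈຸບັນ"}

print(greedy_segment("ພາສາລາວໃນປັດຈຸບັນ.", LAO_WORDS))
# → ['ພາສາລາວ', 'ໃນ', 'ປັດຈຸບັນ', '.']
```

Greedy matching prefers "ພາສາລາວ" over the shorter "ພາສາ" because it tries the longest candidate first. Its weakness is that the single-character fallback can split an unknown word in the middle of a syllable, which is why real segmenters usually add smarter handling for out-of-dictionary text.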

Sentence tokenization

sent_tokenize(text)

Splits text into sentences.
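The rules sent_tokenize uses to detect sentence boundaries are not documented here. As a rough illustration only, the sketch below assumes that a sentence ends at ".", "!" or "?". The function split_sentences is hypothetical and is not part of LaoNLP.

```python
# Hypothetical sketch of punctuation-based sentence splitting.
# LaoNLP's sent_tokenize rules are not documented on this page; this
# assumes a sentence ends at ".", "!" or "?".
import re
from typing import List

# A run of non-terminator characters, optionally followed by one terminator.
_SENTENCE = re.compile(r"[^.!?]+[.!?]?")


def split_sentences(text: str) -> List[str]:
    """Return the sentences in text, keeping each terminator and
    dropping surrounding whitespace and empty pieces."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


print(split_sentences("ພາສາລາວ. ໃນປັດຈຸບັນ."))
# → ['ພາສາລາວ.', 'ໃນປັດຈຸບັນ.']
```

Text with no terminator at the end is still returned as a final sentence. Lao writing often marks boundaries with spaces rather than punctuation, so a punctuation-only rule like this one will miss many real boundaries.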
