-
Notifications
You must be signed in to change notification settings - Fork 287
Closed
Labels
bugbugs in the librarybugs in the library
Description
clause_tokenize should accept a list of strings (words) and return a list of list of strings (clauses). At the moment, it always returns an empty list.
Description
As stated above, clause_tokenize returns an empty list. I followed the example in the doc, but it did not produce the expected results. The issue still persisted after I downloaded lst20-cls.
from pythainlp.corpus import download
download('lst20-cls')Corpus: lst20-cls
- Downloading: lst20-cls 0.2
100%|███████████████████████████████████████████████████████████████████| 3738912/3738912 [00:00<00:00, 19519586.71it/s]
True
Expected results
[['ฉัน', 'นอน'], ['และ', 'คุณ', 'เล่น', 'มือถือ'], ['ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]Current results
[]
Steps to reproduce
from pythainlp.tokenize import clause_tokenize
clause_tokenize(["ฉัน","นอน","และ","คุณ","เล่น","มือถือ","ส่วน","น้อง","เขียน","โปรแกรม"])- See the returned result.
[]
Your environment
- PyThaiNLP version: 2.3.1
- Python version: 3.8.10
- Operating system and version (distro, 32/64-bit): Ubuntu (64-bit)
- More info (Docker, VM, etc.): WSL2
Metadata
Metadata
Assignees
Labels
bugbugs in the librarybugs in the library