clause_tokenize returns an empty list. #609

@ponrawee

Description

clause_tokenize should accept a list of strings (words) and return a list of lists of strings (clauses). At the moment, it always returns an empty list.

As stated above, clause_tokenize returns an empty list. I followed the example in the documentation, but it did not produce the expected result. The issue persisted even after I downloaded the lst20-cls corpus.

from pythainlp.corpus import download
download('lst20-cls')
Corpus: lst20-cls
- Downloading: lst20-cls 0.2
100%|███████████████████████████████████████████████████████████████████| 3738912/3738912 [00:00<00:00, 19519586.71it/s]
True

Expected results

[['ฉัน', 'นอน'], ['และ', 'คุณ', 'เล่น', 'มือถือ'], ['ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]

Current results

[]

Steps to reproduce

  1. Run the following:

from pythainlp.tokenize import clause_tokenize
clause_tokenize(["ฉัน","นอน","และ","คุณ","เล่น","มือถือ","ส่วน","น้อง","เขียน","โปรแกรม"])

  2. See the returned result:

[]
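
The contract described in this report (a non-empty list of lists of strings that covers the input words in order) could be checked with a small helper like the sketch below. Note that `is_valid_clause_split` is a hypothetical name introduced here for illustration and is not part of PyThaiNLP. The sketch only encodes the expectation stated in this issue and does not call the library itself.

```python
from typing import List


def is_valid_clause_split(words: List[str], clauses: List[List[str]]) -> bool:
    """Check that `clauses` is a non-empty split of `words` into clauses.

    Hypothetical helper, not part of PyThaiNLP. It assumes the behaviour
    described in this issue: the tokenizer only groups the input words,
    so joining the clauses back together must reproduce the input.
    """
    # An empty result is exactly the failure reported here.
    if not clauses:
        return False
    # Every clause must itself be a non-empty list.
    if not all(isinstance(clause, list) and clause for clause in clauses):
        return False
    # Flattening the clauses must give back the original word sequence.
    flattened = [word for clause in clauses for word in clause]
    return flattened == words


words = ["ฉัน", "นอน", "และ", "คุณ", "เล่น", "มือถือ", "ส่วน", "น้อง", "เขียน", "โปรแกรม"]
expected = [
    ["ฉัน", "นอน"],
    ["และ", "คุณ", "เล่น", "มือถือ"],
    ["ส่วน", "น้อง", "เขียน", "โปรแกรม"],
]

print(is_valid_clause_split(words, expected))  # True: the expected result above
print(is_valid_clause_split(words, []))        # False: the current result above
```

With PyThaiNLP installed, the same check could be applied to the real output as `is_valid_clause_split(words, clause_tokenize(words))`, which would be expected to return False while this bug is present.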

Your environment

  • PyThaiNLP version: 2.3.1
  • Python version: 3.8.10
  • Operating system and version (distro, 32/64-bit): Ubuntu (64-bit)
  • More info (Docker, VM, etc.): WSL2

Labels: bug (bugs in the library)