Closed
Labels
bug (bugs in the library)
Description
sent_tokenize(str, engine="whitespace") returns a list of lists of strings instead of a flat list of strings.
Expected results
sent_tokenize should return a flat list of strings ([]), not a list nested inside another list ([[]]).
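The difference between the two shapes can be illustrated with a minimal standalone sketch. It does not call PyThaiNLP: `split_on_spaces` is a hypothetical helper, and splitting on runs of spaces with `re.split` is only an assumption about what a whitespace engine does, chosen because it reproduces the expected list from the failing test.

```python
import re
from typing import List


def split_on_spaces(text: str) -> List[str]:
    """Hypothetical stand-in for a whitespace engine (not PyThaiNLP code)."""
    # Split on runs of one or more spaces; a trailing space leaves a final "".
    return re.split(r" +", text)


# The flat shape the test expects.
expected = split_on_spaces("รักน้ำ รักปลา ")

# Wrapping that result in another list gives the nested shape seen in the failure.
nested = [expected]

print(expected)
print(nested)
print(expected == nested)
```

The two values hold the same strings but compare unequal, which is the kind of mismatch `assertEqual` reports in the log under "Current results".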
Current results
https://github.com/PyThaiNLP/pythainlp/actions/runs/11627805664/job/32381817268
======================================================================
FAIL: test_sent_tokenize (tests.test_tokenize.TokenizeTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/pythainlp/pythainlp/tests/test_tokenize.py", line 222, in test_sent_tokenize
    self.assertEqual(
AssertionError: Lists differ: [['รักน้ำ', 'รักปลา', '']] != ['รักน้ำ', 'รักปลา', '']
First differing element 0:
['รักน้ำ', 'รักปลา', '']
'รักน้ำ'
Second list contains 2 additional elements.
First extra element 1:
'รักปลา'
- [['รักน้ำ', 'รักปลา', '']]
? - -
+ ['รักน้ำ', 'รักปลา', '']
Steps to reproduce
self.assertEqual(
    sent_tokenize("รักน้ำ รักปลา ", engine="whitespace"),
    ["รักน้ำ", "รักปลา", ""],
)
PyThaiNLP version
5.0.4
Python version
All
Operating system and version
All
More info
No response
Possible solution
No response
Files
No response