Cantonese segmentation tool 粵語分詞工具
$ pip install cantoseg
>>> import cantoseg
>>> cantoseg.cut('香港喺舊石器時代就有人住')
['香港', '喺', '舊石器時代', '就', '有人', '住']
A generator version is also available: cantoseg.lcut.
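The sketch below does not call cantoseg. It is a hypothetical stand-in that only illustrates how a list-returning segmenter differs from a generator version. The names toy_cut and toy_cut_gen, the three-word TOY_DICT, and the forward-maximum-matching strategy are all assumptions made for illustration and are not claimed to be cantoseg's lexicon or algorithm. The generator yields one word at a time, so a caller can stop early or stream long texts. The list version simply materialises the generator with list().

```python
from typing import Iterator, List

# Hypothetical toy lexicon for illustration only; NOT cantoseg's dictionary.
TOY_DICT = {'香港', '舊石器時代', '有人'}
MAX_LEN = max(len(w) for w in TOY_DICT)


def toy_cut_gen(text: str) -> Iterator[str]:
    """Yield words lazily using forward maximum matching (toy stand-in)."""
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches,
        # so every iteration advances i and the loop terminates.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in TOY_DICT:
                yield word
                i += size
                break


def toy_cut(text: str) -> List[str]:
    """List-returning wrapper: consume the whole generator at once."""
    return list(toy_cut_gen(text))


print(toy_cut('香港喺舊石器時代就有人住'))
# ['香港', '喺', '舊石器時代', '就', '有人', '住']
```

With this toy lexicon the list version reproduces the output shown above. Calling next(toy_cut_gen('香港喺舊石器時代就有人住')) returns '香港' after matching only the first word, without segmenting the rest of the string.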
See the article Cantonese Segmentation and Part-of-Speech Tagging (in Chinese).