Thai natural language processing in Python
Updated Nov 21, 2025 - Python
Python port of SymSpell: spelling correction and fuzzy search using the Symmetric Delete spelling correction algorithm, described by the project as 1 million times faster
CKIP Transformers
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two big corpora: English Wikipedia and Twitter (330 million English tweets).
AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
A Japanese tokenizer based on recurrent neural networks
Cantonese Linguistics and NLP
Chinese text classification and sequence labeling toolkit (PyTorch). Supports multi-class and multi-label classification of long and short Chinese texts, text similarity, and sequence labeling tasks such as Chinese named entity recognition (NER), part-of-speech tagging, word segmentation, and extractive text summarization.
Python API for Kiwi
A PyTorch implementation of the BI-LSTM-CRF model.
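MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition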
CKIP CoreNLP Toolkits
A tool for comparing tokenizers
Source code for the paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
Source code for an ACL 2016 paper on Chinese word segmentation
Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).
A Python wrapper for VnCoreNLP using a bidirectional communication channel.
Vietnamese Word Tokenize
A toolkit for pre-processing large source code corpora
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
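Several of the projects listed above split unspaced text (hashtags, or scripts written without word delimiters) using word statistics. As a rough illustration of that general idea, here is a minimal sketch of unigram-based segmentation with dynamic programming. It is not the implementation of any listed library: the `TOY_COUNTS` table, the unknown-word penalty, and the function names `word_logprob` and `segment` are all assumptions made up for this example, and a real tool would take its counts from a large corpus.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts (an assumption for illustration only);
# a real tool would derive these from a large corpus.
TOY_COUNTS = {
    "word": 50, "segmentation": 20, "is": 200,
    "fun": 40, "thai": 10, "nlp": 15,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in TOY_COUNTS)


def word_logprob(word):
    """Log-probability of a single candidate word under the toy unigram model."""
    count = TOY_COUNTS.get(word)
    if count is not None:
        return math.log(count / TOTAL)
    # Assumed penalty for unknown strings: it grows with length, so long
    # runs of unseen characters are strongly discouraged.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text):
    """Return the highest-scoring split of `text` into words."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best (score, words) for the suffix text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        last_end = min(len(text), start + MAX_WORD_LEN)
        for end in range(start + 1, last_end + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score,
                               (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


if __name__ == "__main__":
    print(segment("wordsegmentationisfun"))
    # ['word', 'segmentation', 'is', 'fun']
```

The memoized recursion scores each suffix only once, so the work is bounded by the text length times the longest known word. The recursion depth equals the text length, which is fine for hashtags but would call for an iterative version on long inputs.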