PyThaiNLP

Thai Natural Language Processing in Python.

PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK but focused on the Thai language.

PyThaiNLP supports Python 3.4+. Support for Python 2 is deprecated as of version 1.7; Python 2 users can still use PyThaiNLP 1.6.

Capabilities

Thai word segmentation (word_tokenize), including subword segmentation based on Thai Character Cluster (tcc) and ETCC (etcc)
Thai romanization (romanize)
Thai part-of-speech taggers (pos_tag)
Reading numbers out as Thai words (bahttext, num_to_thaiword)
Thai collation (sorting in dictionary order) (collate)
Fix for text typed with the wrong Thai/English keyboard layout (eng_to_thai, thai_to_eng)
Thai misspelling detection and spelling correction (spell)
Thai soundex (lk82, udom83, metasound)
Thai stop words (pythainlp.corpus.thai_stopwords)
Thai WordNet
and much more - see examples.
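
As a minimal sketch of the first capability listed above, the snippet below calls word_tokenize with its default settings. The import path pythainlp.tokenize is the one commonly documented for the package, but it may differ between releases, so the import is guarded. The helper name tokenize_thai is hypothetical and exists only for this illustration.

```python
from typing import List, Optional

try:
    # Requires `pip install pythainlp`; import path assumed from the package docs.
    from pythainlp.tokenize import word_tokenize
except ImportError:
    word_tokenize = None  # PyThaiNLP is not installed in this environment


def tokenize_thai(text: str) -> Optional[List[str]]:
    """Hypothetical helper: return word tokens, or None if PyThaiNLP is missing."""
    if word_tokenize is None:
        return None
    # Default engine and dictionary; the exact tokens depend on the installed version.
    return list(word_tokenize(text))


if __name__ == "__main__":
    print(tokenize_thai("ผมรักภาษาไทย"))
```

The other functions in the list follow the same import-and-call pattern; check the documentation for the module each one lives in.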

Installation

Using pip

Stable release

$ pip install pythainlp

Development release

$ pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip

Note: The ulmfit sentiment analyser requires PyTorch; install it with pip install torch. The gensim and keras packages may also be needed for other modules that rely on these machine learning libraries.
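
One way to see which of these optional packages are present before using the features that need them is sketched below. It uses only the standard library. The function name missing_optional_packages is hypothetical, and the tuple holds the import names taken from the note above; none of this is part of PyThaiNLP itself.

```python
import importlib.util

# Import names of the optional packages mentioned in the note above.
OPTIONAL_PACKAGES = ("torch", "gensim", "keras")


def missing_optional_packages(names=OPTIONAL_PACKAGES):
    """Hypothetical helper: return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All optional packages are available.")
```

The check only locates the packages without importing them, so it stays fast even when a heavy library such as PyTorch is installed.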

Documentation

See https://thainlp.org/pythainlp/docs/1.7/

License

PyThaiNLP code is licensed under the Apache Software License 2.0.
Corpus data created by the PyThaiNLP project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
For other corpora that may be included with the PyThaiNLP distribution, please refer to Corpus License.

Contribute to PyThaiNLP

Please do fork and create a pull request :)

For the style guide and other information, including references to the algorithms we use, please refer to our contributing page.

Thai (ภาษาไทย)

Thai language processing in Python

PyThaiNLP is a Python library for natural language processing with a focus on supporting the Thai language. It is distributed free of charge (forever) for Thai people and everyone in the world!

Because the world keeps moving forward through sharing

Supports Python 3.4 and above

GitHub home page: https://github.com/PyThaiNLP/pythainlp/

Capabilities

Thai word segmentation (word_tokenize), with support for Thai Character Clusters (tcc) and ETCC (etcc)
Romanization of Thai into Latin script (romanize)
Thai part-of-speech tagging (pos_tag)
Reading numbers out as Thai text (bahttext, num_to_thaiword)
Sorting words in Thai dictionary order (collate)
Fix for text typed without switching the keyboard language (eng_to_thai, thai_to_eng)
Thai misspelling detection (spell)
Thai soundex (lk82, udom83, metasound)
Thai stop words (pythainlp.corpus.thai_stopwords)
Thai WordNet
and more; see the examples.

Installation

Stable release

$ pip install pythainlp

Development release

$ pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip

Note: The ulmfit sentiment analyser requires PyTorch, so run pip install torch to install PyTorch first. Other modules that rely on machine learning may likewise need gensim and keras to be installed first.

Documentation

Read it at https://thainlp.org/pythainlp/docs/1.7/

License

PyThaiNLP code is licensed under the Apache Software License 2.0.
Corpora and data created by the PyThaiNLP project are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Other corpora and data that may be distributed with the PyThaiNLP package may use different licenses; please see the Corpus License document.

Logo

Designed by วรุตม์ พสุธาดล, from the contest at https://www.facebook.com/groups/408004796247683/permalink/475864542795041/ and https://www.facebook.com/groups/408004796247683/permalink/474262752955220/

Support and contribute

You can help develop this project by forking it and sending a pull request back.