
PyThaiNLP

Thai Natural Language Processing in Python.

PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK but with a focus on the Thai language.

This is the 2.1.3 stable release. See the change log.
For the latest development version, see the dev branch.
📫 Follow our PyThaiNLP Facebook page

Capabilities

Convenient character and word classes, like Thai consonants (pythainlp.thai_consonants), vowels (pythainlp.thai_vowels), digits (pythainlp.thai_digits), and stop words (pythainlp.corpus.thai_stopwords), comparable to constants like string.ascii_letters, string.digits, and string.punctuation
Thai word segmentation (word_tokenize), including subword segmentation based on Thai Character Cluster (subword_tokenize)
Thai transliteration (transliterate)
Thai part-of-speech taggers (pos_tag)
Read out number to Thai words (bahttext, num_to_thaiword)
Thai collation, i.e. sorting in dictionary order (collate)
Fixing text typed with the wrong keyboard layout, Thai vs. English (eng_to_thai, thai_to_eng)
Thai spelling suggestion and correction (spell and correct)
Thai soundex (soundex) with three engines (lk82, udom83, metasound)
Thai WordNet wrapper
and much more; see examples in the tutorials.
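The Thai character constants are ordinary strings, in the same spirit as string.digits. As a rough illustration of what such constants contain, the sketch below rebuilds a consonant set and a digit set directly from the Unicode Thai block using only the standard library. It assumes the consonants are U+0E01 to U+0E2E minus the two vowel letters ฤ (U+0E24) and ฦ (U+0E26), and that Thai digits are U+0E50 to U+0E59. The names THAI_CONSONANTS, THAI_DIGITS, and TO_ARABIC_DIGITS are hypothetical and are not PyThaiNLP's own definitions.

```python
import string

# Hypothetical reconstruction from the Unicode Thai block (U+0E00-U+0E7F).
# U+0E24 (ฤ) and U+0E26 (ฦ) sit inside the consonant range but are vowel letters.
THAI_CONSONANTS = "".join(
    chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in (0x0E24, 0x0E26)
)

# Thai digits ๐ to ๙ occupy U+0E50-U+0E59, in the same order as string.digits.
THAI_DIGITS = "".join(chr(cp) for cp in range(0x0E50, 0x0E5A))

# Because both digit strings are aligned, one translation table converts between them.
TO_ARABIC_DIGITS = str.maketrans(THAI_DIGITS, string.digits)

print(len(THAI_CONSONANTS))                # 44
print(len(THAI_DIGITS), len(string.digits))  # 10 10
print("๒๕๖๓".translate(TO_ARABIC_DIGITS))  # 2563
```

In real code, the library's own constants are the ones to import; this sketch only shows that they behave like plain Python strings.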
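Thai is written without spaces between words, so word_tokenize has to find word boundaries with the help of a dictionary or a model. To illustrate only the dictionary-based idea, the sketch below is a greedy left-to-right longest-matching segmenter over a tiny word list. It is a deliberate simplification, not PyThaiNLP's default engine, which is more sophisticated. The function name longest_match_tokenize and the vocabulary are hypothetical.

```python
from typing import Iterable, List


def longest_match_tokenize(text: str, dictionary: Iterable[str]) -> List[str]:
    """Greedy left-to-right longest matching (simplified illustration)."""
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


vocab = ["ผม", "รัก", "คุณ", "ภาษา", "ไทย", "ภาษาไทย"]
print(longest_match_tokenize("ผมรักภาษาไทย", vocab))  # ['ผม', 'รัก', 'ภาษาไทย']
```

Greedy matching prefers ภาษาไทย over ภาษา + ไทย here; a purely greedy strategy can also make wrong early choices, which is one reason real tokenizers use more elaborate search.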
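Reading a number aloud in Thai follows a few positional rules: 1 in the tens place is read สิบ rather than หนึ่งสิบ, 2 in the tens place becomes ยี่สิบ, a final 1 after a higher digit becomes เอ็ด, and the six-place cycle restarts with ล้าน (million). The sketch below implements only those rules, for non-negative integers. The name thai_number_words is hypothetical; it is not PyThaiNLP's num_to_thaiword, which should be preferred in practice and may treat some cases (such as 101) differently.

```python
from typing import List

DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]  # units up to hundred-thousands


def _group_words(n: int) -> str:
    """Read 0 <= n < 1,000,000 in Thai; returns an empty string for 0."""
    digits = str(n)
    words: List[str] = []
    for i, ch in enumerate(digits):
        d = int(ch)
        place = len(digits) - i - 1
        if d == 0:
            continue
        if place == 1 and d == 1:
            words.append("สิบ")  # 10 is สิบ, not หนึ่งสิบ
        elif place == 1 and d == 2:
            words.append("ยี่สิบ")  # 20 is ยี่สิบ
        elif place == 0 and d == 1 and len(digits) > 1:
            words.append("เอ็ด")  # a final 1 after a higher digit
        else:
            words.append(DIGITS[d] + PLACES[place])
    return "".join(words)


def thai_number_words(n: int) -> str:
    """Hypothetical helper: read a non-negative integer aloud in Thai."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return DIGITS[0]
    millions, rest = divmod(n, 1_000_000)
    # Everything above the millions boundary is read recursively, then ล้าน.
    head = thai_number_words(millions) + "ล้าน" if millions else ""
    return head + _group_words(rest)


print(thai_number_words(21))         # ยี่สิบเอ็ด
print(thai_number_words(2_500_000))  # สองล้านห้าแสน
```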
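Sorting by code point does not give Thai dictionary order, mainly because the leading vowels เ แ โ ใ ไ are written before the consonant even though they are pronounced after it. A common approach, shown here as a simplified sketch rather than PyThaiNLP's collate, is to build a sort key that moves each leading vowel after its consonant and ignores tone marks (U+0E48 to U+0E4B). The helper names thai_sort_key and dictionary_sort are hypothetical, and full dictionary order has more rules than this handles.

```python
import re
from typing import Iterable, List, Tuple

# A leading vowel (U+0E40-U+0E44: เ แ โ ใ ไ) followed by a consonant (U+0E01-U+0E2E).
_LEADING_VOWEL = re.compile(r"([\u0E40-\u0E44])([\u0E01-\u0E2E])")
# Tone marks: mai ek, mai tho, mai tri, mai chattawa.
_TONE_MARKS = re.compile(r"[\u0E48-\u0E4B]")


def thai_sort_key(word: str) -> Tuple[str, str]:
    base = _TONE_MARKS.sub("", word)
    base = _LEADING_VOWEL.sub(r"\2\1", base)  # swap vowel and consonant
    # The original word breaks ties deterministically.
    return base, word


def dictionary_sort(words: Iterable[str]) -> List[str]:
    return sorted(words, key=thai_sort_key)


words = ["ไก่", "ขา", "เกม", "กา"]
print(sorted(words))           # ['กา', 'ขา', 'เกม', 'ไก่']  (code-point order)
print(dictionary_sort(words))  # ['กา', 'เกม', 'ไก่', 'ขา']
```

With plain code-point order, every word that starts with a leading vowel sinks to the end of the list; the swapped key files เกม and ไก่ under ก, where a Thai dictionary puts them.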

Installation

PyThaiNLP uses PyPI as its main distribution channel, see https://pypi.org/project/pythainlp/

Stable release

pip install pythainlp

Development release

pip install --upgrade --pre pythainlp

Install options

Some features, like named-entity recognition, need extra packages. Install them with these install options:

pip install pythainlp[extra1,extra2,...]

where the extras can be:

attacut (to support attacut, a fast and accurate tokenizer)
icu (for ICU, International Components for Unicode, support in transliteration and tokenization)
ipa (for IPA, International Phonetic Alphabet, support in transliteration)
ml (to support ULMFiT models for classification)
ner (for named-entity recognizer)
thai2fit (for Thai word vector)
thai2rom (for machine-learnt romanization)
full (install everything)

For dependency details, see the extras variable in setup.py.

Data directory

Some additional data (like word lists and language models) may be downloaded automatically by the library at runtime. By default, it is kept under the directory ~/pythainlp-data.

The data location can be changed using the PYTHAINLP_DATA_DIR environment variable.
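The lookup described above can be pictured as a small helper: use PYTHAINLP_DATA_DIR when it is set, otherwise fall back to ~/pythainlp-data. This is a sketch of the documented behavior only; resolve_data_dir is a hypothetical name, not a PyThaiNLP function, and the library's own handling (for example, whether it creates the directory) may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper mirroring the documented lookup order."""
    env = os.environ if env is None else env
    custom = env.get("PYTHAINLP_DATA_DIR")
    if custom:
        # An explicit setting wins; "~" is expanded for convenience.
        return Path(custom).expanduser()
    # Documented default location.
    return Path.home() / "pythainlp-data"


print(resolve_data_dir({"PYTHAINLP_DATA_DIR": "/srv/thai-nlp"}))
print(resolve_data_dir({}))
```

To relocate the data in practice, set the variable in the environment that launches Python (for example, in a shell profile) so that it is in place before PyThaiNLP is used.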

Documentation

PyThaiNLP Get Started
More tutorials at https://www.thainlp.org/pythainlp/tutorials/
See full documentation at https://thainlp.org/pythainlp/docs/2.0/

Python 2 Users

PyThaiNLP 2 supports Python 3.6+. Some functions may work with older versions of Python 3, but they are not well tested and are not supported. See the 1.7 -> 2.0 change log.
- Upgrading from 1.7
- Upgrade ThaiNER from 1.7
Python 2.7 users can use PyThaiNLP 1.6

License

PyThaiNLP code uses Apache Software License 2.0
Corpus data created by the PyThaiNLP project uses the Creative Commons Attribution-ShareAlike 4.0 International License
For other corpora that may be included with the PyThaiNLP distribution, please refer to Corpus License.

Contribute to PyThaiNLP

Please do fork and create a pull request :) For style guide and other information, including references to algorithms we use, please refer to our contributing page.

Made with ❤️
PyThaiNLP Team
"We build Thai NLP"

Thai (ภาษาไทย)

PyThaiNLP is a Python library for natural language processing with a focus on supporting the Thai language. It is distributed free of charge (forever) for Thai people and everyone in the world!

Because the world keeps moving forward through sharing.

This is the 2.1.3 stable release (https://github.com/PyThaiNLP/pythainlp/releases). See the changes in this release in the 2.1 change log.
For the version under development, see the dev branch.
PyThaiNLP 2.1 supports Python 3.6 and later. Python 2.7 users can still use PyThaiNLP 1.6.
📫 Follow news on the PyThaiNLP Facebook page

Capabilities

Convenient sets of Thai character and word constants, such as consonants (pythainlp.thai_consonants), vowels (pythainlp.thai_vowels), Thai digits (pythainlp.thai_digits), and stop words (pythainlp.corpus.thai_stopwords), similar to constants like string.ascii_letters, string.digits, and string.punctuation
Thai word segmentation (word_tokenize), with support for segmentation below the word level using Thai Character Clusters (subword_tokenize)
Transliteration of Thai into Latin script and phonetic alphabets (transliterate)
Thai part-of-speech tagging (pos_tag)
Reading numbers out as Thai text (bahttext, num_to_thaiword)
Sorting words in Thai dictionary order (collate)
Fixing text typed without switching the keyboard language (eng_to_thai, thai_to_eng)
Thai spelling error detection and correction (spell, correct)
Thai soundex (soundex) with 3 methods (lk82, udom83, metasound)
Thai WordNet wrapper
and more; see examples in the tutorials

Installation

Stable release

pip install pythainlp

Development release

pip install --upgrade --pre pythainlp

Installing additional features

Some additional features, such as named-entity recognition, require extra supporting packages. Install them by specifying these options when running pip install:

pip install pythainlp[extra1,extra2,...]

where the extras are

attacut (a word tokenizer that is more accurate than newmm when measured on the BEST dataset)
icu (for transliteration into a phonetic alphabet and word tokenization with ICU)
ipa (for transliteration into the International Phonetic Alphabet (IPA))
ml (for ULMFiT model support)
ner (for named-entity tagging)
thai2fit (for word vectors)
thai2rom (for romanization into Latin script)
full (install everything)

For details of the extra packages, see the variable named extras in setup.py.

Data directory

While running, PyThaiNLP may download additional data, such as language models and word lists. This data is stored in the directory ~/pythainlp-data by default.

This storage location can be customized by changing the operating system's PYTHAINLP_DATA_DIR environment variable.

Documentation

Getting started with PyThaiNLP
More tutorials, in notebook format: https://www.thainlp.org/pythainlp/tutorials/
Full documentation: https://thainlp.org/pythainlp/docs/2.0/

License

PyThaiNLP code uses the Apache Software License 2.0
Corpora and data created by the PyThaiNLP project use the Creative Commons Attribution-ShareAlike 4.0 International License
Other corpora and data that may be distributed with the PyThaiNLP package may use other licenses; please refer to the Corpus License document

Logo

Designed by Khun วรุตม์ พสุธาดล, from the contest at https://www.facebook.com/groups/408004796247683/permalink/475864542795041/ and https://www.facebook.com/groups/408004796247683/permalink/474262752955220/

Support and contribute

You can contribute to this project by forking it and sending a pull request back.

Made with ❤️
PyThaiNLP Team
"We build Thai NLP"
