Commit d61bfa3

Merge branch 'dev' into 2.1
2 parents e7d802f + 58d4d00 commit d61bfa3

85 files changed: 9,451 additions and 3,830 deletions

.travis.yml

Lines changed: 12 additions & 28 deletions
@@ -1,40 +1,24 @@
-# Config file for automatic testing at travis-ci.org
-# This file will be regenerated if you run travis_pypi_setup.py
-
 language: python
+dist: bionic
 python:
-- "3.6"
-
-# workaround to make boto work on travis
-# from https://github.com/travis-ci/travis-ci/issues/7940
+- '3.6'
 before_install:
-- sudo rm -f /etc/boto.cfg
-
-# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
+- sudo rm -f /etc/boto.cfg
 install:
-- pip install -r requirements.txt
-- pip install .[artagger,icu,ipa,ner,thai2fit,deepcut]
-- pip install coveralls
-
+- pip install -U numpy
+- pip install -q -r requirements.txt
+- pip install -q .[full]
+- pip install coveralls
 os:
-- linux
-
-# command to run tests, e.g. python setup.py test
-script:
-  coverage run --source=pythainlp setup.py test
-
-after_success:
-  coveralls
-
-# After you create the Github repo and add it to Travis, run the
-# travis_pypi_setup.py script to finish PyPI deployment setup
+- linux
+script: coverage run --source=pythainlp setup.py test
+after_success: coveralls
 deploy:
   provider: pypi
   distributions: sdist bdist_wheel
   user: wannaphong
   password:
-    secure: PLEASE_REPLACE_ME
+    secure: zX35+8niw5W9H8XbFwacrDAhqyIibdUdC/cARnHlmxLN/2H9IynK0NW04UZwkBlrwrIZrU/g+cqYXFQXu6jE1ozlBKBxUd3xG8d1kixuntI0j9e+erPTs8Ju/KazUZtlknJPvnDMP+/1Dq+RMnMCP3RRlBrH6lvG70OgZ1aBpgx8FxRfs0xHfBIZvo5CVtR/QlDzhDJM1cgEyWkSgnlAhPxpv8qIQbh4/Rw89jXIZqv0bGCVJorrrcTA1oCzkr/4E4u/WZaARnvPjUr2a9U1w7C2IysDHiBfqQWlovdMmpoSLFE56YlG3smbmXfldWjmiMRQoWL+Ifu+smisvOLmR0ja78UMrrhHWP4mdzIeBVVRnT6eHUv0ChmLT2uCkOLE0newhtEJIYToot2TSoLFavXXIQB1fIHt6e74KRTV6WGnm0nFfHuGP+b5SgSPQFgqx8tBpn0rBOeqZ1y3pRISc/drF0F4reWMnlqoQfZZFmLmU1UmDZbvWNvXPu6MWyyuZ1F6fE9jyb3mG+kDuJf1PZ4ejC/sdIvpLlwUGLFGzRMa2TtxXqGq5CWsywPxo8Sx+bpMPCOImuW60PB9K/xKgfLhAtb7gZwndzUGqDbtSJCd5PmTkfEH8fawv/XnydvsssYUpipBCmFDZlNREyAkgOcLlL099Y5fAO8l2gOLyKs=
   on:
     tags: true
-    repo: wannaphongcom/pythainlp
-    #condition: $TOXENV == py35
+    repo: pythainlp/pythainlp

CONTRIBUTING.md

Lines changed: 10 additions & 2 deletions
@@ -36,8 +36,8 @@ We use the famous [gitflow](http://nvie.com/posts/a-successful-git-branching-mod
 
 # Discussion
 
-- Facebook group: https://www.facebook.com/groups/thainlp
-- GitHub issues: https://github.com/PyThaiNLP/pythainlp/issues
+- Facebook group (for Thai NLP Discussion only): https://www.facebook.com/groups/thainlp
+- GitHub issues (Problems and suggestions): https://github.com/PyThaiNLP/pythainlp/issues
 
 Happy hacking! (;
 
@@ -50,6 +50,8 @@ Happy hacking! (;
 - Charin Polpanumas
 - Peeradej Tanruangporn
 - Arthit Suriyawongkul
+- Chakri Lowphansirikul
+- Pattarawat Chormai
 
 ## newmm (onecut), mm, TCC, and Thai Soundex Code
 - Korakot Chaovavanich
@@ -59,9 +61,15 @@ Happy hacking! (;
 
 ## Docs
 - Peeradej Tanruangporn
+- Chakri Lowphansirikul
 
 ## Maintainers
 - Arthit Suriyawongkul
+- Wannaphong Phatthiyaphaibun
+
+## Benchmark
+- Charin Polpanumas
+- Pattarawat Chormai
 
 ## Contributors
 - See more contributions here https://github.com/PyThaiNLP/pythainlp/graphs/contributors

README.md

Lines changed: 42 additions & 6 deletions
@@ -2,27 +2,40 @@
 
 # PyThaiNLP
 
-[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
+[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)
+[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
 [![Downloads](https://pepy.tech/badge/pythainlp/month)](https://pepy.tech/project/pythainlp)
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2FPyThaiNLP%2Fpythainlp.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2FPyThaiNLP%2Fpythainlp?ref=badge_shield)
 [![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
 [![Build status](https://ci.appveyor.com/api/projects/status/9g3mfcwchi8em40x?svg=true)](https://ci.appveyor.com/project/wannaphongcom/pythainlp-9y1ch)
+[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)
 [![Coverage Status](https://coveralls.io/repos/github/PyThaiNLP/pythainlp/badge.svg?branch=dev)](https://coveralls.io/github/PyThaiNLP/pythainlp?branch=dev)
-[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
-[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2FPyThaiNLP%2Fpythainlp.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2FPyThaiNLP%2Fpythainlp?ref=badge_shield)
 
 Thai Natural Language Processing in Python.
 
 PyThaiNLP is a Python package for text processing and linguistic analysis, similar to `nltk` but with focus on Thai language.
 
+**News**
+
+>Hello,
+>We are conducting a survey for PyThaiNLP’s users and those who are working in the field of Thai NLP.
+
+>We would love to hear your feedback in order to improve the library. Also, we will prioritize for the implementation of new Thai NLP features such as Thai-English Machine Translation, Speech-to-Text, or Text-to-Speech.
+
+>You could take part in this survey via the Google Form shown below:
+>https://forms.gle/aLdSHnvkNuK5CFyt9
+
 **This is a document for development branch (post 2.0). Things will break.**
 
-- The latest stable release is [2.0.5](https://github.com/PyThaiNLP/pythainlp/releases)
+- The latest stable release is [2.0.7](https://github.com/PyThaiNLP/pythainlp/releases)
 - PyThaiNLP 2 supports Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
 - [Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
 - [Upgrade ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
 - Python 2.7+ users can use PyThaiNLP 1.6.
+- 📫 follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)
 
-📫 follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)
+[![Google Colab Badge](https://badgen.net/badge/Launch%20Quick%20Start%20Guide/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
 
 ## Capabilities
 
@@ -96,6 +109,12 @@ Please do fork and create a pull request :)
 For style guide and other information, including references to algorithms we use, please refer to our [contributing](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md) page.
 
 
+Made with ❤️
+
+We build Thai NLP.
+
+PyThaiNLP team.
+
 # ภาษาไทย
 
 ประมวลภาษาไทยในภาษา Python
@@ -104,9 +123,19 @@ PyThaiNLP เป็นไลบารีภาษาไพทอนเพื่
 
 > เพราะโลกขับเคลื่อนต่อไปด้วยการแบ่งปัน
 
+**ข่าวสาร**
+
+>สวัสดีค่ะ,
+
+>ทางทีมพัฒนา PyThaiNLP อยากขอสอบถามความคิดเห็นของผู้ใช้งาน library PyThaiNLP ปัจจุบัน หรือผู้ที่ทำงานในด้าน NLP ภาษาไทย เพื่อที่เราจะนำไปปรับปรุง library ให้ดียิ่งขึ้น และพัฒนาฟีเจอร์ใหม่ๆ สำหรับ NLP ภาษาไทย เช่น Thai-English Machine Translation, Speech-to-Text หรือ Text-to-Speech
+
+>โดยสามารถตอบแบบสอบถาม ผ่านทาง Google Form ด้านล่างนี้
+
+>https://forms.gle/aLdSHnvkNuK5CFyt9
+
 **เอกสารนี้สำหรับรุ่นพัฒนา อาจมีการเปลี่ยนแปลงได้ตลอด**
 
-- รุ่นเสถียรล่าสุดคือรุ่น [2.0.5](https://github.com/PyThaiNLP/pythainlp/releases)
+- รุ่นเสถียรล่าสุดคือรุ่น [2.0.7](https://github.com/PyThaiNLP/pythainlp/releases)
 - PyThaiNLP 2 รองรับ Python 3.6 ขึ้นไป
 - ผู้ใช้ Python 2.7+ ยังสามารถใช้ PyThaiNLP 1.6 ได้
 
@@ -178,3 +207,10 @@ $ pip install pythainlp[extra1,extra2,...]
 ## สนับสนุนและร่วมพัฒนา
 
 คุณสามารถ[ร่วมพัฒนาโครงการนี้](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md)ได้ โดยการ fork และส่ง pull request กลับมา
+
+
+สร้างด้วย ❤️
+
+พวกเราสร้าง Thai NLP
+
+ทีม PyThaiNLP

SECURITY.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported |
+| ------- | ------------------ |
+| 2.0.x | :white_check_mark: |
+| < 2.0 | :x: |

appveyor.yml

Lines changed: 26 additions & 9 deletions
@@ -2,8 +2,11 @@
 # https://www.lfd.uci.edu/~gohlke/pythonlibs/
 
 build: off
+image: Visual Studio 2015
 
 environment:
+  global:
+    CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\appveyor\\run_with_env.cmd"
   matrix:
     # - PYTHON: "C:/Python36"
     # PYTHON_VERSION: "3.6"
@@ -16,6 +19,7 @@ environment:
       PYTHON_ARCH: "64"
       ARTAGGER_PKG: "https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger"
       PYICU_PKG: "https://www.dropbox.com/s/7t0rrxwckqbgivi/PyICU-2.3.1-cp36-cp36m-win_amd64.whl?dl=1"
+      DISTUTILS_USE_SDK: "1"
 
     # - PYTHON: "C:/Python37"
     # PYTHON_VERSION: "3.7"
@@ -28,23 +32,36 @@ environment:
       PYTHON_ARCH: "64"
       ARTAGGER_PKG: "https://github.com/wannaphongcom/artagger/tarball/master#egg=artagger"
       PYICU_PKG: "https://www.dropbox.com/s/le5dckc3231opqt/PyICU-2.3.1-cp37-cp37m-win_amd64.whl?dl=1"
+      DISTUTILS_USE_SDK: "1"
 
 init:
   - "ECHO %PYTHON% %PYTHON_VERSION% %PYTHON_ARCH%"
   # - ps: "ls C:/Python*"
 
+platform:
+  - x64
+
 install:
   - "chcp 65001"
   - "set PYTHONIOENCODING=utf-8"
-  - "%PYTHON%/python.exe --version"
+  - "%PYTHON%\\python.exe -m pip install wheel"
+  # - ECHO "Installed SDKs:"
+  # - ps: "ls \"C:/Program Files/Microsoft SDKs/Windows\""
+  - IF "%ARCH%"=="32" (call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86) ELSE (ECHO "probably a 64bit build")
+  - IF "%ARCH%"=="64" (call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" amd64) ELSE (ECHO "probably a 32bit build")
+  - '"%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat" %PLATFORM%'
+  - ps: if (-not(Test-Path($env:PYTHON))) { & appveyor\install.ps1 }
+  - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
+  - "python --version"
   # - "set ICU_VERSION=62"
-  - "%PYTHON%/python.exe -m pip install --upgrade pip"
-  - "%PYTHON%/python.exe -m pip install coveralls[yaml]"
-  - "%PYTHON%/python.exe -m pip install coverage"
-  - "%PYTHON%/python.exe -m pip install %PYICU_PKG%"
-  - "%PYTHON%/python.exe -m pip install %ARTAGGER_PKG%"
-  - "%PYTHON%/python.exe -m pip install -e .[artagger,icu,ipa,ner,thai2fit,deepcut]"
+  - "pip install --disable-pip-version-check --user --upgrade pip setuptools"
+  - "pip install coveralls[yaml]"
+  - "pip install coverage"
+  - "pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html"
+  - "pip install %PYICU_PKG%"
+  - "pip install %ARTAGGER_PKG%"
+  - "pip install -e .[full]"
 
 test_script:
-  - "%PYTHON%/python.exe -m pip --version"
-  - "%PYTHON%/python.exe -m coverage run --source=pythainlp setup.py test"
+  - "pip --version"
+  - "python setup.py test"

bin/pythainlp

File mode changed: 100644 → 100755
Lines changed: 26 additions & 48 deletions
@@ -1,53 +1,31 @@
-#!python3
+#!/usr/bin/env python
 # -*- coding: utf-8 -*-
 
-_VERSION = "2.0.5"
-
 import argparse
+import sys
+
+from pythainlp import cli
+
+
+parser = argparse.ArgumentParser(
+    usage="pythainlp namespace command [options]"
+)
 
-parser = argparse.ArgumentParser()
-parser.add_argument("-t", "--text", default=None, help="text", type=str)
-parser.add_argument("-seg", "--segment", help="word segment", action="store_true")
-parser.add_argument("-c", "--corpus", help="mange corpus", action="store_true")
-parser.add_argument("-pos", "--postag", help="postag", action="store_true")
-parser.add_argument("-soundex", "--soundex", help="soundex", default=None)
-parser.add_argument("-e", "--engine", default="newmm", help="the engine", type=str)
-parser.add_argument("-pos-e", "--postag_engine", default="perceptron", help="the engine for word tokenize", type=str)
-parser.add_argument("-pos-c", "--postag_corpus", default="orchid", help="corpus for postag", type=str)
-args = parser.parse_args()
-
-if args.corpus:
-    from pythainlp.corpus import *
-    print("PyThaiNLP Corpus")
-    temp=""
-    while temp!="exit":
-        print("\n1. Install\n2. Remove\n3. Update\n4. Exit\n")
-        temp=input("Choose 1, 2, 3, or 4: ")
-        if temp=="1":
-            name=input("Corpus name:")
-            download(name)
-        elif temp=="2":
-            name=input("Corpus name:")
-            remove(name)
-        elif temp=="3":
-            name=input("Corpus name:")
-            download(name)
-        elif temp=="4":
-            break
-        else:
-            print("Choose 1, 2, 3, or 4:")
-elif args.text!=None:
-    from pythainlp.tokenize import word_tokenize
-    tokens=word_tokenize(args.text, engine=args.engine)
-    if args.segment:
-        print("|".join(tokens))
-    elif args.postag:
-        from pythainlp.tag import pos_tag
-        print("\t".join([i[0]+"/"+i[1] for i in pos_tag(tokens, engine=args.postag_engine, corpus=args.postag_corpus)]))
-elif args.soundex!=None:
-    from pythainlp.soundex import soundex
-    if args.engine=="newmm":
-        args.engine="lk82"
-    print(soundex(args.soundex, engine=args.engine))
+parser.add_argument(
+    "namespace",
+    type=str,
+    default="",
+    nargs="?",
+    help="[%s]" % "|".join(cli.available_namespaces)
+)
+
+args = parser.parse_args(sys.argv[1:2])
+
+cli.exit_if_empty(args.namespace, parser)
+
+if hasattr(cli, args.namespace):
+    namespace = getattr(cli, args.namespace)
+    namespace.App(sys.argv)
 else:
-    print(f"PyThaiNLP {_VERSION}")
+    print(f"Namespace not available: {args.namespace}\nPlease run with --help for alternatives")
+
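
Note on the new bin/pythainlp entry point: it parses only the first positional argument (the namespace) and hands the full argument list to that namespace's `App`. The sketch below illustrates the dispatch pattern in isolation. It does not reproduce `pythainlp.cli`, whose contents are not shown in this diff: `EchoApp`, `dispatch`, and the `SimpleNamespace` stand-in for `cli` are hypothetical names invented for illustration.

```python
import argparse
from types import SimpleNamespace


class EchoApp:
    """Hypothetical sub-command; the real namespaces live in pythainlp.cli."""

    def __init__(self, argv):
        # argv[0] is the program name, argv[1] the namespace, the rest are options.
        self.output = " ".join(argv[2:])


# Stand-in for the `cli` package (assumption): each namespace exposes an `App`.
cli = SimpleNamespace(
    available_namespaces=["echo"],
    echo=SimpleNamespace(App=EchoApp),
)


def dispatch(argv):
    """Parse only the namespace, then pass the full argv to that namespace's App."""
    parser = argparse.ArgumentParser(usage="prog namespace command [options]")
    parser.add_argument(
        "namespace",
        type=str,
        default="",
        nargs="?",
        help="[%s]" % "|".join(cli.available_namespaces),
    )
    # Slicing to argv[1:2] keeps sub-command options away from this parser.
    args = parser.parse_args(argv[1:2])

    if args.namespace in cli.available_namespaces:
        return getattr(cli, args.namespace).App(argv).output
    return f"Namespace not available: {args.namespace}"
```

One deliberate difference from the committed script: the sketch checks membership in `available_namespaces` rather than calling `hasattr(cli, ...)`, because `hasattr` would also accept any other attribute of the module, such as `available_namespaces` itself. Whether that matters in practice depends on how `pythainlp.cli` is laid out.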
