
Commit a6a7717

Merge pull request #180 from PyThaiNLP/dev
PyThaiNLP 2.0
2 parents ab79eab + 4094632 commit a6a7717

161 files changed (+140719 -4174 lines)

.gitignore

Lines changed: 9 additions & 2 deletions
@@ -26,8 +26,8 @@ var/
 *.egg

 # PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec

@@ -58,6 +58,11 @@ target/

 # Jupyter Notebook
 .ipynb_checkpoints
+Untitled*.ipynb
+
+# IDE files
+.idea
+.vscode

 # macOS generated files
 .DS_Store
@@ -66,6 +71,8 @@ target/
 .Spotlight-V100
 .Trashes

+# Document generator temporary files
+docs/_build/

 \.idea/codeStyles/

.travis.yml

Lines changed: 8 additions & 3 deletions
@@ -3,12 +3,17 @@

 language: python
 python:
-- "3.4"
-- "3.5"
 - "3.6"
+
+# workaround to make boto work on travis
+# from https://github.com/travis-ci/travis-ci/issues/7940
+before_install:
+- sudo rm -f /etc/boto.cfg
+
 # command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
 install:
-- pip install -r requirements-travis.txt
+- pip install -r requirements.txt
+- pip install .[artagger,icu,ipa,ner,thai2fit,deepcut]
 - pip install coveralls

 os:

CONTRIBUTING.md

Lines changed: 7 additions & 6 deletions
@@ -23,21 +23,22 @@ We use the famous [gitflow](http://nvie.com/posts/a-successful-git-branching-mod
 - Write tests for your new features (please see "Tests" topic below);
 - Always remember that [commented code is dead
 code](http://www.codinghorror.com/blog/2008/07/coding-without-comments.html);
-- Name identifiers (variables, classes, functions, module names) with readable
-names (`x` is always wrong);
+- Name identifiers (variables, classes, functions, module names) with meaningful
+and pronounceable names (`x` is always wrong);
 - When manipulating strings, use [Python's new-style
 formatting](http://docs.python.org/library/string.html#format-string-syntax)
 (`'{} = {}'.format(a, b)` instead of `'%s = %s' % (a, b)`);
 - All `#TODO` comments should be turned into issues (use our
-[GitHub issue system](tps://github.com/wannaphongcom/pythainlp/));
+[GitHub issue system](https://github.com/PyThaiNLP/pythainlp/));
 - Run all tests before pushing (just execute `tox`) so you will know if your
 changes broke something;
+- All source code and all text files should be ended with one empty line. This is [to please git](https://stackoverflow.com/questions/5813311/no-newline-at-end-of-file#5813359) and also [to keep up with POSIX standard](https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline).


 # Discussion

 - Facebook group: https://www.facebook.com/groups/thainlp
-- GitHub issues: https://github.com/wannaphongcom/pythainlp/issues
+- GitHub issues: https://github.com/PyThaiNLP/pythainlp/issues

 Happy hacking! (;

@@ -54,14 +55,14 @@ Happy hacking! (;
 ## newmm (onecut), mm, TCC, and Thai Soundex Code
 - Korakot Chaovavanich

-## Thai2Vec & ulmfit
+## thai2fit & ULMFiT
 - Charin Polpanumas

 ## Docs
 - Peeradej Tanruangporn

 ## Contributors
-- See more contributions here https://github.com/wannaphongcom/pythainlp/graphs/contributors
+- See more contributions here https://github.com/PyThaiNLP/pythainlp/graphs/contributors


 # References

README-pypi.md

Lines changed: 41 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
![PyThaiNLP Logo](https://avatars0.githubusercontent.com/u/32934255?s=200&v=4)
22

3-
# PyThaiNLP 1.7
3+
# PyThaiNLP 2.0
44

55
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
66
[![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
@@ -10,32 +10,60 @@

 PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

-PyThaiNLP features include Thai word and subword segmentations, soundex, romanization, part-of-speech taggers, and spelling corrections.
+PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

-## What's new in version 1.7 ?
+📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [From PyThaiNLP 1.7 to PyThaiNLP 2.0](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)

-- Deprecate Python 2 support
-- Refactor pythainlp.tokenize.pyicu for readability
-- Add Thai NER model to pythainlp.ner
-- thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
-- Sentiment classifier based on ULMFit and various product review datasets
-- Add ULMFit utility to PyThaiNLP
-- Add Thai romanization model thai2rom
-- Retrain POS-tagging model
+📖 For ThaiNER user after upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
+
+📫 follow us on Facebook [Pythainlp](https://www.facebook.com/pythainlp/)
+
+## What's new in version 2.0 ?
+
+- New NorvigSpellChecker spell checker class, which can be initialized with custom dictionary.
+- Terminate Python 2 support. Remove all Python 2 compatibility code.
+- Remove old, obsolated, deprecated, and experimental code.
+- Thai2fit (Upgrade ULMFiT-related codes to fastai 1.0)
+- ThaiNER 1.0
+- Remove sentiment analysis
 - Improved word_tokenize (newmm, mm) and dict_word_tokenize
-- Documentation added
+- Improved POS-tagging
+- More and improved examples
+- see [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118)

 ## Install

+For stable version:
+
 ```sh
 pip install pythainlp
 ```

+For some advanced functionalities, like word vector, extra packages may be needed. Install them with these options during pip install:
+
+```
+pip install pythainlp[extra1,extra2,...]
+```
+
+where extras can be
+
+- `artagger` (to support artagger part-of-speech tagger)*
+- `deepcut` (to support deepcut machine-learnt tokenizer)
+- `icu` (for ICU support in transliteration and tokenization)
+- `ipa` (for International Phonetic Alphabet support in transliteration)
+- `ml` (to support fastai 1.0.22 ULMFiT models)
+- `ner` (for named-entity recognizer)
+- `thai2fit` (for Thai word vector)
+- `thai2rom` (for machine-learnt romanization)
+- `full` (install everything)
+
 **Note for Windows**: `marisa-trie` wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
 Install it with pip, for example: `pip install marisa_trie‑0.7.5‑cp36‑cp36m‑win32.whl`

 ## Links

-- Docs: https://thainlp.org/pythainlp/docs/1.7/
+- User guide : [English](https://colab.research.google.com/drive/1MQ10D1mJC5r1vQAHcj4ShoRS14vz8ZF-) , [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
+- Docs: https://thainlp.org/pythainlp/docs/2.0/
 - GitHub: https://github.com/PyThaiNLP/pythainlp
 - Issues: https://github.com/PyThaiNLP/pythainlp/issues
+- Facebook : [Pythainlp](https://www.facebook.com/pythainlp/)
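
Reviewer note (not part of the commit): the install section added to README-pypi.md combines optional extras into one `pip install pythainlp[...]` command. The sketch below shows one way that command string could be assembled. The helper name `build_install_command` and the `KNOWN_EXTRAS` tuple are hypothetical and not part of PyThaiNLP; the extras names are copied from the list in the README diff.

```python
# Extras named in the README-pypi.md install section of this commit.
KNOWN_EXTRAS = (
    "artagger", "deepcut", "icu", "ipa", "ml",
    "ner", "thai2fit", "thai2rom", "full",
)


def build_install_command(extras=()):
    """Return a pip command string for pythainlp with optional extras.

    Hypothetical helper for illustration only; not part of PyThaiNLP.
    """
    # Reject names that the README does not list.
    unknown = [name for name in extras if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError("unknown extras: {}".format(", ".join(unknown)))
    # No extras requested: plain install of the stable version.
    if not extras:
        return "pip install pythainlp"
    # Extras are comma-separated inside square brackets, with no spaces.
    return "pip install pythainlp[{}]".format(",".join(extras))


if __name__ == "__main__":
    print(build_install_command(["icu", "ner"]))
```

Some shells, zsh for example, treat square brackets as glob characters, so the resulting command may need the package argument quoted, as in `pip install "pythainlp[icu,ner]"`.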
