PyThaiNLP 5.0 Change Log

**Schedule**
- First Beta release: 5 February 2024
- Production release: 10 February 2024

See [5.0 Milestone](https://github.com/PyThaiNLP/pythainlp/milestone/18).

## What is new?

### License information
- Use SPDX license identifier at the header of source code #876

### Deprecation and other API changes
- Change default NER to thainer-v2 https://github.com/PyThaiNLP/pythainlp/commit/5e97e7c4ebcf68bca64e4f942c8dfe3a5ab2ebc5
- Move `pythainlp.util.is_native_thai` to `pythainlp.morpheme.is_native_thai` https://github.com/PyThaiNLP/pythainlp/commit/524759ac1926fb9837bb9464f0a40cd984af2608
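
Code that has to run on both 4.x and 5.0 can absorb the `is_native_thai` move with a fallback import. This is a minimal sketch, not part of the release itself: the helper name `load_is_native_thai` is hypothetical, and only the two module paths come from the entry above.

```python
import importlib


def load_is_native_thai():
    """Return is_native_thai from the first module that provides it, else None."""
    # Try the 5.0 location first, then the pre-5.0 location.
    for module_name in ("pythainlp.morpheme", "pythainlp.util"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # PyThaiNLP is missing, or this version lacks the module
        func = getattr(module, "is_native_thai", None)
        if callable(func):
            return func
    return None


# None means no installed PyThaiNLP version exposes the function.
is_native_thai = load_is_native_thai()
```

Once 4.x support is no longer needed, a plain `from pythainlp.morpheme import is_native_thai` is enough.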

### Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841

### New API
- Add `pythainlp.coref` for Thai coreference resolution #802
- Add `wtpsplit` engine for sentence and paragraph segmentation #804, and add the `paragraph_threshold` parameter to `paragraph_tokenize()` #806
- Add word approximation to `pythainlp.soundex.sound` #809 by @wannaphong
- Add `pythainlp.wsd` for Thai word sense disambiguation #818 by @wannaphong
- Add `pythainlp.chat` and `WangChanGLM` to `pythainlp.generate` #819 by @wannaphong
- Add `pythainlp.cls`, a param-free classification model #821 by @c4n
- Add `pythainlp.el` entity linking #822 by @wannaphong
- Add `pythainlp.ancient` by @wannaphong in #833
- Add `pythainlp.util.rhyme` by @wannaphong in #849
- Add `remove_trailing_repeat_consonants` by @konbraphat51 in #862
- Add `pythainlp.util.to_idn` by @wannaphong in #875
- Add `pythainlp.corpus.find_synonyms` by @wannaphong in #890
- Add `pythainlp.util.morse` by @wannaphong in #891
- Add `pythainlp.morpheme` by @wannaphong in #896

### Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding missing details about the code, models, and examples by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892

### Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New `paragraph_tokenize` function to split Thai text into paragraphs #804
- Add `paragraph_threshold` into `paragraph_tokenize()` function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix `newmm` to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
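
The `re.split` flags fix listed above relates to a common standard-library pitfall: the third positional parameter of `re.split` is `maxsplit`, not `flags`, so a flag passed positionally is silently read as a split limit. The sketch below illustrates the general problem with made-up sample text; it does not reproduce the actual code changed in #832.

```python
import re

text = "aXbxcXd"  # hypothetical sample input

# Intended call: flags passed by keyword, so matching ignores case.
fixed = re.split("x", text, flags=re.IGNORECASE)

# What re.split("x", text, re.IGNORECASE) effectively does: the flag's
# integer value (2) is taken as maxsplit, and no flag is applied at all.
buggy = re.split("x", text, maxsplit=int(re.IGNORECASE))

print(fixed)  # ['a', 'b', 'c', 'd']
print(buggy)  # ['aXb', 'cXd']
```

Passing `flags` by keyword avoids the problem regardless of which flag is used.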


### Tag
- Add function for POS tagging with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873

### Chat
- Fixed bug #828

### Translate
- Add small100 to `pythainlp.translate` #815 by @wannaphong

### Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852

### Corpus
- Add `pythainlp.corpus.thai_orst_words()` Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add `pythainlp.corpus.thai_wikipedia_titles()` Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add `pythainlp.corpus.thai_volubilis_words()` Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add `pythainlp.corpus.thai_icu_words()` Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882

### Util
- Add `pythainlp.util.encoding` #813 by @wannaphong
- Add `pythainlp.util.spell_words` #817 by @wannaphong
- Add `pythainlp.util.remove_trailing_repeat_consonants()` #862 by @konbraphat51


## New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856 
- @MpolaarbearM made their first contribution in #857


**Full Changelog**: https://github.com/PyThaiNLP/pythainlp/compare/v4.0.2...v5.0.0

## Contributors

<a href="https://github.com/PyThaiNLP/pythainlp/graphs/contributors">
  <img src="https://contributors-img.firebaseapp.com/image?repo=PyThaiNLP/pythainlp" />
</a>

Thanks to all the [contributors](https://github.com/PyThaiNLP/pythainlp/graphs/contributors). (Image made with [contributors-img](https://contributors-img.firebaseapp.com))

If you want to contribute to PyThaiNLP, you can read [Contributing to PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md).
