1615 commits
67202fe
Revise documentation
hankcs Nov 17, 2021
0a3f3f7
Improve hints for downloading
hankcs Nov 30, 2021
119116b
Use Token as the header for NER/SRL/CON
hankcs Dec 2, 2021
46a838f
Clean up
hankcs Dec 12, 2021
83db2ea
Implement a simple component for word2vec such that it can be loaded …
hankcs Dec 13, 2021
9f6bfaa
Revise documentation
hankcs Dec 20, 2021
283e005
Clean up parsers that are not interesting
hankcs Dec 26, 2021
b88049e
Remove dependency on alnlp
hankcs Dec 27, 2021
ca5e5cd
Remove dependency on bert-for-tf2
hankcs Dec 27, 2021
8a55380
Implement a simple component for fasttext such that it can be loaded …
hankcs Dec 27, 2021
d976584
Revise documentation
hankcs Dec 27, 2021
195071b
Improve logging on empty data file
hankcs Dec 27, 2021
b461e23
Separate resource from datasets and group tf components together with…
hankcs Dec 29, 2021
9d292be
Rename some modules
hankcs Dec 29, 2021
bf5cf56
Fix en pipeline
hankcs Dec 29, 2021
0a3a9b5
Make the pipeline API compatible with both TensorFlow and PyTorch bac…
hankcs Dec 29, 2021
6df38e3
Beta Launch
hankcs Dec 29, 2021
1cbcdfe
Enrich the SpanF1 metric
hankcs Oct 26, 2021
e8044b2
Fix tf memory leak: https://github.com/tensorflow/tensorflow/issues/3…
hankcs Dec 29, 2021
6c02812
Fix loading legacy NgramConvTokenizer components: https://bbs.hankcs.…
hankcs Jan 15, 2022
7c09e39
Release two constituency models
hankcs Jan 18, 2022
096d780
Improve constituency tree visualization
hankcs Jan 18, 2022
dcd70cd
Improve pipeline inputs
hankcs Jan 18, 2022
17fb9f7
Release a CTB9 pos model
hankcs Jan 18, 2022
8aab9ed
Improve the `__repr__` of `Pipe`
hankcs Jan 18, 2022
f2d7b3e
Revise documentation
hankcs Jan 19, 2022
ed9066a
Release a state-of-the-art AMR model for English
hankcs Jan 25, 2022
52337a1
Revise documentation
hankcs Jan 26, 2022
006c323
Check version conflicts for some careless users
hankcs Jan 26, 2022
ca76dc6
Revise documentation
hankcs Jan 27, 2022
00eaae9
Guide the offline user to https://hanlp.hankcs.com/docs/install.html#…
hankcs Jan 27, 2022
81ccd12
Revise documentation
hankcs Jan 27, 2022
d88ce5b
Implement `most_similar` for word2vec
hankcs Jan 30, 2022
b9af710
Implement Masked Language Model for filling blank
hankcs Jan 30, 2022
190fc31
Revise documentation
hankcs Jan 30, 2022
3589f0a
Improve typing for `save_json`
hankcs Jan 31, 2022
92a4e8c
Fix `unk` in word2vec
hankcs Jan 31, 2022
4fc0537
Enable word2vec to load arbitrary txt vector files
hankcs Jan 31, 2022
42de9b6
Mirror Chinese word vectors from https://github.com/Embedding/Chinese…
hankcs Jan 31, 2022
cefbd4f
Block special tokens from the output of MLM
hankcs Jan 31, 2022
171be44
Add version info into word2vec
hankcs Jan 31, 2022
36437b2
Add batch_size to `most_similar`
hankcs Feb 1, 2022
1d19d74
Revise documentation
hankcs Feb 1, 2022
fbdb4e2
Fix visualization `html` in Jupyter
hankcs Feb 1, 2022
7f24146
Allow the user to disable IPYTHON
hankcs Feb 1, 2022
2c78f64
Remove experimental `StructuralAttentionModel`
hankcs Feb 2, 2022
e4039c0
Release a CTB9 tok model
hankcs Feb 4, 2022
997a2e5
Release a CTB9 dep model
hankcs Feb 5, 2022
20cef74
Improve spelling checking
hankcs Feb 5, 2022
3af1ba5
Add a `conll=True` parameter to parsers
hankcs Feb 5, 2022
923d911
Rename to `CTB9_CON_ELECTRA_SMALL` for clarity
hankcs Feb 5, 2022
de161d0
Add a `conll=True` parameter to parsers
hankcs Feb 5, 2022
ca56508
Rename to `CTB9_DEP_ELECTRA_SMALL`
hankcs Feb 5, 2022
f44e132
Fix offset mapping in `transformer_tokenizer`
hankcs Feb 5, 2022
f297d22
Support mengzi PLMs
hankcs Feb 5, 2022
be60143
Release a small fine-grained tok model
hankcs Feb 6, 2022
ffdf1ec
Add `tokenizer_config.json` to ernie-gram mirror
hankcs Feb 7, 2022
9e5de0a
Revise documentation
hankcs Feb 8, 2022
809b21d
Release a SDP model
hankcs Feb 8, 2022
ef6ba7e
Fix invalid escape sequence
hankcs Feb 8, 2022
c7e2b62
Revise documentation
hankcs Feb 10, 2022
d3a9c84
Implement extra features for Transformer tagger
hankcs Feb 14, 2022
5ad3242
Release scripts for PKU Multi-view Chinese Treebank (PMT) 1.0
hankcs Feb 15, 2022
b6dcb2a
Release a `pos` model with radical embeddings
hankcs Feb 15, 2022
ab6dab3
Support conversion from Penn Treebank to Universal Dependencies
hankcs Feb 15, 2022
9f0a92b
Add extra `transform` to SRL components
hankcs Feb 15, 2022
8c8e573
Upgrade tok, ner, con to support traditional Chinese
hankcs Feb 16, 2022
3df5475
Release a dep model trained on PKU Multi-view Chinese Treebank (PMT)
hankcs Feb 16, 2022
1034c6a
Support traditional Chinese tok, pos, ner, dep, con, srl
hankcs Feb 17, 2022
19f0331
Update COARSE_ELECTRA_SMALL_ZH
hankcs Feb 20, 2022
f6e085a
Revise documentation
hankcs Feb 21, 2022
d994f8e
Simplify context layer in span ranking SRL
hankcs Feb 22, 2022
8888ba6
stop mirroring ernie weights
hankcs Feb 23, 2022
7b8d81e
Fix edge cases on empty str
hankcs Feb 23, 2022
86d4651
Check edge cases that `tok` key is not presented in `Document`
hankcs Feb 23, 2022
7924bb5
Revise documentation
hankcs Mar 10, 2022
3a35376
Update semeval16.md
frank1998sj Mar 11, 2022
912fd38
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Mar 11, 2022
a8734c1
Check the type of dict_tags
hankcs Mar 22, 2022
b74d4ff
`tok` and `dict_combine` supports tokens containing spaces
hankcs Mar 22, 2022
eea8992
Clean up
hankcs Mar 22, 2022
e7eb64b
Let `dict_force` match original text directly
hankcs Mar 22, 2022
e843a4e
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Mar 22, 2022
4467a3c
Optimize merging sub-tokens
hankcs Mar 22, 2022
cb6ee6c
Release an ERNIE-GRAM constituency model
hankcs Mar 31, 2022
dd18590
Improve visualization of constituency tree
hankcs Apr 1, 2022
740e37d
Add `language` parameter to `hanlp_restful.HanLPClient.__call__`
hankcs Apr 6, 2022
342a2b2
Revise documentation
hankcs Apr 8, 2022
a808b0f
Fix training CRF in `TaggingNamedEntityRecognition`: https://bbs.hank…
hankcs Apr 12, 2022
26ff093
Release a SOTA joint Chinese-English AMR model
hankcs Apr 13, 2022
19eb659
Release RESTful `abstract_meaning_representation` APIs
hankcs Apr 13, 2022
ea11f96
Use the latest perin-parser
hankcs Apr 14, 2022
15bb02f
Fix edge cases of empty inputs for MTL
hankcs Apr 14, 2022
9b3a786
Release a Chinese MRP model with Mengzi PLM
hankcs Apr 15, 2022
95f8956
Release RESTful `keyphrase_extraction` APIs
hankcs Apr 16, 2022
d90717d
Revise documentation
hankcs Apr 16, 2022
e52dc9f
Fix matching issue caused by `dict_force` in https://github.com/hankc…
hankcs Apr 16, 2022
86c6865
Revise documentation
hankcs Apr 16, 2022
5317922
Fix fasttext URL in `PTB_POS_RNN_FASTTEXT_EN`
hankcs Apr 18, 2022
396568c
Give `PadSequenceDataLoader` the option to skip padding
hankcs Apr 19, 2022
77217d5
Fix `output_spans` with `dict_combine` fix: https://github.com/hankcs…
hankcs Apr 20, 2022
7af9578
Fix the `len` of trie fix: https://github.com/hankcs/HanLP/issues/1728
hankcs Apr 30, 2022
eb35d12
Revise documentation
hankcs Apr 20, 2022
2d5aba2
Improve the robustness of SRL visualization
hankcs Apr 20, 2022
94b88c0
Warn the user that only `zh` supports coarse tokenization
hankcs Apr 25, 2022
c371be1
Release two Electra base tok models trained on CTB9
hankcs Apr 26, 2022
65058e4
Release RESTful `extractive_summarization` APIs
hankcs May 4, 2022
3fb16cc
Revise documentation
hankcs May 5, 2022
11113b8
Release `MSR_TOK_ELECTRA_BASE_CRF` model
hankcs May 7, 2022
672d662
Deprecated `length_field`. Since the memory consumption is dominated …
hankcs May 11, 2022
68e6527
Improve error log
hankcs May 13, 2022
8721373
Support Universal Dependencies 2.10
hankcs May 19, 2022
924c768
Support accelerated PyTorch on macOS M1 chips: https://www.hankcs.com…
hankcs Jun 7, 2022
c78515c
Fix offset generated with dict_force
hankcs Jun 8, 2022
646ca7d
Support 130 languages trained on Universal Dependencies 2.10
hankcs Jun 8, 2022
e1c0700
`max_sequence_length` of `TransformerEncoder` defaults to `max_positi…
hankcs Jun 10, 2022
1c474e3
Revise documentation
hankcs Jun 10, 2022
9b1ed20
Support eval_trn to speed up training
hankcs Jun 3, 2021
17492c1
Fix pruning using max_seq_len
hankcs Jun 11, 2022
606fe2f
Release `xlm-roberta-base-no-space` which has spaces pruned
hankcs Jun 11, 2022
ee3d178
Update two tok models with F1 > 98%
hankcs Jun 12, 2022
044156a
Revise documentation
hankcs Jun 12, 2022
df8308a
Release `xlm-roberta-small-no-space` which has spaces pruned
hankcs Jun 13, 2022
52cf3b5
Activate `dict_force` in `load`
hankcs Jun 15, 2022
584ce7e
Release a multilingual tokenizer trained with MiniLMv2
hankcs Jun 15, 2022
c9317ae
Fix edge cases in split_sentence
hankcs Jun 15, 2022
18275b5
Expose only `split_sentence`
hankcs Jun 15, 2022
65b1e58
`transformer_layers` means number of bottom layers
hankcs Jun 15, 2022
d16773d
Revise documentation
hankcs Jun 16, 2022
3e0d16e
Update `UD_TOK_XLM_SMALL` model
hankcs Jun 16, 2022
4fbd69b
Release a multilingual MTL model trained with MiniLMv2
hankcs Jun 16, 2022
7616dbb
Update two tok models trained on 100m corpora
hankcs Jun 16, 2022
3a3b246
Release mMiniLMv2 with spaces pruned
hankcs Jun 16, 2022
c480492
Prepare to retire `SUBWORD_ENCODING_CWS`
hankcs Jun 16, 2022
9d9f45c
Release multilingual tokenizers trained with MiniLMv2
hankcs Jun 17, 2022
9c8b620
Fix transformer tokenizer on `CIMERLI™`
hankcs Jun 18, 2022
3d01174
Replace `XLM_SMALL` with `MMINILMV2L6`
hankcs Jun 19, 2022
16e32af
Revise documentation
hankcs Jun 19, 2022
5c53c38
Release mMiniLMv2L12 version of MTL on UD210
hankcs Jun 21, 2022
7a71cee
Release a small MTL model trained on our new corpora
hankcs Jun 26, 2022
08724dd
Improve helper functions for `Document`
hankcs Jun 29, 2022
eca5f99
Improve pretty_print style
hankcs Jul 2, 2022
5d14517
Fix cases that a single char gets split into multiple subtokens fix: …
hankcs Jul 6, 2022
2a628ea
Fix decompression on Windows fix: https://github.com/hankcs/HanLP/iss…
hankcs Jul 7, 2022
6dbd5a8
Revise documentation
hankcs Jul 16, 2022
4339b09
avoid cycle between a pair of nodes
hankcs Jul 18, 2022
342481b
Fix sdp that root doesn't get learnt
hankcs Jul 18, 2022
75c95b9
Update the SDP model
hankcs Jul 19, 2022
c3d9015
Disable MPS on M1 due to its poor robustness
hankcs Jul 19, 2022
6d1d6a0
Release `grammatical_error_correction` APIs
hankcs Jul 29, 2022
8fe00ad
Update MUL-MTL model with SDP fixed
hankcs Jul 31, 2022
866d8a6
Ask users to read the doc when they try to set dict for a MTL component
hankcs Aug 11, 2022
ea17ae2
Fix tokenizer evaluation during training fix: https://github.com/hank…
hankcs Aug 11, 2022
3dddcfe
Revise documentation
hankcs Aug 12, 2022
76bde78
Script to train SOTA PKU CWS
hankcs Aug 24, 2022
9210d5a
Fix empty string tokens in `TransformerSequenceTokenizer` fix: https:…
hankcs Aug 26, 2022
0133188
Make empty string consistent between STL and MTL: https://github.com/…
hankcs Aug 27, 2022
87a5e9b
add TSL cert verify switch to support network env behind private TSL …
Sep 15, 2022
125b2b0
Revise documentation
hankcs Sep 15, 2022
5ba95ea
Release language identification APIs which can recognize 176 languages
hankcs Sep 28, 2022
8a4af18
Revise documentation
hankcs Sep 29, 2022
863aa36
Remove `PKU98_POS_ELECTRA_SMALL` as it is replaced by `PKU_POS_ELECTR…
hankcs Oct 6, 2022
dd989cc
allow for zero length dataset
hankcs Nov 3, 2022
7f7d7b6
Fix printing dummy root constituent
hankcs Nov 3, 2022
025f964
Improve how MTL handles empty strings
hankcs Nov 3, 2022
188604d
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Nov 3, 2022
5d46c4b
Improve log
hankcs Nov 4, 2022
266a783
Add dependency on sentencepiece
hankcs Nov 4, 2022
a8ac5e1
Implementation of "Graph Pre-training for AMR Parsing and Generation"
hankcs Dec 7, 2022
651acff
Test on ubuntu-20.04, macos-latest, windows-latest
hankcs Dec 7, 2022
fbfc8fb
Add support for Python 3.10
hankcs Dec 7, 2022
4a7dd62
Upgrade Jackson Databind
hankcs Dec 10, 2022
9d0f880
Release `abstractive_summarization` APIs
hankcs Feb 7, 2023
36aace0
Add a classifier that can directly load HF models
hankcs Feb 18, 2023
52b3d2a
Training script for UD-MTL
hankcs Feb 22, 2023
91ec9fe
Add a regression component that can directly load HF models
hankcs Feb 22, 2023
bf77f50
Release `sentiment_analysis` APIs
hankcs Feb 22, 2023
752b480
Improve error logging
hankcs Mar 10, 2023
af70075
Add `protobuf<3.19` to tf dependencies fix: https://github.com/hankcs…
hankcs Mar 10, 2023
ea9565e
Lazily load TensorFlow fix: https://github.com/hankcs/HanLP/issues/1810
hankcs Mar 25, 2023
5f787ef
Revise the annotations of 863 pos
hankcs Apr 5, 2023
135a399
Use the original tokens to build constituency tree fix: https://githu…
hankcs Apr 8, 2023
5997c59
Support overriding `batch_size` for a constituency parser fix: https:…
hankcs Apr 14, 2023
bccf0fe
Revise documentations
hankcs Apr 14, 2023
47b6881
Fix JSON float vs double error: https://bbs.hanlp.com/t/java/5043
hankcs Apr 15, 2023
81795fc
Support TensorFlow on Python 3.10 fix: https://github.com/hankcs/HanL…
hankcs May 23, 2023
ca70784
Update badges of downloads
hanlpbot Aug 20, 2023
a27c564
Add explaintion of `IC` in CTB guideline: https://bbs.hankcs.com/t/ct…
hanlpbot Sep 19, 2023
698c7e8
Revise documentations
hanlpbot Oct 14, 2023
4c1f457
Unpin the version of `tokenizers` fix: https://github.com/hankcs/HanL…
hanlpbot Oct 14, 2023
9579160
Suppress HF "AdamW is deprecated" warning
hanlpbot Oct 19, 2023
19eb67a
Resize vocab and classifier automatically for taggers
hanlpbot Oct 19, 2023
6bbf1ff
Update resource & mirror sites
hanlpbot Nov 28, 2023
8cbe01e
fix: docs of MSR Tokenization Guidelines of Chinese Text (V5.0)
webSue Nov 26, 2023
0450298
Pin the version of `tokenizers` on macOS Python3.6 to compile it
hankcs Nov 28, 2023
08f91c8
let pipline support copy()
Vela-zz Dec 4, 2023
fa9f6ed
add test case
Vela-zz Dec 4, 2023
beab770
Mirror tokenizers from our file servers fix: https://github.com/hankc…
hankcs Dec 22, 2023
785bed5
Fix tokenization of Korean chars fix: https://github.com/hankcs/HanLP…
hankcs Feb 24, 2024
dfd8d5e
Fix phrasetree fix: https://github.com/hankcs/HanLP/issues/1886
hankcs Mar 23, 2024
b8742df
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Jul 11, 2024
eab05ca
fix:Correcting Documentation Errors
webSue Jul 7, 2024
b28d38b
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Jul 11, 2024
77fe054
Support Python 3.6 on Windows
hankcs Jul 11, 2024
b32d4b9
Update the link to unit tests
hankcs Jul 11, 2024
022d0fb
Parse raw sentences for custom tags, fix: https://github.com/hankcs/H…
hankcs Aug 18, 2024
32ed29b
Specify supported tensorflow versions, fix: https://github.com/hankcs…
hankcs Aug 22, 2024
e64efb8
Support new versions of tensorflow and numpy
hankcs Aug 22, 2024
4280529
Fix extrapolation in relative transformer, fix: https://github.com/ha…
hankcs Sep 8, 2024
036f593
Please cite our EMNLP paper: https://aclanthology.org/2021.emnlp-main…
hankcs Oct 5, 2024
32428a2
Avoid redundant downloading and decompressing across processes
hankcs Oct 8, 2024
5008d7b
Improve the safety of `torch.load` with `weights_only=True`
hankcs Oct 8, 2024
fe47f5a
Move from `pkg_resources` to `packaging`, fix: https://github.com/han…
hankcs Nov 17, 2024
09574c0
Convert `fea` to `feats` in `hanlp_common.document.Document.to_conll`
hankcs Nov 30, 2024
b8a165c
Fix typo
hankcs Nov 30, 2024
1197741
Fix loading issue of fine-tuned NER models
hankcs Dec 2, 2024
c01f4f6
Fix `dep` key of `to_pretty`
hankcs Dec 7, 2024
d42012d
fix the risk of permissions and unpinned dependencies in the workflow
gcanlin Dec 10, 2024
ae3a910
Revert binding to the immutable full sha1 as it may prevent us from g…
hankcs Dec 18, 2024
d8e6f2a
Upgrade GitHub Actions to the latest versions
hankcs Dec 18, 2024
336fd93
Support ModernBERT as a tokenizer
hankcs Dec 21, 2024
36c9e6f
Heuristic to fix lemma for digits
hankcs Dec 21, 2024
77481bd
Replace `FloatTensor` with `sparse_coo_tensor`
hankcs Dec 21, 2024
c583768
Implement `PrependSpace`
hankcs Dec 21, 2024
007f29f
Support coarse tokenization of Japanese
hankcs Dec 23, 2024
75edc60
Update eos model config to support new lines
hankcs Dec 24, 2024
5698f32
Revise documentation
hankcs Dec 25, 2024
3aba649
Release an English MTL model with ModernBERT encoder
hankcs Dec 28, 2024
3202623
TransformList supports config
hankcs Dec 22, 2024
d4660c3
Improve logging
hankcs Dec 29, 2024
749ecf6
Release `2.1.0`
hankcs Dec 29, 2024
7a4b735
Replace `FloatTensor` with `sparse_coo_tensor`
hankcs Jan 7, 2025
3185188
Release `EN_TOK_LEM_POS_NER_SRL_UDEP_SDP_CON_MODERNBERT_LARGE`
hankcs Jan 7, 2025
5c6d17b
Automatically detect file type by `.conllu` extension
hankcs Jan 11, 2025
9529587
Mirror `bert-ancient-chinese`
hankcs Jan 11, 2025
c0d5d92
Skip hidden states averaging for tokenization only models
hankcs Jan 11, 2025
e8bad3f
Release Ancient Chinese tokenization model `KYOTO_EVAHAN_TOK_LZH`
hankcs Jan 12, 2025
9988be6
Implement TransformerTaggingLemmatizer
hankcs Jan 12, 2025
7aeb0fc
Support XPOS for CoNLL-U
hankcs Jan 12, 2025
c1aa002
Display the main pos by default
hankcs Jan 12, 2025
f717be1
Release Ancient Chinese MTL model `KYOTO_EVAHAN_TOK_LEM_POS_UDEP_LZH`
hankcs Jan 13, 2025
0636dc6
Revise documents for Ancient Chinese models
hankcs Jan 13, 2025
2eb94f2
Release Ancient Chinese models for tokenization, lemmatization, part-…
hankcs Jan 13, 2025
a639dbc
Revise document
hankcs Jan 13, 2025
80506cf
Keep the new hyperparameters when fine-tuning a model, fix: https://g…
hankcs Jan 14, 2025
81983df
Test on ubuntu-20.04, macos-latest, windows-latest
hankcs Jan 14, 2025
199f3f3
Support reusing existing model and config for finetuning, fix: https:…
hankcs Jan 15, 2025
5f46aec
Fix stdio redirection, fix: https://github.com/hankcs/HanLP/issues/1958
hankcs Sep 7, 2025
e72db55
Test on ubuntu-latest, macos-latest, windows-latest
hankcs Sep 11, 2025
942ce93
TF is deprecated in Transformers and no longer maintained: https://gi…
hankcs Oct 19, 2025
39 changes: 39 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
---
name: 🐛Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: hankcs

---

<!--
Thank you for reporting a possible bug in HanLP.
Please fill in the template below to bypass our spam filter.
The fields below are required; otherwise the issue will not be accepted.
-->

**Describe the bug**
A clear and concise description of what the bug is.

**Code to reproduce the issue**
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

```python
```

**Describe the current behavior**
A clear and concise description of what happened.

**Expected behavior**
A clear and concise description of what you expected to happen.

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- HanLP version:

**Other info / logs**
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

* [ ] I've completed this form and searched the web for solutions.
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
  - name: ⁉️ Need help with HanLP?
    url: https://bbs.hankcs.com/
    about: Join our multilingual forum and have a free discussion.
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,31 @@
---
name: 🚀Feature request
about: Suggest an idea for this project
title: ''
labels: feature request
assignees: hankcs

---

<!--
Thank you for suggesting an idea to make HanLP better.
Please fill in the template below to bypass our spam filter.
The fields below are required; otherwise the issue will be closed directly.
-->

**Describe the feature and the current behavior/state.**

**Will this change the current API? How?**

**Who will benefit from this feature?**

**Are you willing to contribute it (Yes/No):**

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- HanLP version:

**Any other info**

* [ ] I've carefully completed this form.
38 changes: 38 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,38 @@
<!--
Thank you for being interested in contributing to HanLP! You are awesome ✨.
⚠️Changes must be made on dev branch.
-->

# Title of Your Pull Request

## Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

## Type of Change

Please check any relevant options and delete the rest.

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

## Checklist

Check all items that apply.

- [ ] ⚠️Changes **must** be made on `dev` branch instead of `master`
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] My code follows the style guidelines of this project
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have checked my code and corrected any misspellings
86 changes: 86 additions & 0 deletions .github/workflows/unit-tests.yml
@@ -0,0 +1,86 @@
name: Unit Tests

on:
  push:
    branches: [ "**" ]
  pull_request:
    branches: [ "**" ]

permissions: read-all

jobs:
  build:

    runs-on: ${{ matrix.os }}
    env:
      HANLP_HOME: ${{ github.workspace }}/data
    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-latest, macos-latest, windows-latest ]
        python-version: [ 3.6, 3.7, 3.8, 3.9, '3.10' ]
        exclude:
          # GHA doesn't list 3.6 and 3.7 for ubuntu-latest
          - os: ubuntu-latest
            python-version: "3.6"
          - os: ubuntu-latest
            python-version: "3.7"

          # MacOS 14.4.1 for arm64 doesn't support Python < 3.8
          - os: macos-latest
            python-version: "3.6"
          - os: macos-latest
            python-version: "3.7"

        include:
          # MacOS 13 required for Python < 3.8
          - os: macos-13
            python-version: "3.6"
          - os: macos-13
            python-version: "3.7"

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        shell: bash
        run: |
          python -m pip install -e plugins/hanlp_trie
          python -m pip install -e plugins/hanlp_common
          python -m pip install -e .
          python -m pip install pytest

      - name: Cache data
        uses: actions/cache@v4
        with:
          path: ${{ env.HANLP_HOME }}
          key: hanlp-data

      - name: Test with pytest
        shell: bash
        run: |
          pytest tests
          pytest plugins/hanlp_trie/tests
  deploy:
    needs: build
    if: github.event_name == 'push' && github.ref == 'refs/heads/master'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: |
          python -m pip install setuptools wheel twine
      - name: Deploy to PyPI
        run: |
          python setup.py sdist bdist_wheel
          python -m twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
          TWINE_REPOSITORY: pypi
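
Reviewer note: the `exclude`/`include` entries in the `build` matrix above make it hard to see at a glance which jobs actually run. The Python sketch below is a simplified illustration, not GitHub's implementation. It assumes every `include` entry defines a complete new combination (true for this workflow, since `macos-13` is not in the base `os` list), and it normalizes all Python versions to strings. The helper name `expand_matrix` is hypothetical.

```python
from itertools import product


def expand_matrix(axes, exclude=(), include=()):
    """Expand a GitHub-Actions-style matrix into a list of job dicts (simplified)."""
    keys = list(axes)
    # Cartesian product of all axis values.
    combos = [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]
    # Drop any combination that matches every key/value pair of an exclude entry.
    combos = [
        c for c in combos
        if not any(all(c.get(k) == v for k, v in ex.items()) for ex in exclude)
    ]
    # Assumption: each include entry is a brand-new combination and is simply appended.
    combos.extend(dict(inc) for inc in include)
    return combos


axes = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "python-version": ["3.6", "3.7", "3.8", "3.9", "3.10"],
}
exclude = [
    {"os": "ubuntu-latest", "python-version": "3.6"},
    {"os": "ubuntu-latest", "python-version": "3.7"},
    {"os": "macos-latest", "python-version": "3.6"},
    {"os": "macos-latest", "python-version": "3.7"},
]
include = [
    {"os": "macos-13", "python-version": "3.6"},
    {"os": "macos-13", "python-version": "3.7"},
]

jobs = expand_matrix(axes, exclude, include)
for job in jobs:
    print(job["os"], job["python-version"])
```

Under these assumptions the matrix resolves to 13 jobs: Python 3.6 and 3.7 run only on `windows-latest` and `macos-13`, while 3.8 through 3.10 run on all three `*-latest` images.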