spacy-och

Old Chinese (och) language support for the spaCy NLP library.

installation

requires spaCy v3.

$ pip install spacy-och

usage

this package does not currently include trained models and is intended for basic NLP usage only, via spacy.blank(). it tokenizes text by character and supports the Token.like_num and Token.is_stop attributes.

>>> import spacy
>>> nlp = spacy.blank("och")
>>> from spacy_och.examples import sentences
>>> doc = nlp(sentences[0])
>>> doc.text
'子曰：「上下无常，非為邪也。進退无恆，非離群也。君子進德脩業、欲及時也，故无咎。」'
>>> [t for t in doc if t.is_stop] # all stop words
[曰, ：, 非, 也, 。, 非, 也, 。, 、, 欲, 及, 也, 故, 。]
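
to make "tokenizes by character" concrete, here is a minimal pure-Python sketch of the idea. it is an illustration only and not the package's implementation: the helper names (tokenize, is_stop, like_num) are hypothetical, the stop-word sample is copied from the output above, and the numeral sample is an assumption rather than the package's lookup data.

```python
# illustration only: NOT spacy-och's implementation.
# STOP_WORDS is a sample copied from the README output above;
# NUMERALS is an assumed sample, not the package's lookup data.
STOP_WORDS = {"曰", "：", "非", "也", "。", "、", "欲", "及", "故"}
NUMERALS = set("一二三四五六七八九十百千萬")


def tokenize(text):
    """Return one token per character, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def is_stop(token):
    """True if the token is in the sample stop-word set."""
    return token in STOP_WORDS


def like_num(token):
    """True for ASCII digits and for the sample Chinese numerals."""
    return token.isdigit() or token in NUMERALS


# the last clause of the example sentence used above
tokens = tokenize("君子進德脩業、欲及時也，故无咎。")
stops = [t for t in tokens if is_stop(t)]
print(stops)
```

with this sample set, the stop tokens found in the clause match the tail of the list printed by spaCy above. in real use, prefer the Token.is_stop and Token.like_num attributes on the tokens of a Doc created with spacy.blank("och").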

more functionality is coming soon!

developing

after cloning the repository:

$ pip install -e ".[dev]"
$ pre-commit install

building

build a source archive (sdist) and a built distribution (wheel) for a release:

$ rm -rf dist/*
$ python -m build

publish the release to TestPyPI first (useful for checking that everything worked):

$ python -m twine upload --repository testpypi dist/*

if everything looks ok, upload to the real PyPI:

$ python -m twine upload dist/*

license

code is licensed under the MIT license. some lookup data is derived from files licensed under the Unicode Data Files and Software License.
