pyucollate: Unicode sorting in Python made simple.

This library is a modernized version of James K. Tauber's pyuca with some small changes to the API.

Like the original library, it is a Python implementation of the Unicode Collation Algorithm (UCA). It passes 100% of the UCA conformance tests with the variable-weighting setting Non-ignorable.

What do you use it for?

In short, sorting non-English strings properly.

The core of the algorithm involves multi-level comparison. For example, café comes before caff because at the primary level the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then applies only to words that are equivalent at the primary level.
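The two-level idea can be illustrated with a rough sketch that uses only the standard library. This is not how pyucollate or the UCA builds its keys (real keys come from the weights in allkeys.txt). The helper name two_level_key is hypothetical, and stripping accents via NFD decomposition is a simplifying assumption.

```python
import unicodedata


def two_level_key(word: str) -> tuple[str, str]:
    """Return a (primary, secondary) key for a toy two-level comparison."""
    decomposed = unicodedata.normalize("NFD", word)
    # Primary level: base letters only, combining marks (accents) dropped.
    primary = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: the decomposed form, which still carries the accents.
    # It only matters when two words have the same primary key.
    return primary, decomposed


words = ["caff", "caf\u00e9", "cafe"]  # "caf\u00e9" is café

by_code_point = sorted(words)                  # plain code point order
by_levels = sorted(words, key=two_level_key)   # accents decide only ties

print(by_code_point)
print(by_levels)
```

Plain code point order puts café last because é has a higher code point than f, while the two-level key places it between cafe and caff.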

The Unicode Collation Algorithm and pyuca also support contraction and expansion. Contraction is where multiple letters are treated as a single unit. In Spanish, ch is treated as a letter coming between c and d, so that, for example, words beginning with ch sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters. In German, ä is sorted as if it were ae, i.e. after ad but before af.
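Contraction and expansion can be sketched with a toy key function. Everything here is illustrative: toy_key and its two flags are hypothetical names, the weights are just code points, and a real collator takes such mappings from its weight table rather than from hard-coded rules.

```python
def toy_key(
    word: str, *, spanish_ch: bool = False, german_umlaut: bool = False
) -> list[tuple[int, int]]:
    """Build a toy sort key as a list of (letter weight, tie-breaker) elements."""
    key: list[tuple[int, int]] = []
    i = 0
    while i < len(word):
        if spanish_ch and word.startswith("ch", i):
            # Contraction: two characters become ONE element that sorts
            # after every plain "c" element and before "d".
            key.append((ord("c"), 1))
            i += 2
        elif german_umlaut and word[i] == "\u00e4":
            # Expansion: one character (ä) becomes TWO elements, "a" then "e".
            key.extend([(ord("a"), 0), (ord("e"), 0)])
            i += 1
        else:
            key.append((ord(word[i]), 0))
            i += 1
    return key


spanish = sorted(
    ["chico", "cuna", "dama", "casa"],
    key=lambda w: toy_key(w, spanish_ch=True),
)
german = sorted(
    ["af", "\u00e4", "ad"],  # "\u00e4" is ä
    key=lambda w: toy_key(w, german_umlaut=True),
)

print(spanish)
print(german)
```

With the contraction enabled, chico lands after cuna but before dama. With the expansion enabled, ä lands between ad and af.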

How does it differ from the original library and why did you fork it?

pyuca is a well-functioning Python library, but it no longer appears to be actively maintained (see its open PRs and issues). So we decided to create our own fork with some minor changes:

Modern Python tooling such as pytest, ruff, and mypy was added.
Type hints were added to all functions and classes.
A sort() method was added to the Collator interface (a thin wrapper around Python's sorted() that uses the Collator to generate the sort keys).
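The wrapper pattern described in the last point can be sketched as follows. This is not the library's code: ToyCollator is a hypothetical stand-in with a placeholder key, and the method name sort_key mirrors the original pyuca (whether pyucollate exposes it under the same name is an assumption).

```python
from collections.abc import Iterable


class ToyCollator:
    """Hypothetical stand-in for a collator; NOT pyucollate's implementation."""

    def sort_key(self, string: str) -> tuple[str, str]:
        # Placeholder key: case-insensitive first, original string as tie-breaker.
        # A real collator would return UCA weights here.
        return string.casefold(), string

    def sort(self, strings: Iterable[str]) -> list[str]:
        # The wrapper pattern: delegate to sorted() and supply the key function.
        return sorted(strings, key=self.sort_key)


collator = ToyCollator()
fruits = ["banana", "apple", "Cherry"]

wrapped = collator.sort(fruits)
explicit = sorted(fruits, key=collator.sort_key)  # the call the wrapper saves

print(wrapped)
```

The benefit of such a wrapper is purely ergonomic: callers do not need to know that a key function is involved.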

How to use it

Here is how to install and use the pyucollate module.

pip install git+https://github.com/SSRQ-SDS-FDS/pyucollate.git

Usage example:

from pyucollate import Collator
c = Collator()

assert c.sort(["cafe", "caff", "café"]) == ["cafe", "café", "caff"]

License

Python code is made available under an MIT license (see LICENSE). allkeys.txt is made available under the similar license defined in LICENSE-allkeys.
