Lightweight tool for normalizing whitespace and accurately tokenizing words without regular expressions. Supports multiple natural languages. Useful for web scraping, machine learning, and data analysis.
Updated Oct 29, 2021 - Python
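The project's own source is not reproduced here. As a minimal sketch of the two operations the description names, assuming Python 3.9+ and only the standard library, the hypothetical helpers `normalize_whitespace` and `tokenize` below show one way to collapse whitespace and split words without the `re` module. They are illustrations, not this project's API.

```python
import unicodedata


def normalize_whitespace(text: str) -> str:
    """Collapse every run of Unicode whitespace into one space and trim both ends."""
    # str.split() with no argument splits on any whitespace run
    # (spaces, tabs, newlines, no-break spaces, ...) and drops empty pieces.
    return " ".join(text.split())


def _is_word_char(ch: str) -> bool:
    """Letters and digits, plus combining marks (Unicode category M*)."""
    # Combining marks (e.g. U+0301 in a decomposed "é") fail isalnum(),
    # so they are accepted explicitly to keep the word in one piece.
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


# Hypothetical choice: characters kept inside a word when flanked by word characters.
_JOINERS = "'-\u2019"


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single-character punctuation tokens."""
    tokens: list[str] = []
    current: list[str] = []  # characters of the word being built
    for i, ch in enumerate(text):
        if _is_word_char(ch):
            current.append(ch)
            continue
        # Apostrophes/hyphens stay inside a word ("don't", "well-known")
        # only when a word character comes right before and right after.
        next_is_word = i + 1 < len(text) and _is_word_char(text[i + 1])
        if ch in _JOINERS and current and next_is_word:
            current.append(ch)
            continue
        # Any other character ends the current word.
        if current:
            tokens.append("".join(current))
            current = []
        if not ch.isspace():
            tokens.append(ch)  # punctuation becomes its own token
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    raw = "  Don't  panic,\tit's\n well-known!  "
    print(normalize_whitespace(raw))  # → Don't panic, it's well-known!
    print(tokenize(raw))  # → ["Don't", 'panic', ',', "it's", 'well-known', '!']
```

Because `str.isalnum()` is true for CJK characters, an unbroken run of Chinese or Japanese text comes out as a single token; segmenting scripts written without spaces needs a dictionary- or model-based segmenter, which this sketch does not attempt.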