guilhermehuther/string_treatment

Downloads PyPI badge License: MIT

string_treatment is a Python library for cleaning and standardizing words that are spelled inconsistently.

Overview

This library uses string similarity from rapidfuzz to group words with similar spelling into clusters, mapping each word to the most frequent (canonical) form within its cluster.
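The idea can be illustrated with a minimal sketch. This is not the library's actual implementation or API: the `standardize` and `similarity` helpers, the greedy clustering strategy, and the `0.8` threshold are all assumptions made for illustration, and the standard library's `difflib.SequenceMatcher` stands in for a rapidfuzz scorer so that the example runs without extra dependencies.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]. Stdlib stand-in for a rapidfuzz scorer (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def standardize(words, threshold=0.8):
    """Hypothetical helper: map each word to the most frequent spelling in its cluster."""
    counts = Counter(words)
    # Visit spellings from most to least frequent (ties broken alphabetically),
    # so the first member of each cluster is also its most frequent form.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))

    canonical = []  # one representative spelling per cluster
    mapping = {}    # spelling -> canonical spelling
    for word in ordered:
        for rep in canonical:
            if similarity(word, rep) >= threshold:
                mapping[word] = rep  # close enough: join the existing cluster
                break
        else:
            canonical.append(word)   # no similar cluster found: start a new one
            mapping[word] = word

    return [mapping[w] for w in words]


words = ["apple", "apple", "aple", "banana", "bananna", "banana", "cherry"]
print(standardize(words))
# ['apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'cherry']
```

A greedy pass like this is sensitive to the threshold: set it too low and unrelated words merge, set it too high and misspellings stay separate.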

Since the clustering process may not always be perfectly accurate, the library can generate an interactive graph to help visualize the groupings.

[Image: example graph]

Installation

Install the latest stable version from PyPI:

pip install string-treatment

Example

See the test script in the repository root, test_standardize.py, for a usage example.