Updated Readme #324

Merged
merged 3 commits into from
Apr 28, 2023

Conversation

dheerajck
Contributor

Updated readme

Added examples of WRatio, QRatio and updated score values
Added examples of string preprocessing

Co-authored-by: Max Bachmann <kontakt@maxbachmann.de>
@dheerajck
Contributor Author

Don't you think the parameter name processor can be confusing, and that something like string_preprocessor would be a better name?

@maxbachmann
Member

maxbachmann commented Apr 28, 2023

Don't you think the parameter name processor can be confusing, and that something like string_preprocessor would be a better name?

I agree it is not a perfect name. The naming stems from fuzzywuzzy using the named argument processor in their process.* APIs. I added the argument to every scorer, which in hindsight wasn't a great idea. It saves the user very little typing:

Levenshtein.distance(s1, s2, processor=utils.default_process)

vs

Levenshtein.distance(utils.default_process(s1), utils.default_process(s2))

In addition, the performance difference is pretty small. For short sequences (<16 characters) the second form appears to be a couple of percent faster, and for longer ones calling the processor internally appears to be around 10% faster. So it only makes a difference when working with very fast scorers like Prefix/Postfix/Hamming and long sequences. Even then, when comparing multiple sequences you're better off using the scorer with the process.* APIs.

For the process.* APIs that is a different story, since:

  1. it saves more typing
  2. I am able to call the preprocessing function in a more performant way

For these reasons I was actually playing with the thought of deprecating the processor argument in scorers.

@maxbachmann maxbachmann merged commit e1bf959 into rapidfuzz:main Apr 28, 2023