Ruby & JRuby gem providing fast string edit distance algorithms, implemented in C with FFI bindings.
- Levenshtein & Damerau Levenshtein distance
- Jaro & Jaro-Winkler distance
- N-Gram distance
Tested on OS X 10.8.2 and Linux 12.10 with:
- MRI Ruby 1.9.3 p385
- JRuby 1.7.2 (1.9.3 p327)
Add this line to your application's Gemfile:
gem 'hotwater'
And then execute:
$ bundle
Or install it yourself as:
$ gem install hotwater
require 'hotwater'

Hotwater.levenshtein_distance("abc", "acb") # => 2
Hotwater.damerau_levenshtein_distance("abc", "acb") # => 1
# normalized by the string lengths, so an edit in a short string weighs more
# than one in a longer string; higher values mean more similar strings
Hotwater.normalized_levenshtein_distance("abc", "acb").round(4) # => 0.3333
Hotwater.normalized_damerau_levenshtein_distance("abc", "acb").round(4) # => 0.6667
Hotwater.jaro_distance("martha", "marhta").round(4) # => 0.9444
Hotwater.jaro_winkler_distance("martha", "marhta").round(4) # => 0.9611
# default is bigram
Hotwater.ngram_distance("natural", "contrary").round(4) # => 0.25
# specify trigram
Hotwater.ngram_distance("natural", "contrary", 3).round(4) # => 0.2083
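To make the examples above less of a black box, here is a minimal pure-Ruby sketch of the edit distance, normalization and Jaro/Jaro-Winkler computations. It is not hotwater's C implementation, and several details are assumptions: the module name EditDistanceSketch and its methods are hypothetical; the Damerau-Levenshtein variant shown is the restricted "optimal string alignment" form (hotwater's C code may implement the unrestricted variant); the normalization 1 - distance / longer length is an assumption that happens to fit the values shown above; and jaro_winkler applies the prefix bonus with the common defaults (scale 0.1, prefix capped at 4) and no boost threshold.

```ruby
# Hypothetical pure-Ruby sketch for illustration; not part of the hotwater gem.
module EditDistanceSketch
  module_function

  # Levenshtein distance: insertions, deletions and substitutions each cost 1.
  # Keeps only two rows of the dynamic-programming table.
  def levenshtein(a, b)
    prev = (0..b.length).to_a
    a.each_char.with_index(1) do |ca, i|
      curr = [i]
      b.each_char.with_index(1) do |cb, j|
        cost = ca == cb ? 0 : 1
        curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
      end
      prev = curr
    end
    prev[b.length]
  end

  # Optimal string alignment (restricted Damerau-Levenshtein): Levenshtein plus
  # a swap of two adjacent characters costing 1. Needs rows i-1 and i-2.
  def osa(a, b)
    d = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
    (0..a.length).each { |i| d[i][0] = i }
    (0..b.length).each { |j| d[0][j] = j }
    (1..a.length).each do |i|
      (1..b.length).each do |j|
        cost = a[i - 1] == b[j - 1] ? 0 : 1
        d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
        if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
          d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
        end
      end
    end
    d[a.length][b.length]
  end

  # Assumed normalization: 1 - distance / length of the longer string,
  # so 1.0 means identical and one edit weighs more in a short string.
  def normalize(distance, a, b)
    longest = [a.length, b.length].max
    longest.zero? ? 1.0 : 1.0 - distance.to_f / longest
  end

  # Jaro similarity: counts characters matching within a sliding window and
  # penalizes matched characters that appear in a different order.
  def jaro(s1, s2)
    return 1.0 if s1 == s2
    l1, l2 = s1.length, s2.length
    return 0.0 if l1.zero? || l2.zero?
    window = [[l1, l2].max / 2 - 1, 0].max
    m1 = Array.new(l1, false)
    m2 = Array.new(l2, false)
    matches = 0
    l1.times do |i|
      ([0, i - window].max..[l2 - 1, i + window].min).each do |j|
        next if m2[j] || s1[i] != s2[j]
        m1[i] = m2[j] = true
        matches += 1
        break
      end
    end
    return 0.0 if matches.zero?
    # Walk both match lists in order; each mismatch is half a transposition.
    k = 0
    half = 0
    l1.times do |i|
      next unless m1[i]
      k += 1 until m2[k]
      half += 1 if s1[i] != s2[k]
      k += 1
    end
    m = matches.to_f
    t = half / 2 # integer division
    (m / l1 + m / l2 + (m - t) / m) / 3.0
  end

  # Jaro-Winkler: boosts the Jaro score for a common prefix of up to 4 chars.
  def jaro_winkler(s1, s2, scale = 0.1)
    j = jaro(s1, s2)
    prefix = 0
    [4, s1.length, s2.length].min.times do |i|
      break unless s1[i] == s2[i]
      prefix += 1
    end
    j + prefix * scale * (1 - j)
  end
end
```

On the README's own inputs, levenshtein and osa return 2 and 1 for "abc"/"acb", normalize turns those into 0.3333 and 0.6667 (rounded), and jaro and jaro_winkler round to 0.9444 and 0.9611 for "martha"/"marhta". Levenshtein keeps two rows to save memory, while the transposition check in osa needs the row two steps back, so it keeps the full table.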
- Fork it
- Install gems
$ bundle install
- Compile lib
$ rake compile
- Run specs
$ rake spec
- Clean compiler generated files
$ rake clean
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
- Some C code from the https://github.com/sunlightlabs/jellyfish project
- N-Gram ported from Apache Lucene 4.0.0 NGramDistance.java
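The N-Gram metric credited above is ported from Lucene's NGramDistance. As we understand that approach, it runs a Levenshtein-style dynamic program over the strings' n-grams rather than single characters: both strings are padded with n-1 sentinel characters, the substitution cost between two grams is the fraction of differing positions (ignoring matched padding), and the final score is 1 - cost / longer length. The sketch below is our own pure-Ruby rendering of that idea; NGramSketch and ngram_similarity are hypothetical names, not hotwater API, and we have not checked that it reproduces the exact figures in the usage examples.

```ruby
# Hypothetical pure-Ruby sketch of an n-gram similarity in the style of
# Lucene's NGramDistance; not the hotwater C code.
module NGramSketch
  module_function

  # Returns a similarity in [0, 1]; 1.0 means the strings are identical.
  # Uses "\0" as the padding sentinel, so inputs must not contain NUL chars.
  def ngram_similarity(source, target, n = 2)
    sl, tl = source.length, target.length
    return (sl == tl ? 1.0 : 0.0) if sl.zero? || tl.zero?

    # Strings shorter than n: fall back to position-wise character matching.
    if sl < n || tl < n
      same = (0...[sl, tl].min).count { |i| source[i] == target[i] }
      return same.to_f / [sl, tl].max
    end

    pad = "\0" * (n - 1)
    sa = pad + source # source with n-1 sentinels prepended
    ta = pad + target # target padded the same way
    prev = (0..sl).map(&:to_f)
    (1..tl).each do |j|
      t_gram = ta[j - 1, n] # j-th n-gram of the padded target
      curr = [j.to_f]
      (1..sl).each do |i|
        s_gram = sa[i - 1, n] # i-th n-gram of the padded source
        cost = 0
        tn = n
        n.times do |k|
          if s_gram[k] != t_gram[k]
            cost += 1
          elsif s_gram[k] == "\0"
            tn -= 1 # matched padding does not count toward the gram size
          end
        end
        # insertion, deletion, or fractional gram substitution
        curr << [curr[i - 1] + 1, prev[i] + 1, prev[i - 1] + cost.to_f / tn].min
      end
      prev = curr
    end
    1.0 - prev[sl] / [sl, tl].max
  end
end
```

Identical strings score 1.0 and strings with no characters in common score 0.0; the n parameter plays the role of the optional third argument to Hotwater.ngram_distance.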
Why Hotwater? As stated in the credits section, some of the C code comes from the jellyfish Python project. Jellyfish made me think right away of the New Brunswick beaches I have visited a couple of times in the past years. There is this legend about New Brunswick having warm water beaches. I even saw a tourism promotion TV commercial selling NB as having warm water. This is a lie! :P I never experienced warm water (in the generally accepted definition) in NB, only lots of jellyfish :D (That being said, I have enjoyed every bit of my visits to New Brunswick and I really do not care about warm water ;)
Colin Surprenant, @colinsurprenant, http://github.com/colinsurprenant, colin.surprenant@gmail.com
Hotwater is distributed under the Apache License, Version 2.0.