String similarity functions and string distances: Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS (Longest Common Subsequence), Cosine similarity...

# stringosim

The plan for this package is to provide Go implementations of different string distance and similarity functions, such as Levenshtein (normalized, weighted, Damerau), Jaro-Winkler, Jaccard index, Euclidean distance, Hamming distance...

This is a work in progress: so far only the Levenshtein and Jaccard distances are available.

## Import and installation

To get the library, run:

    go get github.com/dexyk/stringosim

To use the library, import it in your code:

    import "github.com/dexyk/stringosim"

## Usage

Currently only Levenshtein and Jaccard string distances are implemented.

#### Levenshtein

The Levenshtein distance can be calculated with default parameters (DefaultSimilarityOptions), where the insert, delete, and substitute operations each cost 1. To use other costs, pass a value of the SimilarityOptions type. Setting CaseInsensitive to true in SimilarityOptions makes the comparison ignore character case.

Example:

    // Default options: every operation costs 1.
    fmt.Println(stringosim.Levenshtein([]rune("stringosim"), []rune("stingobim")))

    // Custom operation costs.
    fmt.Println(stringosim.Levenshtein([]rune("stringosim"), []rune("stingobim"),
        stringosim.SimilarityOptions{
            InsertCost:     3,
            DeleteCost:     5,
            SubstituteCost: 2,
        }))

    // Custom costs with case-insensitive comparison.
    fmt.Println(stringosim.Levenshtein([]rune("stringosim"), []rune("STRINGOSIM"),
        stringosim.SimilarityOptions{
            InsertCost:      3,
            DeleteCost:      4,
            SubstituteCost:  5,
            CaseInsensitive: true,
        }))
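To make the role of the three costs concrete, here is a minimal, self-contained sketch of a weighted Levenshtein distance. It is not the library's code: the `Costs` type and the `weightedLevenshtein` and `lower` functions are hypothetical names used only for illustration, and the sketch assumes that "delete" removes a rune from the first argument and "insert" adds a rune from the second. The library may define the directions differently.

```go
package main

import (
	"fmt"
	"unicode"
)

// Costs is a hypothetical stand-in for the library's SimilarityOptions.
type Costs struct {
	Insert, Delete, Substitute int
	CaseInsensitive            bool
}

// lower returns a lower-cased copy of r, leaving the input untouched.
func lower(r []rune) []rune {
	out := make([]rune, len(r))
	for i, c := range r {
		out[i] = unicode.ToLower(c)
	}
	return out
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// weightedLevenshtein returns the cheapest cost of turning s into t.
// Assumption: Delete removes a rune of s, Insert adds a rune of t.
func weightedLevenshtein(s, t []rune, c Costs) int {
	if c.CaseInsensitive {
		s, t = lower(s), lower(t)
	}
	// Two-row dynamic programming: prev is row i-1, curr is row i.
	prev := make([]int, len(t)+1)
	curr := make([]int, len(t)+1)
	for j := range prev {
		prev[j] = j * c.Insert // build t[:j] from the empty string
	}
	for i := 1; i <= len(s); i++ {
		curr[0] = i * c.Delete // erase s[:i] entirely
		for j := 1; j <= len(t); j++ {
			sub := prev[j-1]
			if s[i-1] != t[j-1] {
				sub += c.Substitute
			}
			curr[j] = min3(sub, prev[j]+c.Delete, curr[j-1]+c.Insert)
		}
		prev, curr = curr, prev
	}
	return prev[len(t)]
}

func main() {
	a, b := []rune("stringosim"), []rune("stingobim")
	fmt.Println(weightedLevenshtein(a, b, Costs{Insert: 1, Delete: 1, Substitute: 1})) // 2
	fmt.Println(weightedLevenshtein(a, b, Costs{Insert: 3, Delete: 5, Substitute: 2})) // 7
	fmt.Println(weightedLevenshtein(a, []rune("STRINGOSIM"),
		Costs{Insert: 3, Delete: 4, Substitute: 5, CaseInsensitive: true})) // 0
}
```

With unit costs the two strings differ by one deletion (`r`) and one substitution (`s` to `b`), giving 2. With the custom costs the same edit path costs 5 + 2 = 7 under the direction convention assumed here.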

#### Jaccard

The Jaccard distance is calculated over the n-grams of the two strings. The n-gram size can be passed as an optional argument; if it is omitted, the default value of 1 is used.

Example:

    // Default n-gram size of 1.
    fmt.Println(stringosim.Jaccard([]rune("stringosim"), []rune("stingobim")))

    // Bigrams.
    fmt.Println(stringosim.Jaccard([]rune("stringosim"), []rune("stingobim"), 2))

    // Trigrams.
    fmt.Println(stringosim.Jaccard([]rune("stringosim"), []rune("stingobim"), 3))
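As an illustration of how the n-gram size affects the result, the following is a rough, self-contained sketch of a Jaccard distance over n-gram sets. It is not taken from the library: `ngramSet` and `jaccardDistance` are hypothetical names, and the sketch assumes set semantics (duplicate n-grams count once) and the usual definition of distance as 1 minus the ratio of intersection size to union size. The library may count duplicates or handle edge cases differently.

```go
package main

import "fmt"

// ngramSet collects the distinct n-grams of s. Sizes below 1 are
// treated as 1, mirroring the default described above.
func ngramSet(s []rune, n int) map[string]struct{} {
	if n < 1 {
		n = 1
	}
	set := make(map[string]struct{})
	for i := 0; i+n <= len(s); i++ {
		set[string(s[i:i+n])] = struct{}{}
	}
	return set
}

// jaccardDistance returns 1 - |A ∩ B| / |A ∪ B| for the n-gram sets
// A and B of s and t. Two inputs without any n-grams are treated as
// identical (distance 0); that choice is an assumption of this sketch.
func jaccardDistance(s, t []rune, n int) float64 {
	a, b := ngramSet(s, n), ngramSet(t, n)
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for g := range a {
		if _, ok := b[g]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return 1 - float64(inter)/float64(union)
}

func main() {
	a, b := []rune("stringosim"), []rune("stingobim")
	for n := 1; n <= 3; n++ {
		fmt.Printf("n=%d distance=%.3f\n", n, jaccardDistance(a, b, n))
	}
}
```

Larger n-grams capture more of the character order, so the same pair of strings generally looks further apart as n grows: with n = 1 the strings share 7 of 9 distinct runes, while with n = 2 they share only 5 of 12 distinct bigrams.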
