guerrillaanalytics edited this page Jun 18, 2015 · 2 revisions

This page is a quick guide to using the similarity functions. It assumes you have already installed them on the database.

Where are the functions installed?

The functions are installed under a schema named after the similarity library version, e.g. Similarity_<Major version>_<minor version>_<patch version>. For version 1.1.0, for instance, the schema is SIMILARITY_1_1_0.

What functions are available?

The following functions are provided in the SimMetrics library and exposed by the Similarity functions. For more detail on the functions and approximate string matching please see here. The functions perform differently on different problem domains, so the choice of function depends on the problem you are solving. For example, some functions give more weight to strings that begin with the same letters. This suits a name matching problem, where typos tend to occur in the middle of a word rather than at the start, but not a barcode matching problem, where any difference at all means a completely different barcode.

Calling the functions

To use these functions in SQL code, call the function by its full name, including the schema. For example:

SELECT SIMILARITY_1_1_0.Levenshtein('THE QUICK BROWN FOX','THE QUICK FOX')

This applies the Levenshtein function to the two strings and returns a number representing how similar they are.
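To give a feel for what such a number can mean, here is a minimal Python sketch of a Levenshtein-based similarity score. It assumes the score is normalised as 1 - (edit distance / length of the longer string), so that 1.0 means identical strings. That normalisation is an assumption made for illustration and may not match what the installed SQL function returns. The names levenshtein_distance and levenshtein_similarity are hypothetical helpers, not part of the library.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, lower as more
    edits are needed relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    score = levenshtein_similarity("THE QUICK BROWN FOX", "THE QUICK FOX")
    print(round(score, 4))
```

Under this assumed normalisation, the two strings from the SQL example differ by six deleted characters out of nineteen, so the sketch gives a score of roughly 0.68. Treat this as an illustration of the idea rather than the value the database function will return.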