Update url matching to use levenshtein distance #23

Open
wants to merge 1 commit into
base: master

@nuudles nuudles commented Mar 21, 2016

This update uses the Levenshtein distance algorithm to determine the best-matching entry, similar to what the KeePassHTTP plugin does. This alleviates issues such as "www.facebook.com" not matching an entry whose URL is "facebook.com".

Note that this might produce false positives, particularly if passed a URL that doesn't exist in any of the entries, but in my experiments it works quite well.

@mstarke
Member

mstarke commented Mar 21, 2016

In KeePassHTTP, the Levenshtein distance is used to order login entries but not to actually retrieve them, or am I reading the code wrong? You're using it to actually match, which changes the behaviour drastically. I do not intend to move away from the original implementation, and to be honest, I just ported @jameshurst's implementation without any changes to the actual logic. If I'm wrong, it'll be merged promptly ;)

@mstarke
Member

mstarke commented Mar 21, 2016

I just dug a bit deeper: the sorting is done in KeePassHTTPKit. That might be a good place to implement the Levenshtein distance and align KeePassHTTPKit with KeePassHTTP.

@nuudles
Author

nuudles commented Mar 21, 2016

Hey @mstarke! Thanks for your prompt response! Looking into it further, I think I got tripped up by their README, where they state:

URL matching: How does it work?

KeePassHttp can receive 2 different URLs, called URL and SubmitURL.

CompareToUrl = SubmitURL if set, URL otherwise

For every entry, the Levenshtein Distance of his Entry-URL (or Title, if Entry-URL is not set) to the CompareToURL is calculated.

Only the Entries with the minimal distance are returned.
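A minimal sketch of the rule quoted above, written in Python purely for illustration (it is not code from KeePassHttp or KeePassHTTPKit). The `entries` dictionaries and the `closest_entries` helper are hypothetical names; only the "SubmitURL if set, URL otherwise" choice and the minimal-distance filter come from the README text.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_entries(entries, url, submit_url=None):
    """Keep only the entries with the minimal distance to the compare URL.

    `entries` is a hypothetical list of dicts with "url" and "title" keys.
    """
    compare_to = submit_url or url  # SubmitURL if set, URL otherwise
    scored = [
        (levenshtein(entry.get("url") or entry["title"], compare_to), entry)
        for entry in entries
    ]
    if not scored:
        return []
    best = min(distance for distance, _ in scored)
    return [entry for distance, entry in scored if distance == best]
```

With this sketch, "www.facebook.com" and "facebook.com" are four edits apart, so a "facebook.com" entry wins over unrelated ones. Note that when no entry is related at all, the filter still returns whichever entry happens to be closest, which is the false-positive case mentioned in the PR description.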

Looking at their code, it looks like they first filter the entries that match the scheme and URL, then narrow those down to the entries with the minimum Levenshtein distance. Their initial filter is a bit more robust than the one currently in KeePassHTTPKit:

                while (listResult.Count == listCount && (origSearchHost == searchHost || searchHost.IndexOf(".") != -1))
                {
                    parms.SearchString = String.Format("^{0}$|/{0}/?", searchHost);
                    var listEntries = new PwObjectList<PwEntry>();
                    db.RootGroup.SearchEntries(parms, listEntries);
                    foreach (var le in listEntries)
                    {
                        listResult.Add(new PwEntryDatabase(le, db));
                    }
                    searchHost = searchHost.Substring(searchHost.IndexOf(".") + 1);

                    //searchHost contains no dot --> prevent possible infinite loop
                    if (searchHost == origSearchHost)
                        break;
                }
                listCount = listResult.Count;

It looks like they run a search for each split on the "." character, so for http://sub.my.url.com they search for sub.my.url.com, my.url.com, and url.com, then whittle the results down to whichever entries have the minimum Levenshtein distance.
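The host-splitting step can be sketched as follows, again in Python and only as an illustration. `host_candidates` and `search_by_host` are hypothetical names, and `entry_urls` is an assumed plain list of strings rather than a KeePass database. Two details are my reading of the quoted loop rather than confirmed behavior: its `listResult.Count == listCount` condition appears to stop the search as soon as one candidate adds results, and the sketch escapes the host before building the pattern, which the quoted snippet does not do.

```python
import re


def host_candidates(host: str) -> list[str]:
    """List the host and each parent domain that still contains a dot."""
    candidates = []
    current = host
    while current == host or "." in current:
        candidates.append(current)
        if "." not in current:
            break  # single-label host such as "localhost": nothing to strip
        current = current.split(".", 1)[1]  # drop the leftmost label
    return candidates


def search_by_host(entry_urls, host):
    """Try each candidate in turn and stop at the first one with matches."""
    for candidate in host_candidates(host):
        escaped = re.escape(candidate)  # assumption: escape regex metacharacters
        # Same shape as the quoted pattern "^{0}$|/{0}/?"
        pattern = re.compile(rf"^{escaped}$|/{escaped}/?")
        matches = [url for url in entry_urls if pattern.search(url)]
        if matches:
            return matches
    return []
```

For "sub.my.url.com" the candidates are sub.my.url.com, my.url.com, and url.com, in that order. The matches returned here would then be the input to the minimum-distance filter.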

That algorithm would still solve my issue, where the passed-in URL is www.facebook.com but my database entry has facebook.com as its URL.

If it makes sense to you, I'd be happy to implement an algorithm closer to KeePassHTTP's behavior.
