Fuzzy name matching #21

@art-w

Sherlodoc uses a compressed suffix tree to index value names and types. However, the search doesn't try to correct user typos, even though this index data structure would make it efficient to do so. For example, the query flter yields no results.

  • The search procedure happens in Db.String_automata.find and could return a list of subtrees to tolerate user typos (e.g. a missing character, a character replaced by another, or an extra character to remove)
  • Some care is required to ensure the correction produces understandable results. I would probably start by tolerating exactly one typo, on words of sufficient length, to keep the typo correction from being too aggressive, and then refine this strategy with manual testing :)
  • Query.Name_cost would likely need adjustments to detect typo-corrected matches. It should work even without touching it, though: it will assume that the typo-corrected word was found in the documentation comment, which introduces a penalty that naturally pushes the result below exact matches with no typo correction.
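
To make the idea concrete, here is a minimal sketch of a typo-tolerant lookup that returns a list of subtrees. It is not Sherlodoc's code: it assumes a plain, uncompressed trie over whole words instead of the real compressed suffix tree, and the names trie, find, typo_budget, words and search, as well as the length threshold of 5, are hypothetical choices made for illustration only.

```ocaml
module Char_map = Map.Make (Char)
module String_map = Map.Make (String)

(* Hypothetical plain trie, standing in for the compressed suffix tree. *)
type trie = { terminal : bool; children : trie Char_map.t }

let empty = { terminal = false; children = Char_map.empty }

let rec insert word i node =
  if i = String.length word then { node with terminal = true }
  else
    let c = word.[i] in
    let child =
      Option.value (Char_map.find_opt c node.children) ~default:empty
    in
    { node with
      children = Char_map.add c (insert word (i + 1) child) node.children }

let of_list ws = List.fold_left (fun t w -> insert w 0 t) empty ws

(* Every (path, subtree) reachable by reading [query] with at most
   [budget] typos. Results are keyed by path to drop duplicates. *)
let find ~budget query trie =
  let n = String.length query in
  let rec go i budget path node acc =
    if i = n then String_map.add path node acc
    else
      let c = query.[i] in
      (* Exact step: costs nothing. *)
      let acc =
        match Char_map.find_opt c node.children with
        | Some child -> go (i + 1) budget (path ^ String.make 1 c) child acc
        | None -> acc
      in
      if budget = 0 then acc
      else
        (* Extra character in the query: skip it. *)
        let acc = go (i + 1) (budget - 1) path node acc in
        Char_map.fold
          (fun c' child acc ->
            let path' = path ^ String.make 1 c' in
            (* Missing character in the query: follow c' for free. *)
            let acc = go i (budget - 1) path' child acc in
            (* Replaced character: follow c' instead of c. *)
            if c' = c then acc else go (i + 1) (budget - 1) path' child acc)
          node.children acc
  in
  go 0 budget "" trie String_map.empty |> String_map.bindings

(* Assumed policy: one typo, only on queries of at least 5 characters. *)
let typo_budget query = if String.length query >= 5 then 1 else 0

(* All indexed words below a subtree, each prefixed by the path to it. *)
let rec words prefix node =
  let here = if node.terminal then [ prefix ] else [] in
  Char_map.fold
    (fun c child acc -> acc @ words (prefix ^ String.make 1 c) child)
    node.children here

let search index query =
  find ~budget:(typo_budget query) query index
  |> List.concat_map (fun (path, subtree) -> words path subtree)
  |> List.sort_uniq compare

let index = of_list [ "filter"; "filter_map"; "fold_left"; "flatten" ]

let () = List.iter print_endline (search index "flter")
```

With this toy index, the query flter reaches the filter subtree by assuming a missing i, so both filter and filter_map come back, while a short query such as fld gets no correction at all. A real implementation would probably also want to report how many typos each subtree cost, so that the ranking can keep exact matches on top.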
