Skip to content

This library is a simple fuzzy search with unicode normalization and an arbitrary score system.

License

Notifications You must be signed in to change notification settings

eregnier/sffuzzy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple fast fuzzy

Fuzzy library written in go.

This library is a simple fuzzy search with unicode normalization and an arbitrary score system.

Test the library

git clone https://github.com/eregnier/sffuzzy
cd ssfuzzy
make test-trace

Usage

go get github.com/eregnier/sffuzzy

Usage samples are in test.go

A minimal usage code below:

  //One shot search
  names := []string{"super man", "super noel", "super du"}
  results := fuzzy.SearchOnce("perdu", &names, fuzzy.Options{Sort: true, Limit: 5, Normalize: true})
  //Use search cache for performance
  names := []string{"super man", "super noel", "super du"}
  options := fuzzy.Options{Sort: true, Limit: 5, Normalize: true}
  cacheTargets := fuzzy.Prepare(&names, options)
  results := fuzzy.Search("perdu", cacheTargets, options)

Options

  options := fuzzy.Options{Sort: true, Normalize: true, Limit: 10}

This options structure have the following options

Prop Type Description
Sort bool Orders result depending on results score
Normalize bool Handles searches in texts with special characters. Make search more flexible / less strict
Limit int Define how many results are kept un search return value

Performances

The given sample file is a flat csv loaded as string list.

Multi thread performances are worse with my code, so I reverted

On a AMD 3600x and on a single core I fuzzy search a text in 40 ms in SearchOnce mode

When I build cache with a Prepare(data *[]string), and then I run a Search, the prepare takes about 36ms and the Search about 4ms on the data sample of ~155K lines for 320Ko.

Execution

The following code test.go

Have the following output

2020/04/29 02:07:57 TestMinimalSearch &{[{super du 8 5 1} {super man 3 3 4} {super noel 3 3 5}] 8}
2020/04/29 02:07:57 TestMinimalSearchCache &{[{super du 8 5 1} {super man 3 3 4} {super noel 3 3 5}] 8}
2020/04/29 02:07:57  + Cache search, first search is slower.
2020/04/29 02:07:57  🕑 Duration: 9.435114ms
2020/04/29 02:07:57  + Cached searches
2020/04/29 02:07:57  🕑 Duration: 3.883137ms
2020/04/29 02:07:57 [{San Francisco;United States 13 10 16} {South San Francisco;United States 12 10 22} {St. Francis;United States 12 10 19}]
2020/04/29 02:07:57  🕑 Duration: 2.100023ms
2020/04/29 02:07:57 [{Mumbai;India 11 6 0} {Mumbwa;Zambia 8 6 6} {Mount Gambier;Australia 8 6 16}]
2020/04/29 02:07:57  🕑 Duration: 3.992907ms
2020/04/29 02:07:57 [{Hong Kong;Hong Kong 16 8 0} {Xiangkhoang;Laos 13 8 3} {Mokhotlong;Lesotho 12 8 7}]
2020/04/29 02:07:57  🕑 Duration: 2.312343ms
2020/04/29 02:07:57 [{Agadez;Niger 11 6 0} {Várzea Grande;Brazil 8 6 7} {Altagracia de Orituco;Venezuela 8 6 21}]
2020/04/29 02:07:57  🕑 Duration: 2.034594ms
2020/04/29 02:07:57 [{Palmas;Brazil 10 5 0} {La Palma;Panama 10 5 0} {Las Palmas de Gran Canaria;Spain 10 5 0}]
2020/04/29 02:07:57  🕑 Duration: 4.029126ms
2020/04/29 02:07:57 [{Sucre;Bolivia 20 12 0} {Quime;Bolivia 13 7 0} {Villazón;Bolivia 13 7 0}]
2020/04/29 02:07:57  🕑 Duration: 3.910717ms
2020/04/29 02:07:57 [{Ibb;Yemen 16 8 0} {Ibb;Yemen 16 8 0} {Dhamār;Yemen 11 5 0}]
2020/04/29 02:07:57  🕑 Duration: 3.826816ms
2020/04/29 02:07:57 [{West View;United States 16 8 0} {Westview;United States 16 8 0} {Viera West;United States 14 8 3}]
2020/04/29 02:07:57  + Search all at once
2020/04/29 02:07:57  🕑 Duration: 44.714400ms
2020/04/29 02:07:57 Print plain unmarshaled json results
2020/04/29 02:07:57 [
  {
    "target": "Ōsaka;Japan",
    "score": 13,
    "matchCount": 10,
    "typos": 1
  },
  {
    "target": "Northwest Harborcreek;United States",
    "score": 5,
    "matchCount": 5,
    "typos": 29
  },
  {
    "target": "Oshakati;Namibia",
    "score": 5,
    "matchCount": 5,
    "typos": 11
  },
  {
    "target": "Colombo;Sri Lanka",
    "score": 5,
    "matchCount": 5,
    "typos": 11
  },
  {
    "target": "Coxsackie;United States",
    "score": 5,
    "matchCount": 5,
    "typos": 17
  }
]
PASS
ok  	_/home/utopman/sources/sffuzzy	0.083s

The cli

There is a simple cli bundled with this library which usage is in the executable help

go run cli.go --help
Usage: cli [--limit LIMIT] [--sort] [--normalize] SEARCH

Positional arguments:
  SEARCH                 Search terms to find in given data

Options:
  --limit LIMIT, -l LIMIT
                         Results limit, use -1 for no limit [default: 10]
  --sort, -s             Whether or not results are sorted [default: true]
  --normalize, -n        normalize search string and data string for searching. It fuzzy search with no accents/special characters [default: true]
  --help, -h             display this help and exit

Here is a sample about how to use it from this repository sources

head ../sample.csv | go run cli.go -s=1 -l=3 "kol"
[
  {
    "target": "Kolkata;India",
    "score": 8,
    "matchCount": 3,
    "typos": 0
  },
  {
    "target": "Tokyo;Japan",
    "score": 2,
    "matchCount": 2,
    "typos": 7
  },
  {
    "target": "Mexico City;Mexico",
    "score": 2,
    "matchCount": 0,
    "typos": 0
  }
]

It just takes a flat newline separated database to query in stdin. Then it returns the json results as raw indented json depending on given query string and parameters

Licence

MIT

About

This library is a simple fuzzy search with unicode normalization and an arbitrary score system.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published