first clean version
eroux committed Jun 16, 2018
1 parent c1f4306 commit 0690926
Showing 2 changed files with 26 additions and 17 deletions.
27 changes: 25 additions & 2 deletions README.md
@@ -4,11 +4,34 @@ This library is an implementation of the [SymSpellCompound](https://github.com/w

# Installation

TODO
```
pip install sympound
```

# Documentation

TODO
If you want a quick complete example, see [example.py](example.py).

### Creating the sympound object

The first step is to create a `sympound` object. The constructor takes two main arguments:
- `distancefun` is the function used to compute the distance between two strings. It takes two arguments (the two strings to compare). You typically want a function computing the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), but you can get more creative and use, for example, keyboard distances.
- `maxDictionaryEditDistance` is the maximum edit distance that will be pre-computed. Increasing this parameter returns more suggestions, but also makes the memory footprint much larger.
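
As an illustration of what a `distancefun` could look like, here is a minimal pure-Python sketch of the optimal string alignment distance, a restricted variant of Damerau-Levenshtein. The name `osa_distance` is hypothetical and not part of this library; any function taking two strings and returning a number should fit the description above.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]
```

Such a function would be passed as the `distancefun` argument of the constructor. A pure-Python implementation like this one is likely slow on large dictionaries, so a compiled implementation is probably preferable in practice.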

### Adding dictionaries

Dictionaries can then be added with the `load_dictionary` function, which typically takes a file path as argument. A dictionary file is typically either a list of words (one per line) or a list of word and frequency pairs (separated by a space). See [example-dict2.txt](example-dict2.txt) for an example.
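
To make the two file formats concrete, here is a small sketch of how such lines could be read. `parse_dictionary_lines` is a hypothetical helper, not the library's parser, and two behaviors are assumptions of this sketch: a word without a frequency gets a count of 1, and repeated words have their counts summed.

```python
from typing import Dict, Iterable


def parse_dictionary_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word' or 'word frequency' lines into a word -> count mapping."""
    counts: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Assumption: a missing frequency is treated as a count of 1.
        count = int(parts[1]) if len(parts) > 1 else 1
        # Assumption: duplicate entries are merged by summing their counts.
        counts[word] = counts.get(word, 0) + count
    return counts
```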

A lot of computation happens at this stage, and adding a large dictionary can easily take more than a minute, so we provide two functions to save and reload the analyzed dictionaries as a pickle: `save_pickle` and `load_pickle`, both taking a file path as argument. Note that the pickle file is gzipped.
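
For readers unfamiliar with gzipped pickles, the general technique can be sketched with the standard library alone. The function names below are hypothetical and this is not necessarily how `save_pickle` and `load_pickle` are implemented; it only shows the idea of writing a pickle through a gzip stream.

```python
import gzip
import pickle


def save_gzipped_pickle(obj, path):
    """Serialize obj with pickle and compress it with gzip."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzipped_pickle(path):
    """Load an object written by save_gzipped_pickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

As with any pickle, such a file should only be loaded if it comes from a trusted source, since unpickling can execute arbitrary code.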

### Lookup

Once the dictionaries are loaded, you can get suggestions for a string by calling `lookup_compound(str, edit_distance_max)`, where `str` is the string you want to analyze and `edit_distance_max` is the maximum distance you want suggestions for.

The function returns a sorted list of `SuggestItem`s, each containing three fields:
- `term`: the suggested corrected string
- `distance`: the distance from the original string
- `count`: the frequency, if given in the dictionary
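
The following sketch shows one way the returned list could be consumed. It relies only on `lookup_compound` and the three fields listed above; `best_suggestion`, `FakeChecker` and `FakeSuggestItem` are hypothetical names, the stub values are made up, and the sketch assumes that the best suggestion comes first in the sorted list.

```python
from collections import namedtuple

# Stand-in for the library's SuggestItem, with only the three documented fields.
FakeSuggestItem = namedtuple("FakeSuggestItem", ["term", "distance", "count"])


class FakeChecker:
    """Hypothetical stub mimicking lookup_compound so the sketch runs on its own."""

    def lookup_compound(self, text, edit_distance_max):
        # Made-up result, for illustration only.
        return [FakeSuggestItem("where is the love", 2, 10)]


def best_suggestion(checker, text, max_distance=2):
    """Return (term, distance, count) of the first suggestion, or None."""
    suggestions = checker.lookup_compound(text, max_distance)
    if not suggestions:
        return None
    best = suggestions[0]  # assumption: the list is sorted best-first
    return best.term, best.distance, best.count


print(best_suggestion(FakeChecker(), "whereis th elove"))
```

With a real `sympound` object in place of the stub, the same helper should work unchanged as long as the assumption about ordering holds.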

# Copyright

16 changes: 1 addition & 15 deletions sympound/sympound.py
@@ -30,20 +30,6 @@ def __gt__(self, si2):
     def __str__(self):
         return self.term + ":" + str(self.count) + ":" + str(self.distance)

-    def getCount(self):
-        return self.count
-
-    def get_hash_code(self):
-        return hash(self.term)
-
-    def shallow_copy(self):
-        return copy(self)
-
-class DictionaryItem:
-    def __init__(self):
-        self.suggestions = []
-        self.count = 0
-
 class sympound(object):
     def __init__(self, distancefun, initialCapacity=16, maxDictionaryEditDistance=2, prefixLength=7, countThreshold=1, compactLevel=5):
         self.distancefun = distancefun
@@ -271,7 +257,7 @@ def lookup(self, input_string, verbosity, edit_distance_max):
suggestions = []
break
elif verbosity == 0:
-if distance < edit_distance_max2 or suggestion_count > suggestions[0].getCount():
+if distance < edit_distance_max2 or suggestion_count > suggestions[0].count:
edit_distance_max2 = distance
suggestions[0] = si
continue