Caching
Nagyfi Richárd edited this page Mar 29, 2018 · 4 revisions
The results of most regular expressions used by Lara (in the Intents and Extract classes of lara.parser) are cached. However, you can still speed up processing by:
- Not calling the same function multiple times, but saving its output in memory (the outputs of the functions themselves are not cached).
- Using match_set() instead of match() for Intents if you only need to know which intents occur and the score is irrelevant.
- Using raw forms for Intents declarations. The dict passed to an Intents instance during initialization is updated with both the missing default values and the regular expressions generated to match the declared stems:
from lara import parser
short = {
"pelda" : [{"stem":"példa","wordclass":"noun"}]
}
long = str(parser.Intents(short)) # the contents of "short" are updated
print(long)
>>> {"pelda": [{"stem": "p\u00e9lda", "wordclass": "noun", "typo_stem": "pelda", "prefix": "", "typo_prefix": "", "affix": "", "typo_affix": "", "match_stem": true, "ignorecase": true, "boundary": true, "inc": [], "score": 1, "exc": [], "typo_score": 1, "pattern": "(?:p\\\u00e9lda{1,2})(?i)a?i?n?(?:[a\u00e1e\u00e9io\u00f3\u00f6\u0151u\u00fa\u00fc]?[djknmrst])?(?:[abjhkntv]?[a\u00e1e\u00e9io\u00f3\u00f6\u0151u\u00fa\u00fc]?[lgkntz]?)?(?:[ae][kt])?", "typo_pattern": "(?:p(?:eld|led|edl)(?:a))(?i)a?i?n?(?:[aeiou]?[djknmrst])?(?:[abjhkntv]?[aeiou]?[lgkntz]?)?(?:[ae][kt])?"}]}
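Returning to the first tip in the list above: keeping a result in memory instead of calling the same function again can be sketched with Python's functools.lru_cache. The analyse() function below is a hypothetical stand-in for an expensive call such as Intents.match_set(). It is not part of Lara and only illustrates the pattern.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive work actually runs


@lru_cache(maxsize=1024)
def analyse(text: str) -> frozenset:
    """Hypothetical stand-in for an expensive call such as Intents.match_set()."""
    global call_count
    call_count += 1
    # Placeholder work; real code would call into lara.parser here.
    return frozenset(word.strip("?!.,").lower() for word in text.split())


first = analyse("ráérsz most?")
second = analyse("ráérsz most?")  # served from memory, the body is not re-run
print(first == second, call_count)
```

Returning an immutable frozenset keeps the shared cached value safe from accidental modification by callers.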
Generating the missing values and the regular expressions takes time. You can save the output and reuse it for other Intents instances.
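from lara import parser, entities
from timeit import default_timer as timer

# First run: default values and regular expressions are generated from scratch
start = timer()
test = parser.Intents(entities.smalltalk())
print(test.match_set('ráérsz most?'))
end = timer()
print(end - start)

# Save the raw form (a str) of the fully initialised instance
cache = str(test)

# Second run: pass the raw form with is_raw=True to skip the generation step
start = timer()
test = parser.Intents(cache, True)  # __init__(self, new_intents={}, is_raw=False)
print(test.match_set('ráérsz most?'))
end = timer()
print(end - start)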
>>> {'are_you_busy'}
>>> 0.12724693508369642
>>> {'are_you_busy'}
>>> 0.0025928461309879802
In this run, creating the instance from the cached raw form and matching took about 0.0026 s instead of 0.127 s, almost 50 times faster. Using cached raw forms is highly recommended if your chatbot uses multiple Intents instances. Be sure to re-validate these dicts, stored as strs, whenever you upgrade the Lara library.
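One possible way to keep the raw form between program runs, and to rebuild it after an upgrade, is to store it on disk together with a version label. This is a minimal sketch rather than anything Lara provides: load_or_build(), the file name, and the version strings are all assumptions made for illustration, and fake_build() stands in for a real call such as str(parser.Intents(...)).

```python
import json
import tempfile
from pathlib import Path


def load_or_build(path, build_raw, version):
    """Return a cached raw string, rebuilding it when missing or stale.

    build_raw: callable returning the raw form (hypothetical hook)
    version:   any string you choose to identify the installed Lara release
    """
    path = Path(path)
    if path.exists():
        stored = json.loads(path.read_text(encoding="utf-8"))
        if stored.get("version") == version:
            return stored["raw"]  # cache hit: reuse the saved raw form
    raw = build_raw()  # cache miss or version change: regenerate
    path.write_text(json.dumps({"version": version, "raw": raw}), encoding="utf-8")
    return raw


# Demonstration with a stand-in builder instead of a real Intents instance.
builds = []


def fake_build():
    builds.append(1)  # record that the expensive step ran
    return '{"pelda": [{"stem": "p\\u00e9lda"}]}'


cache_file = Path(tempfile.mkdtemp()) / "intents_cache.json"
raw_a = load_or_build(cache_file, fake_build, "1.0")  # builds and saves
raw_b = load_or_build(cache_file, fake_build, "1.0")  # read back from disk
raw_c = load_or_build(cache_file, fake_build, "1.1")  # label changed: rebuilt
print(len(builds))
```

With Lara, build_raw could be something like lambda: str(parser.Intents(entities.smalltalk())), and the returned string would then be passed to parser.Intents(raw, True) as in the timing example above.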