-
Notifications
You must be signed in to change notification settings - Fork 683
Regexp refactoring and optimizations #910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regexp refactoring and optimizations #910
Conversation
LaszloLango
commented
Feb 22, 2016
- Move RegExp bytecode to new files
- Fix style and comments
- Implement a RegExp cache
- Reduce RegExp bytecode size
|
||
if (ecma_compare_ecma_strings (cached_pattern_str_p, pattern_str_p)) | ||
{ | ||
return re_cache[*idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regexp flags does not affect byte code? e.g. /a/ and /a/i are the same? Even if it is true noew, it might affect future optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not good. This won't work correctly. The flags should be checked as well here. The first 2 bytes are different on your example. I'll fix it.
What I don't understand is how a regex disappears from the cache if GC deletes it? And why the cache size is 10? I would prefer a compile time default with a default of 8. |
@zherczeg, I just figured out a small number. There is no an exact reason why ten. 8 is is good. if ((cached_bytecode_p->flags >> ECMA_BYTECODE_REF_SHIFT) > 0)
{ ... }
else
{
/* mark as free, so it can be overridden if the cache is full */
free_idx = *idx;
} If there is such an element in the cache then it is marked as free and a new compiled bytecode could be inserted there. We can improve this algorithm, because this is a basic implementation. I wanted to make the search and maintain of the cache as simple (and fast) as possible. The cache saves memory in rare situations only, so I think we don't need too complicate algorithm for this, because it can cause big perf regression. |
d9eb662
to
4d02437
Compare
@zherczeg, I've updated the PR, please check. |
&& (cached_bytecode_p->flags >> ECMA_BYTECODE_REF_SHIFT) == 1) | ||
{ /* Only the cache has reference for the bytecode */ | ||
|
||
ecma_bytecode_deref (cached_bytecode_p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need a newline above.
Do you see improvement with caching? I am still thinking about an "eviction" mechanism, but it could go in a follow-up patch (or move the regex caching into a follow-up patch). |
The cache only helps in rare situations. I think it is not so common to use the same regex in a file. There are some examples in test262.
Currently we are failing on these tests with out of memory error. The cache helps a bit, but does not solve the whole problem. For example in What do you mean with "eviction" mechanism? Like a FIFO algorithm, when the cache is full? In this case I'm still thinking that only slows down the engine, but won't has enough gain to worth it. |
Yes, I think a FIFO eviction mechanism could be good. But I think this is enough for one patch. |
Move RegExp bytecode functions to a separate file. Optimize bytecode lenght on character matching. Implement a basic RegExp cache to optimize memory usage on duplicated RegExp in JS files. Also fix minor style issues and add missing comments. Improve existing comments. JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
4d02437
to
2c72bb1
Compare