Skip to content

Regexp refactoring and optimizations #910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

LaszloLango
Copy link
Contributor

  • Move RegExp bytecode to new files
  • Fix style and comments
  • Implement a RegExp cache
  • Reduce RegExp bytecode size

@LaszloLango LaszloLango added enhancement An improvement ecma builtins Related to ECMA built-in routines style Related to coding style labels Feb 22, 2016

if (ecma_compare_ecma_strings (cached_pattern_str_p, pattern_str_p))
{
return re_cache[*idx];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regexp flags does not affect byte code? e.g. /a/ and /a/i are the same? Even if it is true noew, it might affect future optimizations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not good. This won't work correctly. The flags should be checked as well here. The first 2 bytes are different on your example. I'll fix it.

@zherczeg
Copy link
Member

What I don't understand is how a regex disappears from the cache if GC deletes it? And why the cache size is 10? I would prefer a compile time default with a default of 8.

@LaszloLango
Copy link
Contributor Author

@zherczeg, I just figured out a small number. There is no an exact reason why ten. 8 is is good.
The answer for the first question:

    if ((cached_bytecode_p->flags >> ECMA_BYTECODE_REF_SHIFT) > 0)
    { ...  }
    else
    {
      /* mark as free, so it can be overridden if the cache is full */
      free_idx = *idx;
    }

If there is such an element in the cache then it is marked as free and a new compiled bytecode could be inserted there. We can improve this algorithm, because this is a basic implementation. I wanted to make the search and maintain of the cache as simple (and fast) as possible. The cache saves memory in rare situations only, so I think we don't need too complicate algorithm for this, because it can cause big perf regression.

@LaszloLango LaszloLango force-pushed the regexp-refactoring-and-optimizations branch from d9eb662 to 4d02437 Compare February 23, 2016 14:07
@LaszloLango
Copy link
Contributor Author

@zherczeg, I've updated the PR, please check.

&& (cached_bytecode_p->flags >> ECMA_BYTECODE_REF_SHIFT) == 1)
{ /* Only the cache has reference for the bytecode */

ecma_bytecode_deref (cached_bytecode_p);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need a newline above.

@zherczeg
Copy link
Member

Do you see improvement with caching? I am still thinking about an "eviction" mechanism, but it could go in a follow-up patch (or move the regex caching into a follow-up patch).

@LaszloLango
Copy link
Contributor Author

The cache only helps in rare situations. I think it is not so common to use the same regex in a file. There are some examples in test262.

  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A1_T1 failed in non-strict mode
  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A2_T1 failed in non-strict mode
  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A3_T1 failed in non-strict mode
  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A4_T1 failed in non-strict mode
  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A5_T1 failed in non-strict mode
  • ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A6_T1 failed in non-strict mode

Currently we are failing on these tests with out of memory error. The cache helps a bit, but does not solve the whole problem. For example in ch15/15.10/15.10.2/15.10.2.12/S15.10.2.12_A1_T1 there are 64 checks. Without the cache it bails out on the 47th check, but with the cache it only bails out on the 53th.

What do you mean with "eviction" mechanism? Like a FIFO algorithm, when the cache is full? In this case I'm still thinking that only slows down the engine, but won't has enough gain to worth it.

@zherczeg
Copy link
Member

Yes, I think a FIFO eviction mechanism could be good. But I think this is enough for one patch.
So LGTM. But please improve the caching in a follow-up patch.

Move RegExp bytecode functions to a separate file.
Optimize bytecode lenght on character matching.
Implement a basic RegExp cache to optimize memory
usage on duplicated RegExp in JS files. Also fix
minor style issues and add missing comments. Improve
existing comments.

JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
@LaszloLango LaszloLango force-pushed the regexp-refactoring-and-optimizations branch from 4d02437 to 2c72bb1 Compare March 1, 2016 12:54
@LaszloLango LaszloLango merged commit 2c72bb1 into jerryscript-project:master Mar 1, 2016
@LaszloLango LaszloLango deleted the regexp-refactoring-and-optimizations branch March 2, 2016 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecma builtins Related to ECMA built-in routines enhancement An improvement style Related to coding style
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants