Bug: Pattern init fail in release V3.3.1 #176

Closed
LankWang17 opened this issue Mar 13, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@LankWang17

The following pattern string causes Reflex to hang:
Pattern pattern(
"Head\b|"
"\[System\.Net\.WebRequest\]::"
, "i");

If I change "Head\b|" to "Head|", Reflex works well.
That seems strange.

@LankWang17
Author

Sorry, the bug string is:
Pattern pattern(
"Head\\b|"
"\\[System\\.Net\\.WebRequest\\]::"
, "i");

The HTML editor turned "\\" into "\".
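For readers following along, the difference between the two spellings matters in C++ source, independent of RE/flex. The sketch below uses only standard C++; the three function names are hypothetical and exist purely for illustration. It shows which characters each literal actually stores.

```cpp
#include <string>

// "\b" inside an ordinary C++ string literal is the backspace character
// (0x08), so this string contains no backslash at all.
inline std::string single_backslash() { return "Head\b|"; }

// "\\b" is an escaped backslash followed by 'b', so the regex engine
// receives the two characters \ and b, i.e. the word-boundary anchor.
inline std::string double_backslash() { return "Head\\b|"; }

// A raw string literal stores the same text without doubling backslashes.
inline std::string raw_literal() { return R"(Head\b|)"; }
```

With a single backslash the regex engine would receive a backspace character instead of a word-boundary anchor. Sequences such as "\[" and "\." are not standard C++ escapes at all, which is why the doubled form (or a raw string literal) is needed for the second half of the pattern.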

@genivia-inc
Member

genivia-inc commented Mar 15, 2023

I tried the pattern on macOS with reflex 3.3.1 and didn't see anything unusual: the pattern construction does not hang. Did you use the pattern to match anything? What OS and what compiler are you using?

Note that the default POSIX matcher does not support \b in the middle of a regex pattern, only at the beginning and/or end of a (sub)pattern. Use a Perl matcher to match \b anywhere. Your case is OK, since the \b is at the end of the subpattern, just before the |. (Just to let you know.)
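To make the intended matching behavior concrete, here is a minimal sketch that uses the standard library's std::regex as a stand-in. This is an assumption for illustration only: std::regex is not RE/flex, it is a different kind of engine and does not reproduce the hang, and the helper name matches_pattern is hypothetical. It only shows what the pattern with \b and case-insensitive matching is meant to accept.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: std::regex (ECMAScript grammar, icase) stands in
// for the RE/flex pattern with option "i". Not the RE/flex API.
inline bool matches_pattern(const std::string& text) {
  static const std::regex re(
      "Head\\b|"
      "\\[System\\.Net\\.WebRequest\\]::",
      std::regex::ECMAScript | std::regex::icase);
  // regex_search looks for the pattern anywhere in the text.
  return std::regex_search(text, re);
}
```

Under these assumptions, "HEAD /index.html" matches because a word boundary follows "HEAD", while "header" does not, because "head" is followed by another word character.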

@LankWang17
Author

My environment: Windows 10, VS2017, Win32 Debug mode. HAVE_AVX2 and HAVE_AVX512BW are not defined.

@LankWang17
Author

I also tested with the reflex project itself. It hung as well, like this:
[image]

@genivia-inc
Member

This helps. I can test with MSVC++ in x86 mode tomorrow. It would be strange if the behavior somehow differs from other OSes. Or perhaps it's a 32-bit build problem.

@LankWang17
Author

[image]

@LankWang17
Author

OK. Thank you.

@genivia-inc
Member

I see the same thing. The trigger is the combination of pattern option "i" with a word boundary marker \b. This grows the DFA exponentially, to 2,622,046 nodes and 3,146,356 edges (!). I will investigate the cause further, which may take a bit of time. A POSIX DFA can grow exponentially in the worst case, but only for certain regex patterns that are unlikely to be used in practice, and that should not happen in this case.
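For context on the worst case mentioned above, here is a small sketch of the textbook example, not the pattern from this issue and not RE/flex's DFA constructor. It runs plain subset construction on the NFA for (a|b)*a(a|b){n-1} ("the n-th symbol from the end is a") and counts the resulting DFA states. The function name count_dfa_states and the bitmask encoding are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <queue>
#include <unordered_set>

// Counts the DFA states that subset construction produces for the NFA of
// (a|b)*a(a|b){n-1}. NFA states 0..n are stored as bits 0..n of a mask.
inline std::size_t count_dfa_states(unsigned n) {
  assert(n >= 1 && n < 63);
  // Bits for NFA states 1..n-1, which advance on either symbol.
  const std::uint64_t advancing =
      ((std::uint64_t{1} << n) - 1) & ~std::uint64_t{1};

  auto step = [advancing](std::uint64_t set, bool is_a) {
    std::uint64_t next = 0;
    if (set & 1) {          // state 0 loops on both symbols...
      next |= 1;
      if (is_a) next |= 2;  // ...and also enters state 1 on 'a'
    }
    next |= (set & advancing) << 1;  // states 1..n-1 move one step forward
    return next;                     // accepting state n has no out-edges
  };

  std::unordered_set<std::uint64_t> seen;
  std::queue<std::uint64_t> work;
  seen.insert(1);  // the DFA start state is the NFA state set {0}
  work.push(1);
  while (!work.empty()) {
    const std::uint64_t cur = work.front();
    work.pop();
    for (bool is_a : {false, true}) {
      const std::uint64_t nxt = step(cur, is_a);
      if (seen.insert(nxt).second) work.push(nxt);  // newly discovered state
    }
  }
  return seen.size();
}
```

Calling count_dfa_states(n) returns 2^n for this family, because the DFA has to remember the last n symbols. That is the kind of growth that is expected only for unusual patterns, which is why the numbers reported here pointed to a problem elsewhere.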

@genivia-inc
Member

This works OK as a temporary workaround, placing a \b after the second pattern:

Pattern pattern(
"Head\\b|"
"\\[System\\.Net\\.WebRequest\\]::\\b"
, "i");

@LankWang17
Author

> This works OK as a temporary workaround, placing a \b after the second pattern:
>
> Pattern pattern(
> "Head\\b|"
> "\\[System\\.Net\\.WebRequest\\]::\\b"
> , "i");

OK, I'll try it.

@genivia-inc
Member

genivia-inc commented Mar 17, 2023

I found a subtle bug in the DFA constructor. The bug is that a lot of unused DFA fragments get generated. This specific regex pattern triggers this bug. There is nothing intrinsically wrong with the pattern. It should not explode the DFA in size. Will fix it and release an update after testing.

The bug was introduced in an earlier release as an optimization to speed up DFA construction, but the optimization did not work correctly with option "i" for case-insensitive matching in combination with other patterns.

genivia-inc added the bug label Mar 17, 2023
@genivia-inc
Member

Fixed in release 3.3.2.

@genivia-inc
Member

Thanks for reporting this performance issue. It is very useful to receive feedback. We test the software a lot. Nevertheless, it can happen that something does not work perfectly.
