Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit amount of time/effort interegular will use when creating an example #1266

Merged
merged 2 commits into from
Mar 11, 2023

Conversation

MegaIng
Copy link
Member

@MegaIng MegaIng commented Mar 10, 2023

Use limit_depth when creating an example for a collision to prevent unbounded memory and time usage.

An example for two regex which do have a collision, but which can take a long time to find the example for are:

A: /(?:#( |\t)*(?!if|ifdef|else|elif|endif|define|set|unset|error|exit)[^\n]+|(;|\\/\\/)[^\n]*)/
B: /\\#(?:(?:(?:\\ |\t))+)?(?i:error)/i

Coming from this grammar: https://github.com/adbancroft/TunerStudioIniParser/blob/master/ts_ini_parser/grammars/pre_processor.lark

However, even with this particular change, loading that grammar still takes a few seconds since almost all regex collide with each other.

The reason for many of these collisions is that the grammar is unclear whether or not the keywords are case insensitive, so all these warnings and collisions are actually fully accurate and using #ErRor instead of #error will cause problems for the grammar.

@erezsh
Copy link
Member

erezsh commented Mar 11, 2023

So perhaps we can limit ourselves to checking a fixed amount of collisions, say 8, after which we'll just write something like "8 regex collisions reached; disabling detection. More collisions may exist."

Also, I think we can reduce the example search time to 200ms or similar, it's already quite a bit of time. But maybe for strict-mode put it even higher at 2000ms, because then we only have to call it once.

What do you think?

@MegaIng
Copy link
Member Author

MegaIng commented Mar 11, 2023

Yeah, putting a limit on the total amount of collisions is something I also considered.

For search time, I sadly don't have a good measurement of the parameter I can control -> time it takes. But I could try and see if I can figure out something.

…will be outputted, and limit max time that is being spent searching for examples.
@MegaIng
Copy link
Member Author

MegaIng commented Mar 11, 2023

Ok, I made a few improvements. It now should be relatively reliable, and I think there are a few changes I can do in just interegular to make it so that it manages to provide examples for all collisions in the above mentioned grammar very quickly, where previously it only manage to report one or two. Via testing, I calculated an approximation for max_time-> max_iterations. This should be decently reliable unless the hardware or python implementation is very slow.

@erezsh erezsh merged commit f35df9b into lark-parser:master Mar 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants