fix(commons): fix metadata regular expressions #1065
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
The PR fixes Polynomial regular expression used on uncontrolled data code scan error.
The expression in question is
/@?allure.label.(?<name>[^\s]+?)[:=](?<value>[^\s]+)/
. Due to how the default regexp engine is implemented in Node.js, it may have polynomial complexity on some inputs.Example
Here, we test the regular expression against a string
allure.label.
followed by multiple occurrences ofallure.label.!
. The engine scans the string from its beginning to the end until it finds out it doesn't match. The backtracking is then engaged, and the engine starts over, this time from the 14th char ('a' in the secondallure.label
).The process repeats until all repetitions of
allure.label
are handled, yielding theO(n*m)
complexity, where n is the total length of the string, and m is the number ofallure.label.
occurrences in the string.This can be illustrated with the following code:
Here, doubling the input length gives us x4 increase in the total parsing time.
Change
In this particular case, the regular expression's performance can be fixed by anchoring the expression to the initial position:
Now, the following two facts are true about this regular expression:
Together, those give us the complexity of
O(n)
because once an attempt to match fails, the engine continues from the next character of the input without backtracking.Behaviour changes
Before:
"foo@allure.id:10"
. A whitespace is mandatory now. We may work this around by anchoring the pattern to the@
character as well, but that would also require us to exclude it from both label names and values. Otherwise, the same polynomial complexity problem occurs.Extra changes
extractMetadataFromString
.Checklist