Fix panic in Unicode wildcard matching #17753
Merged
The reason this bug occurs is that wildcard matching changes the anchor
assertions \A, \Z, and \z, without corresponding changes in regexec.c.
We earlier noticed that all these were being marked SIMPLE, and a
zero-width construct shouldn't really be. But it was considered too
late in the development cycle to make that change. So the plan was to
live with this bug in an experimental feature in 5.32.
But I eventually realized that the change could be effected for just the
wildcard versions, and this commit does that. If there is some issue
with making these non-SIMPLE, it will affect only the wildcard feature,
and those potential bugs are better than a known bug. It also seems
unlikely that this will introduce any bug. What removing SIMPLE does is
merely remove potential optimizations in the handling. The most general
case should work; it is the improper optimization that gets one into
trouble.
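The reasoning about SIMPLE can be illustrated with a hypothetical toy model in C. This is NOT Perl's regcomp.c or regexec.c: the names `Node`, `simple`, `fast_repeat`, `general_repeat`, and `greedy_repeat` are invented for illustration. The assumption is only that a SIMPLE-style flag lets the matcher repeat a node on a fast path that expects every iteration to consume exactly one character. A zero-width anchor wrongly carrying that flag reaches a fast path that cannot handle it, while clearing the flag routes it to the slower general loop, which handles it correctly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model for illustration only; this is NOT Perl's
   regcomp.c/regexec.c, and none of these names exist there. */
typedef enum { N_EXACT, N_ANY, N_ANCHOR_START } NodeType;

typedef struct {
    NodeType type;
    char ch;     /* character to match; used by N_EXACT only */
    int simple;  /* 1 = "always consumes exactly one character",
                    which makes the node eligible for the fast path */
} Node;

/* Try the node once at pos (assumes pos <= strlen(str)). Returns -1 on
   failure, otherwise the number of characters consumed (0 if zero-width). */
int match_once(const Node *n, const char *str, size_t pos) {
    size_t len = strlen(str);
    switch (n->type) {
    case N_EXACT:        return (pos < len && str[pos] == n->ch) ? 1 : -1;
    case N_ANY:          return pos < len ? 1 : -1;
    case N_ANCHOR_START: return pos == 0 ? 0 : -1;
    }
    return -1;
}

/* Optimized path: assumes each iteration consumes exactly one character,
   so it only knows the one-character node types. Any other type is
   reported as -2, the toy's stand-in for a panic. */
int fast_repeat(const Node *n, const char *str, size_t pos) {
    size_t len = strlen(str), i = pos;
    switch (n->type) {
    case N_EXACT:
        while (i < len && str[i] == n->ch) i++;
        return (int)(i - pos);
    case N_ANY:
        return (int)(len - pos);
    default:
        return -2;
    }
}

/* General path: makes no assumption about width. It keeps matching
   until an attempt fails or consumes nothing (zero-width cannot advance). */
int general_repeat(const Node *n, const char *str, size_t pos) {
    size_t i = pos;
    for (;;) {
        int c = match_once(n, str, i);
        if (c <= 0) break;
        i += (size_t)c;
    }
    return (int)(i - pos);
}

/* Greedy "node*": how many characters the repetition consumes.
   The flag only selects which path runs. */
int greedy_repeat(const Node *n, const char *str, size_t pos) {
    return n->simple ? fast_repeat(n, str, pos) : general_repeat(n, str, pos);
}
```

In this toy, `greedy_repeat` on `{ N_ANCHOR_START, 0, 1 }` (the anchor flagged as simple) against `"abc"` at position 0 takes the fast path and returns -2, while the same node with the flag cleared takes the general path and returns 0, the correct result for a zero-width match. A genuinely one-character node gives the same answer on either path, so clearing the flag costs only the optimization.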
This fixes #17677