You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is perhaps an interesting case, but not a common one, to optimize.
An "empty-matching regex pattern" permits zero-length input to be matched. For example, a?b?c? and a*b*c* match a, b and c in various forms in the input, but also matches "nothingness" (empty space between characters).
In fact, I bet that most grep users won't use these patterns much, if ever, because every line in the input is returned as a match anyway. With option -o it has some use to find specific matches though, so there is a use case to optimize.
Ugrep has options to control the matching behavior when empty-matching patterns are used. Option -Y permits empty matches, like GNU grep does. So when ugrep is used as a grep replacement (hard/soft link or alias), then option -Y is enabled to emulate GNU grep. This is not efficient at all yet (at this time of writing) with ugrep to match such patterns.
Actually, GNU grep does kind-of strange things with empty-matching patterns. Again, it is not something that most people would worry about, but it's interesting nevertheless:
ggrep 'j?a?v?' Hello.java
<nothing>
but
ggrep 'j*a*v*' Hello.java
<everything>
And with option -o:
ggrep -o 'j?a?v?' tests/Hello.java
<nothing>
but
ggrep -o 'j*a*v*' Hello.java
jav
a
a
a
a
a
v
a
a
For option -o, GNU grep behaves like ugrep (without -Y) to return matches, not everything.
To optimize the default ugrep behavior to reject empty matches for e.g. a*b*c* to get actual matches, such as a, b, c, bbccc, aab and so on, the pattern matching engine can be modified as if it is matching a pattern of length 1, not 0. This will capture the pattern when an a, b, or c is found in the input while skipping over everything else.
The modification is possible in ugrep without too much effort and should improve matching speed for these kind of "empty-matching patterns". I will post a timing comparison using my dev version to demonstrate what we're talking about.
The text was updated successfully, but these errors were encountered:
Note: GNU grep counts all lines with -c in this case, as if matching the pattern '' as was explained above. Matching everything is quick, but not useful. I hope the use of this technique will improve usefulness. When ugrep is aliased to grep then option -Y will match everything like GNU grep. Not so useful, but compatible.
This is perhaps an interesting case, but not a common one, to optimize.
An "empty-matching regex pattern" permits zero-length input to be matched. For example,
a?b?c?
anda*b*c*
matcha
,b
andc
in various forms in the input, but also matches "nothingness" (empty space between characters).In fact, I bet that most grep users won't use these patterns much, if ever, because every line in the input is returned as a match anyway. With option
-o
it has some use to find specific matches though, so there is a use case to optimize.Ugrep has options to control the matching behavior when empty-matching patterns are used. Option
-Y
permits empty matches, like GNU grep does. So when ugrep is used as a grep replacement (hard/soft link or alias), then option-Y
is enabled to emulate GNU grep. This is not efficient at all yet (at this time of writing) with ugrep to match such patterns.Actually, GNU grep does kind-of strange things with empty-matching patterns. Again, it is not something that most people would worry about, but it's interesting nevertheless:
but
And with option
-o
:but
For option
-o
, GNU grep behaves like ugrep (without-Y
) to return matches, not everything.To optimize the default ugrep behavior to reject empty matches for e.g.
a*b*c*
to get actual matches, such asa
,b
,c
,bbccc
,aab
and so on, the pattern matching engine can be modified as if it is matching a pattern of length 1, not 0. This will capture the pattern when ana
,b
, orc
is found in the input while skipping over everything else.The modification is possible in ugrep without too much effort and should improve matching speed for these kind of "empty-matching patterns". I will post a timing comparison using my dev version to demonstrate what we're talking about.
The text was updated successfully, but these errors were encountered: