Not Getting Regex Matches When Other Tools Do Match #2853

Closed
1 task done
patb-stout opened this issue Jul 15, 2024 · 5 comments

Comments

@patb-stout

patb-stout commented Jul 15, 2024

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 14.1.0 (rev e50df40); [features:-simd-accel,+pcre2; simd(compile):+SSE2,-SSSE3,-AVX2; simd(runtime):+SSE2,+SSSE3,+AVX2] from ripgrep-14.1.0-x86_64-pc-windows-msvc.zip

How did you install ripgrep?

Downloaded the zip binary and ran it directly from the Windows command prompt.

What operating system are you using ripgrep on?

Windows Server 2019 version 1809

Describe your bug.

From the command prompt, I get no matches with: rg "(".[^"]+,.[^"]+")" ./test.txt

What are the steps to reproduce the behavior?

test.txt is a UTF-8 file with DOS (CRLF) line endings. Its contents are as follows.
"Mark",""123 Somewhere Ln, 90210"","Blue"
""abc,def"","",""
test

What is the actual behavior?

I get no matches.

What is the expected behavior?

According to regex101.com, ripgrep should have returned the following two matches.
""123 Somewhere Ln, 90210""
""abc,def""

And the capture groups should have returned the following two.
"123 Somewhere Ln, 90210"
"abc,def"

Ultimately, I was trying to use ripgrep to remove the doubled double quotes around text-qualified values. These values are text-qualified twice (wrapped in two pairs of double quotes), and I want each value to be text-qualified only once.

[Screenshot: Regex Issue]
The attached screen capture shows the regex pattern matching the text between the doubled double quotes; the green highlight marks the capture group.
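
To separate the regex from the shell, the pattern can be checked in a script where no shell quoting is involved. The sketch below is a minimal check in Python, assuming the intended pattern is the whole argument including its outer quotes, that is `"(".[^"]+,.[^"]+")"`, which is consistent with the expected matches and capture groups listed above. Python's `re` module is not ripgrep's regex engine, so this only shows what the pattern does when it arrives intact; `find_matches` and `collapse_doubled_quotes` are hypothetical helper names.

```python
import re

# Assumption: the intended regex is the full argument, outer quotes included.
PATTERN = re.compile(r'"(".[^"]+,.[^"]+")"')

# The three lines of test.txt as given in the report (line endings stripped).
SAMPLE_LINES = [
    '"Mark",""123 Somewhere Ln, 90210"","Blue"',
    '""abc,def"","",""',
    'test',
]


def find_matches(lines):
    """Return (full_match, capture_group_1) pairs, scanning line by line."""
    return [
        (m.group(0), m.group(1))
        for line in lines
        for m in PATTERN.finditer(line)
    ]


def collapse_doubled_quotes(line):
    """Replace each doubly quoted value with its singly quoted capture group."""
    return PATTERN.sub(lambda m: m.group(1), line)


if __name__ == "__main__":
    for full, inner in find_matches(SAMPLE_LINES):
        print(full, "->", inner)
    for line in SAMPLE_LINES:
        print(collapse_doubled_quotes(line))
```

If the pattern behaves as expected here but not on the command line, the difference most likely comes from how the shell rewrites the argument before the program sees it.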

@VladimirMarkelov

What is your shell? If you use cmd.exe, it treats ^ as an escape character, so your rg "(".[^"]+,.[^"]+")" likely turns into rg "(".["]+,.["]+")".
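
The transformation described here can be sketched with a deliberately simplified model of how cmd.exe handles carets and quotes. It assumes only two rules: a double quote toggles quoted mode, and outside quoted mode a caret is dropped and the following character is taken literally. Real cmd.exe parsing has more phases (variable expansion, special operators), and what rg finally receives also depends on how the program's runtime splits the command line, which this sketch does not model. `simulate_cmd_carets` is a hypothetical name.

```python
def simulate_cmd_carets(command: str) -> str:
    """Apply a simplified model of cmd.exe caret and quote handling.

    Assumed rules (a sketch, not the full cmd.exe parser):
      * a double quote toggles quoted mode and is kept in the output;
      * outside quoted mode, '^' is removed and the next character is
        copied literally, so an escaped quote does not toggle quoted mode;
      * inside quoted mode, '^' is an ordinary character.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "^" and not in_quotes:
            # Drop the caret and copy the next character as-is.
            i += 1
            if i < len(command):
                out.append(command[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    typed = 'rg "(".[^"]+,.[^"]+")" ./test.txt'
    print(simulate_cmd_carets(typed))
```

Under these assumptions both carets sit outside quoted mode, so they are consumed by the shell and never reach the pattern.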

@BurntSushi
Owner

Yes, this looks like a shell issue. And regex101 isn't "another tool" comparable to ripgrep. The only way regex101 is relevant is if you've ruled out every other possible explanation for the difference. Shell quoting rules are one. But there are many others.

@BurntSushi closed this as not planned on Jul 15, 2024
@patb-stout
Author

I used the same regex successfully with sed on Cygwin and also with UltraEdit (with the Perl regex type selected) on Windows, but I do see that the ^ symbol is treated as an escape character in the DOS/command prompt (cmd.exe). I will see what I can do to get it working from DOS.

@BurntSushi
Owner

> I used the same regex successfully

That's just it. You very likely didn't. Because of the shell quoting rules.

@garoto

garoto commented Jul 16, 2024

rg (".[^^\"]+,.[^^\"]+") regex.test

Escape each caret with an extra caret, and escape the double quotes inside the brackets with a backslash (the C escape character).
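
An alternative that sidesteps command-line escaping is to keep the pattern in a file and pass it with ripgrep's `-f`/`--file` option, which reads patterns from a file, one per line. The Python sketch below writes the pattern to a file and builds the argument list. It assumes the intended pattern is `"(".[^"]+,.[^"]+")"` with the outer quotes included; `build_rg_command` and `pattern.txt` are hypothetical names, and the command is only run if an `rg` executable is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumption: the intended regex is the full argument, outer quotes included.
PATTERN = r'"(".[^"]+,.[^"]+")"'


def build_rg_command(pattern, target, workdir):
    """Write the pattern to a file and return an argv list for ripgrep.

    The pattern never appears on the command line, so no shell escaping
    of carets or quotes is needed.
    """
    pattern_file = Path(workdir) / "pattern.txt"  # hypothetical file name
    pattern_file.write_text(pattern + "\n", encoding="utf-8")
    return ["rg", "-f", str(pattern_file), target]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        argv = build_rg_command(PATTERN, "test.txt", tmp)
        print(argv)
        # Run only when ripgrep is installed; an argv list bypasses cmd.exe.
        if shutil.which("rg"):
            subprocess.run(argv, check=False)
```

The same pattern file can also be used directly from the command prompt, for example `rg -f pattern.txt test.txt`.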
