Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is --null-data expected to behave like --null when printing file path? #1289

Open
learnbyexample opened this issue May 30, 2019 · 5 comments
Labels
question An issue that is lacking clarity on one or more points.

Comments

@learnbyexample
Copy link

What version of ripgrep are you using?

$ rg --version
ripgrep 11.0.1 (rev 1f1cd9b467)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

How did you install ripgrep?

Using https://github.com/BurntSushi/ripgrep/releases/download/11.0.1/ripgrep_11.0.1_amd64.deb

What operating system are you using ripgrep on?

Ubuntu 16.04.1

Describe your question, feature request, or bug.

When --null-data option is used (but without using --null) and filenames are part of output, such file paths are terminated with NUL byte, similar to what --null does. Is this expected behavior?

$ echo 'how are you?' > f1.txt
$ echo 'who are you?' > f2.txt

$ rg --null-data -l 'are' f1.txt f2.txt
f1.txtf2.txt$ 

$ rg --null-data 'are' f1.txt f2.txt
f1.txt1:how are you?

f2.txt1:who are you?

$ rg --null-data -l 'are' f1.txt f2.txt | od -c
0000000   f   1   .   t   x   t  \0   f   2   .   t   x   t  \0
0000016
$ rg --null-data --heading 'are' f1.txt f2.txt | od -c
0000000   f   1   .   t   x   t  \0   h   o   w       a   r   e       y
0000020   o   u   ?  \n  \0  \n   f   2   .   t   x   t  \0   w   h   o
0000040       a   r   e       y   o   u   ?  \n  \0
0000053
@BurntSushi
Copy link
Owner

Interesting! Thanks for filing this.

The problem here is that when ripgrep prints file headings (instead of the full file path for each matching line), it needs to print a new line after the file path. Currently, the new line character used for that is the newline character set by the end user. In this case, the --null-data flag explicitly and intentionally sets the newline character to \x00. So that when printing the heading, the newline character used is \x00 instead of \n. The same logic applies when printing file paths, as the newline character that was set is used there as well.

I wonder what the correct behavior should be. The current behavior is indeed surprising, but consistent. Is the correct behavior to just assume \n as a line terminator when printing headings and file paths, unless --null is given?

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label May 30, 2019
@learnbyexample
Copy link
Author

Personally, I don't have a use case, so I don't have a preference and this could just be added to manual as a note. Couple of points to consider:

  • Filename and line number combination like f1.txt1: looks odd because NUL isn't visible on terminal
  • User can always add --null option to get NUL separated paths, whereas if they want newline separated ones, there isn't a way to do it (as far as I know)

For reference, as far as I've noticed, GNU grep uses -z to change line separator only for input data processing and adding the NUL after each output data match. File names are separated by newline as usual.

@johnnytemp
Copy link

johnnytemp commented Jan 22, 2020

I also experienced the similar issue - with --null-data, but not using -l. In my case, I want to have normal newline (\n) to separate results in the output, unless --null is explicitly given. Otherwise, the output will be garbage - not displayed correctly line-by-line. Even pipeline to wc -l doesn't work for that output. (Current temporary fix: can only pipeline to tr '\0' '\n', which breaks terminal coloring and some behaviors)

Changing the input data as null-terminated not just mean an I/O stream of lines terminated by null. It can also mean we want a textual file not to be separated into lines, so that we could use rg to match only first line, e.g. search the linux shebang pattern by rg --null-data -o '\A#!/.*'. It doesn't always imply user want the output to be garbage with \0 separators.

@johnnytemp
Copy link

johnnytemp commented Jan 22, 2020

About the output problem I mentioned above

I just double checked. Output without pipeline is still line-by-line, it is NOT ALWAYS garbage (or hard-to-read, I should say). I found that in the output of rg --null-data -o '\A#!/.*' (without further pipeline!), the first-character of each line (the filename part) disappeared due to a \r<some-color-escapes>\0 sequence (just before '\n') (my files is in DOS line ending). (Note: '\r' make cursor return home, and '\0' print a blank). Should a "." better don't match both \r & \n by default? (This doesn't harm linux environment)

i.e. expected correct output:

file1.sh 1:#!/bin/bash
file2.sh 1:#!/bin/bash
file3.sh 1:#!/bin/bash

become:

 ile1.sh 1:#!/bin/bash
 ile2.sh 1:#!/bin/bash
 ile2.sh 1:#!/bin/bash

In addition, I've checked that with pipeline or not, the behavior is different. With a further | cat, the output of rg is always single-lined (no \n but only '\r' & '\0'). These two behaviors seems confusing...

rg --version -> 11.0.2

@johnnytemp
Copy link

johnnytemp commented Jan 22, 2020

Info: for grep, '--null-data' or '-z' -> Treat input and output data as sequences of lines, each terminated by a zero byte. Thus, I cannot say rg is wrong, because current behavior align with grep. Just not quite convenient for the mentioned use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question An issue that is lacking clarity on one or more points.
Projects
None yet
Development

No branches or pull requests

3 participants