tre-agrep can drop records that have a match #43

Open
ghost opened this issue Mar 30, 2016 · 2 comments

ghost commented Mar 30, 2016

tre-agrep has mysteriously failed to print some records that I know contain a match. I traced the program logic to a point where tre-agrep had detected a match, yet no record was printed. I changed this line:

printf("%.*s", record_len, record);

to

fwrite(record, record_len, 1, stdout);

and now it works. Go figure.

I have been programming in C for a long time, but I must confess that I do not think I have any experience with the "%.*s" printf conversion. Still, it seems like it should work.

This is on Kubuntu Linux 15.04.
The original bug was encountered with the tre-agrep program installed from the packages supplied by Kubuntu. The "workaround" was applied to the source code as it came from the Debian package, compiled with GCC against glibc.

zoulasc commented Mar 30, 2016

probably has NULs in it?

christos
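
For context on the NUL question: the "%.*s" conversion treats the precision as an upper bound and stops at the first NUL byte, while fwrite() copies exactly the number of bytes it is given. Below is a minimal sketch of that difference. The helper names bytes_via_printf and bytes_via_fwrite are made up for illustration and are not part of tre-agrep.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many bytes "%.*s" yields for a record.
 * The conversion stops at the first NUL, even if record_len is larger. */
static int bytes_via_printf(const char *record, int record_len)
{
    char scratch[256];
    assert(record != NULL && record_len >= 0);
    return snprintf(scratch, sizeof scratch, "%.*s", record_len, record);
}

/* Hypothetical helper: how many bytes fwrite() writes for a record.
 * On success this is always record_len, whatever the bytes are. */
static size_t bytes_via_fwrite(const char *record, size_t record_len, FILE *out)
{
    assert(record != NULL && out != NULL);
    return fwrite(record, 1, record_len, out);
}
```

With a 7-byte record such as "abc\0def", the first helper reports 3 and the second reports 7, so a record containing a NUL would be truncated by printf but not by fwrite.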


ghost commented Mar 30, 2016

No. There are no NULs. But there are bytes with the top bit set. The files I have been seeing problems with contain some Unicode code points. This, I believe, is the key to why printf("%.*s", ...) prints nothing but fwrite() works fine. A casual reading of the printf(3) man page turns up nothing about what the "%.*s" conversion is supposed to do with Unicode or with any 8-bit data.
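
One way to narrow this down is to look at what the formatting call actually returns for such a record, and to emit records as raw bytes regardless. The sketch below is only an illustration: emit_record and probe_precision_s are hypothetical names, not tre-agrep functions, and the scratch buffer size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one record as raw bytes.
 * Returns 0 on success, -1 if fewer than record_len bytes were written. */
static int emit_record(FILE *out, const char *record, size_t record_len)
{
    assert(out != NULL && record != NULL);
    size_t written = fwrite(record, 1, record_len, out);
    return written == record_len ? 0 : -1;
}

/* Hypothetical probe: format the record with "%.*s" into a scratch buffer
 * and hand back snprintf()'s return value. A negative value means the
 * conversion itself failed; a value below record_len means it stopped early. */
static int probe_precision_s(const char *record, int record_len)
{
    char scratch[4096];
    assert(record != NULL && record_len >= 0);
    return snprintf(scratch, sizeof scratch, "%.*s", record_len, record);
}
```

Calling probe_precision_s on a failing record, under the same locale settings tre-agrep runs with, would show whether the conversion reports an error or a short length. emit_record does not depend on the answer, since fwrite() never interprets the bytes.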

This 8-bit data also accounts for other problems with dropped records. When there are any characters in [\x80-\xff], the call to tre_regnexec() can go off into the weeds and just not find the next delimiter, so it reads in the rest of the file as the last record.

When I use the plain (Wu and Manber) agrep, /usr/bin/agrep with -d '^From ' on a mailbox, I get the correct number of records, but tre-agrep -d '^From ' produces far fewer records. I suspect that this is not because Wu-Manber agrep is so much more sophisticated about Unicode and 8-bit cleanliness, in general; rather, it is because Wu-Manber agrep is old and simplistic and has a "bytes-is-bytes" view of all data, except that it is always line-oriented and has unpleasant limitations due to fixed buffer sizes, etc.

I cannot say that this is a bug in tre-agrep rather than just a caveat, because the man page says nothing one way or the other about Unicode or 8-bit characters. But I think it would be nice to make tre-agrep able to find delimiters, even if it does not otherwise support Unicode or any 8-bit data.
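
To illustrate what a "bytes-is-bytes" delimiter search looks like for the '^From ' case, here is a rough sketch that counts mailbox records by comparing raw bytes at the start of each line. It is not how tre-agrep or Wu-Manber agrep is implemented; count_from_lines is a hypothetical name, and the delimiter is hard-coded as a literal rather than a regex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines that start with the literal bytes "From ".
 * Bytes are compared as-is; no locale or multibyte decoding is involved,
 * so bytes in the range 0x80-0xff cannot derail the scan. */
static size_t count_from_lines(const char *buf, size_t len)
{
    static const char delim[] = "From ";
    const size_t dlen = sizeof delim - 1;
    size_t count = 0;
    size_t pos = 0;

    assert(buf != NULL);
    while (pos < len) {
        /* pos is always the start of a line here. */
        if (len - pos >= dlen && memcmp(buf + pos, delim, dlen) == 0)
            count++;
        /* Advance to the byte after the next newline, or stop at the end. */
        const char *nl = memchr(buf + pos, '\n', len - pos);
        if (nl == NULL)
            break;
        pos = (size_t)(nl - buf) + 1;
    }
    return count;
}
```

Run over a whole mailbox buffer, this gives the number of '^From ' delimiters independent of the text encoding, which makes it a handy cross-check against the record count tre-agrep -d '^From ' reports.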

-- Guy Shaw
