Fix a bug that GZipReader#gets may return incomplete line #32
See also: ruby/csv#117 (comment)

How to reproduce with x.csv.gz in the issue comment:

Zlib::GzipReader.open("x.csv.gz") do |rio|
  rio.gets(nil, 1024)
  while line = rio.gets(nil, 8192)
    raise line unless line.valid_encoding?
  end
end

Reported by Dimitrij Denissenko. Thanks!!!
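For readers without x.csv.gz at hand, the same check can be sketched against in-memory data. This is only an approximation of the reproduction above: the `read_chunks` helper, the sample text, and the explicit `external_encoding` option are assumptions made for illustration, not part of the original report. The limit 1024 is not a multiple of the 7-byte line, so the first read ends in the middle of a multibyte character.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: gzip `text` in memory, then read it back with the
# same pair of limits used in the reproduction (one small read, then
# repeated larger reads). Returns every chunk that gets returned.
def read_chunks(text, first_limit, limit)
  io = StringIO.new(Zlib.gzip(text))
  reader = Zlib::GzipReader.new(io, external_encoding: Encoding::UTF_8)
  chunks = []
  first = reader.gets(nil, first_limit)
  chunks << first if first
  while (chunk = reader.gets(nil, limit))
    chunks << chunk
  end
  chunks
ensure
  reader&.close
end

# "あい\n" is 7 bytes in UTF-8, so byte 1024 falls inside a character.
text = "あい\n" * 5000
chunks = read_chunks(text, 1024, 8192)

# With the bug, at least one chunk would end in a partial character.
broken = chunks.reject(&:valid_encoding?)
puts "chunks: #{chunks.size}, broken: #{broken.size}"
```

On a zlib that includes a fix for this issue, `broken` should be empty and the chunks should join back into the original text.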
@nobu What do you think about this?
If the filled size equals the reading size,

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index f9af18f530e..8b6b802e09d 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -4198,12 +4198,15 @@ static long
 gzreader_charboundary(struct gzfile *gz, long n)
 {
     char *s = RSTRING_PTR(gz->z.buf);
-    char *e = s + ZSTREAM_BUF_FILLED(&gz->z);
-    char *p = rb_enc_left_char_head(s, s + n, e, gz->enc);
+    long f = ZSTREAM_BUF_FILLED(&gz->z);
+    int boundary = (f == n);
+    char *e = s + f;
+    char *p = rb_enc_left_char_head(s, s + n - boundary, e, gz->enc);
     long l = p - s;
     if (l < n) {
         n = rb_enc_precise_mbclen(p, e, gz->enc);
         if (MBCLEN_NEEDMORE_P(n)) {
+            l += boundary;
             if ((l = gzfile_fill(gz, l + MBCLEN_NEEDMORE_LEN(n))) > 0) {
                 return l;
             }
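The idea of inspecting the last byte inside the limit (index n - 1) rather than the byte just past it can be modelled in a few lines of Ruby. This is a simplified, UTF-8-only sketch and not a translation of the C code: `left_char_head`, `utf8_char_len`, and `aligned_length` are hypothetical helpers standing in for what `rb_enc_left_char_head` and `rb_enc_precise_mbclen` do for arbitrary encodings, and the sketch assumes well-formed input.

```ruby
# Step back over UTF-8 continuation bytes (0b10xxxxxx) to the lead byte
# of the character that contains position `pos`.
def left_char_head(bytes, pos)
  pos -= 1 while pos > 0 && (bytes[pos] & 0xC0) == 0x80
  pos
end

# Length in bytes of a UTF-8 character, judged from its lead byte.
def utf8_char_len(lead)
  if lead < 0x80 then 1
  elsif lead >= 0xF0 then 4
  elsif lead >= 0xE0 then 3
  else 2
  end
end

# Given a byte limit `n` (n >= 1), return how many bytes must be taken so
# that the chunk ends on a character boundary. The character containing
# byte n - 1 (the last byte inside the limit) is completed if needed.
def aligned_length(bytes, n)
  head = left_char_head(bytes, n - 1)
  head + utf8_char_len(bytes[head])
end

bytes = "あい".bytes            # two 3-byte characters
p aligned_length(bytes, 3)      # limit ends exactly after "あ"
p aligned_length(bytes, 4)      # limit ends inside "い"
```

A limit of 3 stays at 3, while a limit of 4 is extended to 6 so that "い" is returned whole.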
Sorry, I didn't see that there was a patch already.
Umm, I think that When
When
I think that it's intentional. I think that

Anyway, I'm not familiar with the zlib code base. If you think that your patch is the right approach, could you push it? I'm OK with any approach that doesn't return an incomplete line.
Fix: align to the character of
It seems that https://github.com/ruby/ruby/blob/b9f7286fe95827631b11342501e471e5e6f13bbb/io.c#L3751

pp = rb_enc_left_char_head(s, p-1, p, enc);

File.open("/tmp/x", "w") do |output|
  output.puts("あい")
end
File.open("/tmp/x") do |input|
  p input.gets(nil, 4) # This uses the 4th byte (the first byte of "い"), not the 5th byte
end
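A self-contained variant of the snippet above, using a temporary file instead of /tmp/x, makes the `IO#gets` result easy to inspect. The `first_chunk` helper and the explicit `r:UTF-8` mode are assumptions added here so the result does not depend on the locale; they are not part of the original comment.

```ruby
require "tempfile"

# Hypothetical helper: write `text` to a temporary file and return what
# IO#gets(nil, limit) yields for the first read.
def first_chunk(text, limit)
  Tempfile.create("gets-limit") do |file|
    File.binwrite(file.path, text)
    File.open(file.path, "r:UTF-8") { |input| input.gets(nil, limit) }
  end
end

chunk = first_chunk("あい\n", 4)
p chunk           # the 4th byte starts "い", so the character is completed
p chunk.bytesize  # more than the requested 4 bytes
```

The limit of 4 bytes ends on the first byte of "い", and `IO#gets` reads on until that character is complete, returning "あい" (6 bytes). This is the behavior the GzipReader fix is aligning with.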