-
-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ncplane_putnstr
doesn't fully respect the maximum number of bytes
#2754
Comments
So static inline int
ncplane_putnstr(struct ncplane* n, size_t s, const char* gclustarr){
return ncplane_putnstr_yx(n, -1, -1, s, gclustarr);
} Going to // Write a series of EGCs to the current location, using the current style.
// They will be interpreted as a series of columns (according to the definition
// of ncplane_putc()). Advances the cursor by some positive number of columns
// (though not beyond the end of the plane); this number is returned on success.
// On error, a non-positive number is returned, indicating the number of columns
// which were written before the error. No more than 's' bytes will be written. See that last line:
nevermind, i see you already did all of this work, lol. i ought read the report all the way through like less of a dumbass. ok, i didn't think let me do some testing and lock down exactly what's up. i agree with you: btw, |
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
int main(void){
if(!setlocale(LC_ALL, "")){
fprintf(stderr, "no locale\n");
return EXIT_FAILURE;
}
const char u[] = "abcd\xaa\xaa\xaa";
size_t r = 0;
mbstate_t m = {0};
do{
wchar_t w;
size_t s = mbrtowc(&w, u + r, MB_CUR_MAX, &m);
fprintf(stderr, "R: %zu S: %zu\n", r, s);
if(s == (ssize_t)-1){
fprintf(stderr, "nope!\n");
return EXIT_FAILURE;
}
r += s;
}while(1);
fprintf(stderr, "yep!\n");
return EXIT_SUCCESS;
}```
```sh
[schwarzgerat](0) $ ./a
R: 0 S: 1
R: 1 S: 1
R: 2 S: 1
R: 3 S: 1
R: 4 S: 18446744073709551615
nope!
[schwarzgerat](1) $ so that's as expected--we get four valid results |
i don't think this is a bug in |
glust: abcd cols: 1 wcs: 1
glust: bcd cols: 1 wcs: 1
glust: cd cols: 1 wcs: 1
utf8_egc_len:107:invalid UTF8: |
|
hrmmm this looks like a bug in |
heh this could be uncharitably interpreted as a security bug, technically |
GOT THAT RET: 1
GOT THAT RET: 1
glust: abcd cols: 1 wcs: 1
post: 97 ret: 0 cols: 1 wcs: 1 offset: 0
GOT THAT RET: 1
GOT THAT RET: 1
glust: bcd cols: 1 wcs: 1
post: 98 ret: 1 cols: 1 wcs: 1 offset: 1
GOT THAT RET: 1
GOT THAT RET: 1
glust: cd cols: 1 wcs: 1
post: 99 ret: 2 cols: 1 wcs: 1 offset: 2
GOT THAT RET: 1
utf8_egc_len:107:invalid UTF8:
post: 100 ret: 3 cols: -1 wcs: 1 offset: 3 |
yep. we're doing an extra cycle through the well, this might be a perf win, too, nice. |
ok, i see what's going on now.
let's say you had u[schwarzgerat](0) $ unicode X̄
U+0058 LATIN CAPITAL LETTER X
UTF-8: 58 UTF-16BE: 0058 Decimal: X Octal: \0130
X (x)
Lowercase: 0078
Category: Lu (Letter, Uppercase); East Asian width: Na (narrow)
Unicode block: 0000..007F; Basic Latin
Bidi: L (Left-to-Right)
U+0304 COMBINING MACRON
UTF-8: cc 84 UTF-16BE: 0304 Decimal: ̄ Octal: \01404
̄
Category: Mn (Mark, Non-Spacing); East Asian width: A (ambiguous)
Unicode block: 0300..036F; Combining Diacritical Marks
Bidi: NSM (Non-Spacing Mark)
Combining: 230 (Above)
[schwarzgerat](0) $ here, we absolutely want to read six bytes to output four (for "abcX̄"). so you're calling with invalid input. since yeah, this is not a notcurses bug. now, whether some function which binds input size is necessary is a valid question... |
ahhh it's another user error on my part, i feel dumb now LOL SORRY >< in my defence it didn't make sense to want to limit the number of bytes output -- maybe egcs or columns but not bytes, at least to me. I was looking through the putstr functions to see if there was something that took a length, so I probably skimmed through the comment too quickly when i saw something that had a size_t parameter x_x i do feel like it would be useful to have such a function tho (one that bounds the number of bytes read)... thoughts? |
yeah i'm not sure this is the most useful bound either. surely we want something that limits the number of columns, assuming we have that? |
hrmm maybe. we start getting into a complex landscape of features, though (this is probably just as valid as my comment above, though) |
i feel like if i'm doing any sort of string manipulation i tend to go for |
I've run into this issue a couple of times. I'm working on a text editor that uses hybrid rope/piece-table like structures a lot which may reference small substrings anywhere in a large body of text. Reads with no explicit bounds are obviously problematic in this sort of situation as anything could follow the desired bytes in memory. Adding terminators requires rearranging the entire memory layout and voids most of the advantages of the design. So, because of this fairly subtle detail, I'm forced to copy pretty much anything used in a notcurses call to a temporary, null terminated, buffer. It's not a big deal, but seems like a rather inflexible API. |
thanks for the feedback, you two. i think i think i'll go ahead and create a bug to add this functionality. i appreciate the comments from real world users! |
if you have any guidelines for which parts of the API to change (or which parts not to change), I'd be willing to make a PR to add this! |
i've assigned #2756 to myself, but if you send in a PR, i'd be delighted to see you do it. run any naming by me, and see my notes there about language wrappers etc. |
Ok, so there's the
putNstr
family of functions that let you specify the maximum number of bytes, right? The issue is, that limit is not really respected. first, the repro:The expected behaviour is that I see
abcd
printed, but what actually happens is that I seeabc
,putnstr
returns-3
, and I get an error message about invalid UTF-8.I managed to trace down the bug:
ncplane_putnstr_yx
callsncplane_putegc_yx
-- without telling it how many bytes it can read from the bufferncplane_putegc_yx
callsutf8_egc_len
with the same problemutf8_egc_len
callsmbrtowc
, and tells it that it can read up toMB_LEN_MAX
bytes from the pointer (which on my system is6
).Of course, we know that by the time the
gcluster
pointer is pointing atd
, it should really only be allowed to read 1 more byte. This behaviour might actually be platform specific, and it might be macOS's implementation of mbrtowc overzealously reading more bytes even though by right it should stop once it sees a byte <= 0x7F.The fix should be fairly straightforward, but might touch quite a bit of code, since we'd need
utf8_egc_len
to know the maximum number of bytes -- and do something likemin(MB_LEN_MAX, n)
.tbh if you think that's an OK solution, i'm willing to PR that. For now my fix is to overallocate 1 byte and manually NULL-terminate my buffers so mbrtowc doesn't go looking where it shouldn't. (the garbage bytes in this case were coming from uninitialised memory that i didn't wanna memset because why bother)
edit: this reproduces on linux too, so I guess it's (surprisingly) not apple fucking up.
Please include the following data:
The text was updated successfully, but these errors were encountered: