Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary check in jsonsl_util_unescape_ex #35

Merged
merged 1 commit into from
Jul 6, 2017

Conversation

ajdavis
Copy link
Contributor

@ajdavis ajdavis commented Jun 30, 2017

Found by a Coverity scan. The Coverity analysis is:

    	cond_between: Condition uescval > 0xDFFF, taking false branch. Now the value of uescval is between 0xD800 and 0xDFFF.
    	cond_at_least: Condition uescval < 0xD800, taking false branch. Now the value of uescval is at least 0xD800.
1414        } else if (uescval < 0xD800 || uescval > 0xDFFF) {
1415            *oflags |= JSONSL_SPECIALf_NONASCII;
1416            out = jsonsl__writeutf8(uescval, out) - 1;
1417
      dead_error_condition: The condition uescval > 0xD7FF must be true.
    	At condition uescval > 0xD7FF, the value of uescval must be between 0xD800 and 0xDFFF.
1418        } else if (uescval > 0xD7FF && uescval < 0xDC00) {
1419            *oflags |= JSONSL_SPECIALf_NONASCII;
1420            last_codepoint = (uint16_t)uescval;
1421            out--;
1422        } else {
1423            UNESCAPE_BAIL(INVALID_CODEPOINT, 2);
1424        }

By the time JSONSL checks if a u-escape value is less than 0xDC00, it
already knows it is greater than 0xD7FF.
@mnunberg
Copy link
Owner

mnunberg commented Jul 2, 2017

Hrm. Should we just delete this branch? If this code was dead until now, then removing this branch would modify behavior. Can you think of some test where this would be true?

@mikedld
Copy link

mikedld commented Jul 2, 2017

I guess the issue here is just a superfluous check.

jsonsl__get_uescape_16 returns values in range [0x0000..0xFFFF]. First condition (uescval < 0xD800 || uescval > 0xDFFF) checks that the value is within [0x0000..0xD7FF] or [0xE000..0xFFFF], second (uescval > 0xD7FF && uescval < 0xDC00) checks that the value is within [0xD800..0xDFFF]. Since second condition is only being executed if the first one fails, we could be certain that value is not within [0x0000..0xD7FF] at this point, hence checking if it's greater than 0xD7FF makes no difference (it's always true).

Of course, if one were to change the behavior of jsonsl__get_uescape_16 to say return -1 (or some other magic value) on error (the return type is currently int), this would then break the code if we remove the check.

@mikedld
Copy link

mikedld commented Jul 2, 2017

I'm slow :( jsonsl__get_uescape_16 already returns -1 on error, so removing the check is not right.

@mikedld
Copy link

mikedld commented Jul 2, 2017

... and -1 is handled above these conditions...

@ajdavis
Copy link
Contributor Author

ajdavis commented Jul 3, 2017

I think the change is correct - that is, the change has no effect on the behavior, because the check that I'm removing is always true. The logic branch as a whole is still useful for any value from 0xD800 and 0xDC00 inclusive.

Here's a summary of the old code:

if (last_codepoint) {
    /* ... */
} else if (uescval < 0xD800 || uescval > 0xDFFF) {
    /* ... */
} else
    /* we only reach this "else" if !(uescval < 0xD800), therefore */
    /* uescval > 0xD7FF, so the first check below is not needed */
if (uescval > 0xD7FF && uescval < 0xDC00) {
    /* uescval between 0xD800 and 0xDC00, this is the start of a surrogate pair */
    /* this branch is still useful! */
} else {
    /* uescval between 0xDC00 and 0xDFFF, apparently that's bad */
    UNESCAPE_BAIL(INVALID_CODEPOINT, 2);
}

In short, deleting the particular clause uescval > 0xD7FF has no effect, which is the fact that Coverity reported to me. However, the second clause uescval < 0xDC00 is useful, and the logic branch it leads to is useful. A test where this would be true is if uescval == 0xDC00.

This branch is tested in a few different ways in unescape.c. For example, this test hits the logic branch:

/* Try with a surrogate pair */
escaped = "\\uD834\\uDD1E";
exp = "\xf0\x9d\x84\x9e";
res = jsonsl_util_unescape(escaped, out_s, strlen(escaped), strtable, &err);
assert(res == 4);
assert(0 == memcmp(exp, out_s, 4));

I just stepped through this test to ensure it hits the branch I changed, no matter whether I keep the unneeded test or not.

Anyway, this doesn't really matter! =) If it's controversial then we can close this PR. My sole motivation is to resolve a Coverity warning in libbson, while keeping libbson's copy of JSONSL in sync with this repository. Alternatively I can just tell Coverity to ignore this warning.

@mnunberg
Copy link
Owner

mnunberg commented Jul 6, 2017

@ajdavis I'm with you here. I think the change makes sense.
@mikedld Thoughts?

As for me, I had to spend about 2-3 days reading the unicode specs to manage to write this code. I remember it being mostly correct at the time =)

@mikedld
Copy link

mikedld commented Jul 6, 2017

@mnunberg yeah please ignore me. Unless someone adds -2 as possible jsonsl__get_uescape_16 return value and forgets to adjust the code here, it's all fine. Asserting the expected uescval range might be a good idea.

@mnunberg
Copy link
Owner

mnunberg commented Jul 6, 2017

I chanced upon this: http://unicodebook.readthedocs.io/unicode_encodings.html#utf-16-surrogate-pairs

If a unit (defined as 16 bits) falls within the surrogate range (0xD800-0xDBFF, 0xDC00-0xDFFF) then it must follow surrogate encoding rules. In other words, a character in those ranges must be a high surrogate without anything preceding it, or a low surrogate with a high surrogate preceding it.

The check > 0xD7FF is necessarily true because the branch before it excludes any non-surrogate units.

I'll merge this. I was mistakenly assuming that the branch was dead, but it's only part of the check which is superfluous.

@mnunberg mnunberg merged commit b204cc0 into mnunberg:master Jul 6, 2017
@ajdavis
Copy link
Contributor Author

ajdavis commented Jul 6, 2017

Thanks to both of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants