-
Notifications
You must be signed in to change notification settings - Fork 454
[CDRIVER-6017] Tweak visitor behavior when finding corrupt BSON #2019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[CDRIVER-6017] Tweak visitor behavior when finding corrupt BSON #2019
Conversation
We now invoke `visit_corrupt` in more cases where `bson_visit_all` cannot properly decode elements, rather than just stopping with the boolean error code.
* | ||
* Passing this as a flag has no effect. | ||
*/ | ||
BSON_VALIDATE_CORRUPT = (1 << 30), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse BSON_VALIDATE_UTF8
instead? It already "has no effect" as a behavior control flag and already describes the very UTF-8 validity checks which BSON_VALIDATE_CORRUPT
is meant to describe (appears redundant).
s/BSON_VALIDATE_CORRUPT/BSON_VALIDATE_UTF8/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but VISIT_CORRUPT
is now (as of this PR) also used for when a subdocument is invalid (previously this was also just return true;
, meaning that bson_validate
was also missing this case).
I considered adding a .visit_invalid_utf8
callback, which is potentially useful in general, but there's ambiguity in the semantics of "corruption", because the spec technically says that BSON strings are UTF-8, so a non-utf-8 string could arguably be considered corruption.
@@ -2025,6 +2025,7 @@ bson_iter_visit_all (bson_iter_t *iter, /* INOUT */ | |||
while (_bson_iter_next_internal (iter, 0, &key, &bson_type, &unsupported)) { | |||
if (*key && !bson_utf8_validate (key, strlen (key), false)) { | |||
iter->err_off = iter->off; | |||
VISIT_CORRUPT (iter, data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we take this opportunity to extend test suite coverage of the .visit_corrupt
callback function to account for these new checks? atm there only seems to be a single test case. Perhaps the VALIDATE_TEST
macro can be extended to also check .visit_corrupt
callback behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change results in rejecting inserts that previously succeeded:
bson_t to_insert = BSON_INITIALIZER;
ASSERT (bson_append_utf8 (&to_insert, "v", 1, "a\xFF\b", 3));
bool ok = mongoc_collection_insert_one (coll, &to_insert, NULL, NULL, &error);
ASSERT_OR_PRINT (ok, error);
// Before PR : OK
// After PR : invalid document for insert: corrupt BSON
Write helpers validate UTF-8 by default. However, this validation does not appear documented, and appears to have always been broken.
I would rather remove the UTF-8 validation in write helpers. I expect that would be a performance improvement, and remove the (low?) risk of breaking applications inserting invalid UTF-8.
Notably: UTF-8 key validation appeared to work (and is tested in PHP mongodb/mongo-php-driver#1819 (comment)).
It seems like most drivers do not validate UTF-8 on write. See survey + slack discussion.
Proposal:
- Do not validate UTF-8 values (but validate keys) if
BSON_VALIDATE_UTF8
is not set. - Remove
BSON_VALIDATE_UTF8
from the default write helper flags (since UTF-8 validation never worked and the write helpers did not document expecting UTF-8 validation). - Do validate UTF-8 in
bson_validate
ifBSON_VALIDATE_UTF8
is set (to match documented behavior ofbson_validate
).
This is a minimal fix for some edge cases around BSON iteration/visitation and text validation. A larger change could be made to make the APIs more complete and easier to use, but this is enough to fix a bug related to UTF-8 validation.
We now invoke
visit_corrupt
in more cases wherebson_visit_all
cannot properly decode elements, rather than just stopping with the boolean error code.This also adds an error code to
bson_validate
for encountering corruption. Previously, it would just seterror->code = 0
and the message tocorrupt BSON
, which is counter-intuitive to those that expecterror->code == 0
to indicate success.See also: CDRIVER-4448