Skip to content

GODRIVER-2252 Don't call UnmarshalBSON for pointer values if the BSON field value is empty. #833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

matthewdale
Copy link
Collaborator

@matthewdale matthewdale commented Jan 7, 2022

GODRIVER-2252

When unmarshaling BSON, If the target Go value is a pointer, has a custom UnmarshalBSON function, and the BSON field value is empty, set the Go value to the zero value of the pointer (nil) and don't call UnmarshalBSON. UnmarshalBSON has no way to change the pointer value from within the function (only the value at the pointer address), so it can't set the pointer to nil itself. Since the most common Go value for an empty BSON field value is nil, we set the Go value to nil and don't call UnmarshalBSON. This behavior matches the behavior of the Go "encoding/json" unmarshaler when the target Go value is a pointer, has a custom UnmarshalJSON function, and the JSON field value is null.

See this StackOverflow post for a discussion about the same behavior of the Go "encoding/json" unmarshaler.

Copy link
Contributor

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting bug... Copying the behavior of encoding/json seems good to me.

@matthewdale matthewdale requested a review from benjirewis January 7, 2022 19:18
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. The test cases with nil, zero, and non-zero pointers look like a good addition.

// field value is "nil", we set "nil" here and don't call UnmarshalBSON. This behavior matches
// the behavior of the Go "encoding/json" unmarshaler when the target Go value is a pointer and
// the JSON field value is "null".
if val.Kind() == reflect.Ptr && len(src) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on L1505 checks whether the target value can be changed. Should this be moved below that block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea. Will update.

return err
}

// If the target Go value is a pointer and the BSON field value is empty, set the value to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No suggested change, but noting for consideration.

The original example marshaled the value to BSON Null. This seems like the desired behavior when the Go value is a pointer type:

  • nil marhals to BSON Null.
  • BSON Null unmarshals to nil.

From https://bsonspec.org/spec.html there three values that would unmarshal with an empty value:

  • BSON Null
  • Min Key
  • Max Key

I do not know if there is any use case for unmarshaling a Min Key or Max Key.

If Min Key or Max Key were unmarshaled to a Go value with a pointer type, setting the target to nil seems reasonable.

Copy link
Collaborator Author

@matthewdale matthewdale Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great find! I did some testing and found that if you define a custom type with base type primitive.MinKey or primitive.MaxKey, you lose the special BSON marshaler behavior for the primitive.MinKey and primitive.MaxKey types. That means it's not possible to define a custom UnmarshalBSON function for the primitive.MinKey or primitive.MaxKey types while maintaining the special marshaling behavior (you can't define additional functions for the primitive.MinKey or primitive.MaxKey types outside of the primitive package).

For example, if you encode a struct containing a *primitive.MinKey, you get a BSON document with field type "\xff" (the "min key" field type):

type myStruct struct {
	Min *primitive.MinKey
}
func main() {
	b, _ := bson.Marshal(myStruct{Min: &primitive.MinKey{}})
	fmt.Printf("%+q\n", b)
}

// Prints "\n\x00\x00\x00\xffmin\x00\x00"

Notice the "\xff" before "min", which is the encoding for the "min key" type.

If you define a new type like myMinKey based on primitive.MinKey and encode a similar struct, the field is encoded as a "document" type instead of a "min key" type.

type myMinKey primitive.MinKey

type myStruct struct {
	Min *myMinKey
}

func main() {	
	b, _ := bson.Marshal(myStruct{Min: &myMinKey{}})
	fmt.Printf("%+q\n", b)
}

// Prints "\x0f\x00\x00\x00\x03min\x00\x05\x00\x00\x00\x00\x00"

Notice the "\x03" before "min", which is the BSON encoding for a "document" type. Also, the "\x05\x00\x00\x00" after "min" is the encoding for an empty document value, not the zero-length value of a "min key" type.

tldr: While the "min key" and "max key" BSON types use a zero-length value like the "null" type, a user cannot define a custom UnmarshalBSON function for the primitive.MinKey/primitive.MaxKey types while maintaining the special marshaling/unmarshaling behavior, so the change here shouldn't impact those types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, interesting. TIL that a custom type definition of primitive.MinKey or primitive.MaxKey changes the marshal behavior. Thanks for the explanation!

got := reflect.New(tc.sType).Interface()
err := UnmarshalWithRegistry(reg, tc.data, got)
err := UnmarshalWithRegistry(DefaultRegistry, tc.data, got)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this uses the DefaultRegistry, can TestUnmarshalWithRegistry be removed in favor of TestUnmarshal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd actually prefer to leave that there for coverage. Currently it's redundant, but there's nothing that says Unmarshal needs to continue calling UnmarshalWithRegistry with DefaultRegistry. If that ever changed, we'd potentially lose coverage of UnmarshalWithRegistry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Leaving TestUnmarshalWithRegistry seems preferable.

Copy link
Contributor

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mod Kevin's comments

@matthewdale matthewdale requested a review from kevinAlbs January 11, 2022 19:15
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matthewdale matthewdale merged commit c918d8c into mongodb:master Jan 13, 2022
matthewdale added a commit that referenced this pull request Mar 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants