Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(basicnode): Increase test coverage for int and map types #454

Merged
merged 6 commits into from
Aug 25, 2022

Conversation

dustmop
Copy link
Contributor

@dustmop dustmop commented Aug 10, 2022

As part of the work I've been doing on go-datalark I encountered a panic happening in AssignNode. The issue was caused by a surprising interaction between union nodes that were built as typed, but were then being copied into struct nodes that were being built by representation. Turns out the fix was just needing to implement the AssembleKey and AssembleValue for the union repr assembler. To document this behavior, better understand the APIs involved here, and generally increase test coverage, I added a number of basic, straight-forward tests to basicnode and bindnode.

go test -coverprofile cover.out ./node/... && go tool cover -html=cover.out

Increases node/basicnode/map.go's coverage from 42.9% to 65.6%.

These tests build a struct with a union field. They all work except when the union is built as a typed node, and the top-level struct is built via representation. The union is copied using representation, and its assembler is incomplete, so the test fails.
@rvagg
Copy link
Member

rvagg commented Aug 11, 2022

I wonder whether we have this wrong at a lower level:

type Animal struct {
	Name   String
	Action Behavior
}
type Behavior union {
	| Movement "movement:"
	| Sound    "sound:"
} representation stringprefix
type Movement string
	schemaType = ts.TypeByName("Animal")
	proto := bindnode.Prototype(nil, schemaType).Representation()

// ...

	animalNode := b.Build()
	actual = printer.Sprint(animalNode)

	expect = `struct<Animal>{
	Name: string<String>{"eel"}
	Action: union<Behavior>{string<Movement>{"swim"}}
}`
	qt.Check(t, expect, qt.Equals, actual)

is kind of surprising to me and I wonder if it's as it should be. If I'm building using the NodePrototype from the Representation() of the TypedPrototype, then I'd assume I should be building purely at the representation level and not have any of this typing except for relevant validation. So I would expect the printer to see a plain Map which contains an "Action" field with the value "movement:swim". But instead we're seeing the printed typed form.

Perhaps bindnode, through taking shortcuts for efficiency reasons, is not properly keeping representation-level things at the representation-level (note in the code there's a lot of switching back and forth from the *Repr forms of things so we can reuse logic).

But then back to what you're trying to fix here:

	schemaType = ts.TypeByName("Animal")
	proto := bindnode.Prototype(nil, schemaType).Representation()
// ...
	schemaType := ts.TypeByName("Behavior")
	typedProto := bindnode.Prototype(nil, schemaType)
	b := typedProto.NewBuilder()
//...
	a := ma.AssembleKey()
	must.NotError(a.AssignString("Movement"))
	a = ma.AssembleValue()
	must.NotError(a.AssignString("swim"))
	must.NotError(ma.Finish())
	behaviorNode := b.Build()
//...
	a = ma.AssembleKey()
	must.NotError(a.AssignString("Action"))
	a = ma.AssembleValue()
	must.NotError(a.AssignNode(behaviorNode))

With the last line being the error being addressed here. But we're mixing up a representation form of Animal with a typed form of Behavior. It's not clear to me that this should be allowed. At the representation level, I would expect Animal to only accept a String value for its "Movement" entry.

So this works with the current code:

	must.NotError(a.AssignNode(behaviorNode.(schema.TypedNode).Representation()))

Which seems appropriate to me. Whether it should be cleverer than that is the question - perhaps it should be able to work with both forms and just figure it out? I'm not sure. I'll have more of a think about this and try and get some input from others.

@dustmop
Copy link
Contributor Author

dustmop commented Aug 11, 2022

With the last line being the error being addressed here. But we're mixing up a representation form of Animal with a typed form of Behavior. It's not clear to me that this should be allowed. At the representation level, I would expect Animal to only accept a String value for its "Movement" entry.

So this works with the current code:

	must.NotError(a.AssignNode(behaviorNode.(schema.TypedNode).Representation()))

Which seems appropriate to me. Whether it should be cleverer than that is the question - perhaps it should be able to work with both forms and just figure it out? I'm not sure. I'll have more of a think about this and try and get some input from others.

Sure thing. For some background on the motivation here, I've been working off of this document: https://github.com/ipld/go-datalark/blob/master/docs/using-complex.md#kitchen-sink-constructed-by-mix-of-levels-explicitly (specifically, this test is covering the behavior from the "Kitchen Sink constructed by Mix of Levels (Explicitly)") written by @warpfork. My job has been to bridge those requirements with the functionality provided by bindnode, using starlark as the translation layer. Perhaps my code is making incorrect assumptions about how it should do that, but overall the requirements are not totally clear to me, and my understanding of the design of IPLD is still early. I would be interested in getting some clarity on how these pieces should fit together. Specifically, is the linked code sample impossible? Or should the starlark layer be automatically inserting calling to Representation() in order to use bindnode as it was intended? To your earlier point, is the printer buggy for not showing the representation form, or is my code bugging and dropping important type information somewhere?

@warpfork
Copy link
Collaborator

On the build calls returning the type level value, even when the builder was for the representation form and fed values accordingly: Yeah, that's the way it is.

The main reason for this is: it's always pretty clear to gear-shift down (e.g. on a typed node, ask for the .Representation(); on an ADL node, as for the .Substrate())... but there's no standard API for gear-shifting up.

(I agree that this is kind of surprising -- it sort of feels like it "breaks" the conventions about "stay on the level you're on" -- but don't know how else we could do it.)

@rvagg
Copy link
Member

rvagg commented Aug 12, 2022

OK, so as per @warpfork's comment above the printer thing is OK—although weird, the current intent is that Build() return what the original prototype intended, regardless of whether you were constructing it on a lower layer (Representation in this case). So that's working as expected (probably needs even more docs around it cause it's kind of surprising).

But on the other point, setting a type level Node onto a representation level Assembler where they don't meet the requirements is not something we should be enabling (if we can help it). I can see how https://github.com/ipld/go-datalark/blob/master/docs/using-complex.md#kitchen-sink-constructed-by-mix-of-levels-explicitly might lead to wanting that here but I don't think that's going to be a happy move in terms of the integrity of the layering. If it really is needed for datalark then I think it is probably going to have to manage it over there instead?

The layering in go-ipld-prime means that a graph is going to have different properties when viewed at different layers, and the builders likewise are the same. A typed stringprefix union appears as a map Node at the type level, but at the representation layer it appears as a string Node. In theory there's also another possible layer on top of that if we were to make an ADL out of it so it could appear as something else. When building with a prototype's Assembler, that should also operate at the layer that it's supposed to. So a stringprefix union's Assembler should assemble as a map, but the prototype's representation layer form should assemble as a string.

There is a possibility of inserting <magic> so it'll just figure it out, but for the most part we've been trying to eschew as much magic as possible to keep the integrity of the layering. Apparently we have some attempted magic of this kind in the codegen, but we should probably be winding that back.

I know it pushed the complexity back on you, but can you see a path to achieving it upstream from here?

Aside from the bindnode repr.go changes this seems mostly good to me, as unhelpful as that might be for what you're trying to achieve. Sorry.

// test that AssignNode will fail if called twice, it expects an empty
// node to assign to
func TestMapCantAssignNodeTwice(t *testing.T) {
b := basicnode.Prototype.Map.NewBuilder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for almost all of these (I pick just one instance at random to comment on), I wonder if we can immediately generalize them to TestTheTrait(t *testing.T, np datamodel.NodePrototype) ?

It would need:

  • the reusable test would need comments on it on what kind of NodePrototype it's expecting (in this case: a map)
  • another quick func of boilerplate for TestTheTrait(t *testing.T) { sharedtests.TestTheTrait(t, basicnode.Prototype.Map) }
  • ... that's about it?

Not sure if you saw it, but we have the go-ipld-prime//node/tests package for some of these already.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's also very likely that this will run into issues being applied on more than one node implementation, especially in the area of error message string matching... but I'd like to be aspirational about that!)

Comment on lines 1034 to 1063
case schema.UnionRepresentation_Stringprefix:
// NOTE: duplicated in node.go, unionAssembler.AssembleValue
// consider refactoring these
name := w.curKey.val.String()
var idx int
var mtyp schema.Type
for i, member := range w.schemaType.Members() {
if member.Name() == name {
idx = i
mtyp = member
break
}
}
if mtyp == nil {
return _errorAssembler{fmt.Errorf("bindnode TODO: missing member %s in %s", name, w.schemaType.Name())}
}

goType := w.val.Field(idx).Type().Elem()
valPtr := reflect.New(goType)
finish := func() error {
unionSetMember(w.val, idx, valPtr)
return nil
}
return &_assembler{
cfg: w.cfg,
schemaType: mtyp,
val: valPtr.Elem(),
finish: finish,
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the related switch cases added to AssembleKey and Finish) doesn't belong here. If someone is working at the representation level, and the strategy is stringprefix, the only thing it should accept is AssignString, or AssignNode given something that reports Kind() == Kind_String.

@warpfork
Copy link
Collaborator

Yep, I think I holistically agree with @rvagg . I do think giving our starlark bindings over in the go-datalark repo some magic "DWIM" sauce is a good and fun idea -- but we should do that over there, and we should keep the behavior of the lower level components in this repo as un-clever as possible. "APIs should stick to one level at a time; gearshifting requires explicit steps" seems like a good heuristic.

(Confessedly, this can be a little confusing right now when looking at the code, because... well, there are bad examples. We haven't always followed that heuristic. In particular, I know I scattered a bunch of code in various AssignNode implementations in the past which would look for concrete golang types, and magically gearshift the data to the level the builder was working on. I think these were a mistake, now: they made things more confusing (and some of that old code was also written in a very bug-prone way, but that's another matter). The direction we should move is less magic, and stick-to-one-leve-at-a-time, IMO. In other words, prioritize clarity, even at the expense of having to report more errors and reject more data. We can layer the "DWIM" stuff back on above that, as necessary.)

@warpfork
Copy link
Collaborator

The increases in test coverage are wonderful and always welcome! I'd still love to merge those parts.

@dustmop
Copy link
Contributor Author

dustmop commented Aug 13, 2022

Ok, so is the plan to revert the changes to node/bindnode/repr.go as well as node/bindnode/struct_test.go, and just move forward with the int_test.go and map_test.go tests?

I admit I'm a bit lost on how to implement the fixes for go-datalark. I'm not sure what sort of "magic" is being discussed, and could use some clarity there. I had assumed AssignNode was just the right way to copy already constructed nodes, but if that's not correct, what is it? Does datalark need to do a conversion from a typed node to a representation node, using a call to Representation?

@warpfork
Copy link
Collaborator

Yeah, for datalark, exactly, I think that's what datalark should do -- if a datalark constructor is called, and knows its working on representation level, and it's been handed a node that's already all processed (e.g., an ipld.Node, as opposed to just other starlark primitives)... then it can detect that situation, do a feature-detection for .Representation() ipld.Node, call that in order to shift gears down to representation, and proceed with that.

(I'm okay with putting that logic in datalark because it's a higher level. I'd probably thumbs up making functions in go-ipld-prime that do that kind of detection and "DTRT" logic too -- just not putting them in node implementations themselves; that pushes it too deep / makes more work for every node implementation / creates more places for things to go wrong and get weird if all the node implementations don't perfectly implement all the edge cases.)

@dustmop dustmop changed the title fix(bindnode): UnionRepr assembler works for Stringprefix test(basicnode): Increase test coverage for int and map types Aug 16, 2022
@dustmop
Copy link
Contributor Author

dustmop commented Aug 16, 2022

Reverted as specified, looking for a merge when you get a chance.

@rvagg rvagg merged commit 8fd3dea into ipld:master Aug 25, 2022
@rvagg
Copy link
Member

rvagg commented Aug 25, 2022

thanks @dustmop, we should try and generalise some of this to cover the other node impls as @warpfork said, but for now this is better than not having it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

3 participants