Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dag-jose implementation using IPLD schema/code generation #23

Merged
merged 27 commits into from
Nov 26, 2021

Conversation

smrz2001
Copy link
Collaborator

@smrz2001 smrz2001 commented Nov 13, 2021

dag-jose implementation using IPLD schema/code generation - #4

Description

This PR reimplements the existing dag-jose codec using code generated using an IPLD schema.

The primary motivation for this was to move away from custom, handwired IPLD Node assemblers that were harder to debug.

Input data can have "general" and "flattened" serialization, encoded data always has "general" serialization.

How Has This Been Tested?

  • Unit tested
  • Tested successfully against all fixtures from this PR.
  • ipfs dag put --store-codec dag-jose and ipfs dag get work correctly

Definition of Done

Before submitting this PR, please make sure:

  • I have added relevant tests for new or updated functionality
  • My code follows conventions, is well commented, and easy to understand
  • My code builds and tests pass without any errors or warnings
  • I have tagged the relevant reviewers
  • I have updated the READMEs of affected packages
  • I have made corresponding changes to the documentation
  • The changes have been communicated to interested parties

TODOs (for subsequent PRs):

  • Encode Compact Serialization JOSE objects

References:

Comment on lines 75 to 79
// `link` is not encoded as part of DAG-JOSE because it is not included in the DAG-JOSE spec but is included
// here in the schema because it is required when decoding/encoding from/to other encodings (e.g. DAG-JSON).
// If `payload` is present during decode, `link` is added with contents matching `payload`. If `link` is present
// during encode, it is validated against `payload` and then ignored.
schema.SpawnStructField("link", "Link", true, false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just annotating as I go in reading)

This initially seemed strange (and would be, in almost any other use of schemas)... but as long as I keep reminding myself that this isn't really a general purpose schema, it's just meant to used inside one specific codec which is doing special logical operations such as this one described in the comment... then it seems fine.

(And I definitely appreciate the comment being here.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- wait, sorry. I don't understand what's going on here at all.

I initially assumed these definitions matched what's in the dag-jose spec, but they don't? There's no link fields in this structure as it is stated there?

This is related to my confusion in the other comment in https://github.com/ceramicnetwork/go-dag-jose/pull/23/files?diff=unified&w=0#r749816426 .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wrapping my head around how builders/assemblers can/should be used 😕

I was trying to match the schema from the spec but needed link in there to allow "copying" into the dag-jose builder from another encoding that included link, e.g. dag-json. That way, I could accept it into the builder then drop it on the floor during encode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I could do would be to use the same type of logic I used to "unflatten" nodes (i.e. "flattened" node -> reconstructed map -> "general" node) to skip link fields and then remove them from the schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what's up now. Is "DecodedJOSE" in the code here a combination of the "DecodedJWE" and "DecodedJWS" types as stated in the schema?

I think I understand the desire there (to minimize code?) but this seems relatively risky. It means we need the "figure out which one a message is, and reject the other fields" in application logic again. We see some of that with the code around the "link" field, but I sorta suspect that's not all of it, because there's a lot more fields in the "DecodedJWE" struct spec than in the "DecodedJWS" struct spec.

I wonder if this wouldn't be safer and get more work "done for free" by just making both those types fully, and when decoding, just try decoding into each type. If one fails, try the other; if they both fail, then decode fails.

Copy link
Contributor

@warpfork warpfork Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I'd want to test here, in any case, is to make sure we know what fields actually show up when iterating over the data model. As soon as I can find some time, I'll propose some more fixtures which probe this. I want to make sure that this implementation and the JS implementation are aligned on that matter, because if not, higher level things like Selectors will start producing nasty surprises. It's possible that the combination of the dag-jose and the dag-json fixtures in ipld/codec-fixtures#11 test that, but if so, it's pretty indirectly and a bit hard to call it "legible", so I'd like to add more fixtures which make it clearer. (And get them exercised from this PR, too, so we're sure.)

(I'm a little scared that something in the implementation detail of thwacking these two structures together will "leak". Maybe it's nothing, but some fixtures shining a light directly on the question of "what paths do we visit, when walking the returned nodes?" would answer it definitively.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that makes sense. I like the idea of being able to fully rely on the schema to validate the types instead of having an ugly mixture of some schema-based rules and some application ones.

I started out with distinct types but then later combined them to reduce cardinality. It kept things simpler and easier to reason about while prototyping (and that's also how it was done in the prior version of the code, presumably for the same reason lol). I seem to recall some other hurdle I ran into when going with that decision but I've been through so many experiments I can't remember what exactly it was.

Now that I feel much more comfortable with IPLD node assemblers though, I don't think it will be difficult to make and switch to distinct types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link will continue to be a special case. We can't encode it (to maintain compatibility with JS encodings) and I also didn't want to not return it from decode so that I could keep behavior identical to JS decode.

I do like your thought of not conflating encode/decode and presentation, so having a separate presentation layer could be clearer and more helpful. That's starting to sound like an ADL lol (IIUC).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipld/ipld#151 are the fixtures I propose to help clarify this.

Comment on lines 34 to 38
// DAG-CBOR is a superset of DAG-JOSE and can be used to decode valid DAG-JOSE objects. Use DAG-CBOR decoding but do
// not allow IPLD Links. See: https://specs.ipld.io/block-layer/codecs/dag-jose.html
err := dagcbor.DecodeOptions{
AllowLinks: false,
}.Decode(joseBuilder, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that this seems unnecessary (because if you're decoding it into something with a schema, that should already enforce this rule and more, right?)... but then remembered the "link" field, which has these special rules about how it should be synthesized. So, okay, maybe it makes sense to do this.

But it draws attention to the fact we seem to be using the types as both a decoding target, and as something we return to people to look at... despite those being logically not the same thing.

I can't tell if this is strange looking but fine in both spec and code, or code worth worrying about but still spec fine; or if this hints towards something in the spec that's weird enough that it makes me want to review the spec again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg, I'm not sure you have any free time, but if you do and could have a look at how this link property is derived and if that makes sense to you, I'd kinda appreciate a doublecheck.

Copy link
Collaborator Author

@smrz2001 smrz2001 Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm quite unsatisfied with the contortions I've had to make to match the behavior from the spec. I'm sure much of that is probably my inability to figure out how best to leverage code generation and the general Node assembly framework.

I just pushed what is likely to be the last significant code update to the main body of the codec that adds support for accepting "flattened" serialization. Wanted to put it up quickly (though not quite refined yet) so you and @rvagg can look at that too.

I have a feeling that some of the utility methods I've added to force the codec into submission might not pass muster 😅 Please let me know if there are better ways to do what I'm trying to achieve, mainly to re-assemble constructed nodes into a different shape (e.g. from a "flattened" node to a "general" node).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link property is derived like this:

  1. take the bytes in the payload (this should be a CID)
  2. encode them as a CID Link (in js land that's a CID instance)
  3. put it at the link property

Comment on lines 28 to 33
// If the passed `NodeAssembler` is not of type `_DecodedJOSE__ReprBuilder`, create and use a
// `_DecodedJOSE__ReprBuilder`.
joseBuilder, alreadyJose := na.(*_DecodedJOSE__ReprBuilder)
if !alreadyJose {
joseBuilder = Type.DecodedJOSE__Repr.NewBuilder().(*_DecodedJOSE__ReprBuilder)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This successfully detects and shortcuts if someone hands in the builder form...

But not if they hand in the assembler, for whatever reason.

I had to double-check myself quick on how this works, and so came out with a gist: https://play.golang.org/p/zdvRuU3WY1g . Moral of the gist is: you can certainly make code that works on either type, but unfortunately when doing the assertions to see what was handed in, you might want to do two of them, so either builder or assembler form will both hit the fastpath.

(My apologies for this being a little confusing. A lot of this area was designed with "maximum fast" in mind, and the ease-of-use is correspondingly... well. I wish I could think of a way to make this simpler!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, gotcha, will update, ty.

Copy link
Collaborator Author

@smrz2001 smrz2001 Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like the coercion doesn't work with the way the builder/assembler are generated 😞

type _DecodedJOSE__ReprBuilder struct {
	_DecodedJOSE__ReprAssembler
}
type _DecodedJOSE__ReprAssembler struct {
	thing int // n.b. same name "thing", same int type
}
./prog.go:37:41: cannot convert assembler (type *_DecodedJOSE__ReprAssembler) to type *_DecodedJOSE__ReprBuilder

versus the way it's in the example you sent:

type _DecodedJOSE__ReprBuilder struct {
	thing int
}
type _DecodedJOSE__ReprAssembler struct {
	thing int // n.b. same name "thing", same int type
}

Or is there some other way to do this that I'm missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works like this but I'm not sure if it like it:

builder := &_DecodedJOSE__ReprBuilder{*assembler}
_ = builder

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, you know what? It might not be necessary to coerce the assembler into a builder.

The fastpath would be to pass either a JOSE builder or a JOSE assembler on to CBOR decode.

Could I do something like this?

	copyRequired := false
	jwsBuilder, castOk := na.(*_DecodedJWS__ReprBuilder)
	if !castOk {
		// This could still be `_DecodedJWS__ReprAssembler`, so check for that.
		_, castOk := na.(*_DecodedJWS__ReprAssembler)
		if !castOk {
			// No fastpath possible, just create a new `_DecodedJWE__ReprBuilder`, use it, then copy the built node into
			// the assembler the caller passed in.
			jwsBuilder = Type.DecodedJWS__Repr.NewBuilder().(*_DecodedJWS__ReprBuilder)
			copyRequired = true
		}
	}

	if err := dagcbor.Decode(jwsBuilder, r); err != nil {
		return err
	}

warpfork and others added 2 commits November 18, 2021 21:56
* Introduce tests using fixtures from the ipld/ipld repo.

* tweak some fixture hunk names.

See commit message in the linked commit in the ipld/ipld submodule.

(Calling both the dagjose and the dagjson hunks "data", as it was
previously, suggested incorrect things: they are not that similar.
The dagjson is the datamodel view, and that is *not* the same thing
as a transliteration of the tokens that you'd get if you parsed the
dag-jose as dag-cbor.)
Comment on lines 70 to 73
n, err := ipld.Decode(fixtureDataBinary, Decode)
if err != nil {
t.Fatalf("%s", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have a logically identical n from the parent scope -- you can just keep using it here and skip these lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right, will do.

Comment on lines 75 to 91
encoder, err := linkSystem.EncoderChooser(dagJOSELink)
if err != nil {
t.Fatalf("could not choose an encoder: %v", err)
}
hasher, err := linkSystem.HasherChooser(dagJOSELink)
if err != nil {
t.Fatalf("could not choose a hasher: %v", err)
}
err = encoder(n, hasher)
if err != nil {
t.Fatalf("%s", err)
}
lnk := dagJOSELink.BuildLink(hasher.Sum(nil))
cidLink, ok := lnk.(cidlink.Link)
if !ok {
t.Fatalf("%s", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoa whoa whoa. I'm pretty sure this is all just linkSystem.ComputeLink isn't it? You can do way less work than this :P

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, copy-pasta failure there. Fixed.

if !ok {
t.Fatalf("%s", err)
}
fixtureCidString := strings.Replace(string(fixtureCid.Hunk.Body), "\n", "", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic nit, but I'd tend to use strings.TrimSpace for this. Needs many fewer arguments.

@smrz2001 smrz2001 merged commit 0029d76 into main Nov 26, 2021
@smrz2001 smrz2001 deleted the feat/dagjose-schema branch November 26, 2021 01:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants