Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Separate canonicalized_body from TransparencyLogEntry? #72

Closed
wlynch opened this issue Mar 23, 2023 · 7 comments · Fixed by #74
Closed

FR: Separate canonicalized_body from TransparencyLogEntry? #72

wlynch opened this issue Mar 23, 2023 · 7 comments · Fixed by #74

Comments

@wlynch
Copy link
Member

wlynch commented Mar 23, 2023

@asraa @haydentherapper and I had a discussion over at sigstore/rekor#1390 (comment) that I figured is worth opening up an issue here for discussion.

I'm currently implementing offline verification for Gitsign. Currently this is done by serializing all the fields in TransparencyLogEntry individually to their own OIDs, then reconstructing the message at verification time. An idea was suggested to encode the serialized version of TransparencyLogEntry instead, but the message as defined today is problematic because of canonicalized_body.

canonicalized_body was added to TransparencyLogEntry in #10 to let clients verify data without needing to reconstruct the body themselves due to concerns around normalization of DSSE encoding (#9).

However, for Gitsign we would like to omit this body and reconstruct it.
Gitsign has constraints on the signature format to stay in-line with other Git signatures. If we store a serialized version of the TransparencyLogEntry with canonicalized_body, we end up doubling the size of the signature for data that we already store elsewhere and can reconstruct.

I'd like to propose modifying TransparencyLogEntry to allow only storing the minimal data needed. Few options I can see:

  1. We break out canonicalized_body from TransparencyLogEntry and create a new wrapper message that combines them. e.g.

    // name tbd - open to bikeshedding
    message TransparencyLogEntryBundle {
      TransparencyLogEntry entry = 1;
      bytes canonicalized_body = 2;
    }

    This way the TLE + canonicalized_body can still be included in the full bundle, but applications that only need the minimal data in TransparencyLogEntry can just use that.

  2. We add as part of the spec that it's valid for canonicalized_body field to be unset, and if it is it's the responsibility of the client to construct it during verification. This unfortunately has the side effect of effectively undoing the changes in Add Rekor body to TransparencyLogEntry #10, since clients will need to be able handle cases where the canonicalized_body is not present.

  3. We let Gitsign be special and document our off-label usage of TransparencyLogEntry there. I'm a little hesitant about this since a) we'd wouldn't be able to treat TransparencyLogEntry like an opaque envelope and may need to track changes overtime and b) if gitsign is running into this my guess is other signing tools may as well.

Also open to other ideas!

cc @znewman01 @codysoyland

@znewman01
Copy link
Contributor

I like option 1. It's breaking-ish so we'd need a version bump.

CC @kommendorkapten (other maintainers are already CC'd).

@haydentherapper
Copy link
Collaborator

So sorry for the delayed response - +1 to going with 1, and then having a single OID for Rekor.

@feelepxyz
Copy link
Member

Sounds like we would be moving this around in the full bundle which would be a breaking change. This is a somewhat painful breaking change for npm as it means old npm CLI clients with an older version of sigstore-js would not be able to verify new bundles. We could probably handle it in the next month as provenance support is still in private beta.

I forget where this was discussed, but remember some ideas around a SET v2 format that would address the pain around reconstructing the entry for verification?

Could we introduce this change in a non-breaking way?

cc @bdehamer

@codysoyland
Copy link
Member

  1. We break out canonicalized_body from TransparencyLogEntry and create a new wrapper message that combines them. e.g.
    // name tbd - open to bikeshedding
    message TransparencyLogEntryBundle {
      TransparencyLogEntry entry = 1;
      bytes canonicalized_body = 2;
    }
    This way the TLE + canonicalized_body can still be included in the full bundle, but applications that only need the minimal data in TransparencyLogEntry can just use that.

How would that work, as this suggestion does not make any fields optional? It seems like the canonicalized_body is still not optional in this case.

  1. We add as part of the spec that it's valid for canonicalized_body field to be unset, and if it is it's the responsibility of the client to construct it during verification. This unfortunately has the side effect of effectively undoing the changes in Add Rekor body to TransparencyLogEntry #10, since clients will need to be able handle cases where the canonicalized_body is not present.

I much prefer this option as it's backwards-compatible, and it seems like the caveat you mention is true in the case of option 1 also, unless I misunderstand. My intent with #9 was to make it optional anyway (no fields had the optional keyword at that time), with the expectation that it must be present if the SET cannot be produced deterministically. For example, the intoto type contains a hash of the user-provided DSSE envelope JSON byte array, which is not canonicalized (as in the DSSE JSON may contain extra whitespace, even though the inner structure is canonicalized), so we need to have the canonicalized body to produce that hash. If the intoto type was revised to produce that hash on a canonicalized value, that would eliminate the need for the canonicalized_body field. I would prefer to remove it altogether, but it is impossible to compute the SET on some types without it.

@wlynch
Copy link
Member Author

wlynch commented Mar 30, 2023

How would that work, as this suggestion does not make any fields optional? It seems like the canonicalized_body is still not optional in this case.

The idea was to separate out the message types depending on the usage. So for Gitsign we could include just the minimal TransparencyLogEntry, but for clients that need the full LogEntry + canonicalized body they would use TransparencyLogEntryBundle.

My intent with #9 was to make it optional anyway (no fields had the optional keyword at that time), with the expectation that it must be present if the SET cannot be produced deterministically.

Good to know! If this is the case then we could probably just clarify the usage of the field in the doc comment and continue to use the existing LogEntry type.

@znewman01
Copy link
Contributor

I still like (1) better conceptually, but if transitioning is going to be painful (sounds like it would be) (2) will work. Thanks for the feedback, @feelepxyz

@wlynch want to prep a PR?

@kommendorkapten
Copy link
Member

I too think we should avoid a breaking change now, I rather wait for a the new SET v2 in Rekor, and then we can make a breaking change and bump the version. As then we can make a more robust change, that would be more long lived and thought through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants