Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add predicates for human reviews #151

Closed
wants to merge 9 commits into from

Conversation

adityasaky
Copy link
Member

@adityasaky adityasaky commented Mar 13, 2023

This is still a WIP. Crev needs a major update but I took a pass at defining a generic human review attestation type followed by system specific ones that extend the generic predicate.

Fixes #77

@adityasaky adityasaky force-pushed the human-review-attestations branch 5 times, most recently from b05e4dc to 2af4cac Compare March 16, 2023 19:55
@adityasaky adityasaky marked this pull request as ready for review March 16, 2023 19:55
spec/predicates/human-review/human-review.md Outdated Show resolved Hide resolved
spec/predicates/human-review/human-review.md Outdated Show resolved Hide resolved
spec/predicates/human-review/human-review.md Outdated Show resolved Hide resolved
spec/predicates/human-review/vcs.md Outdated Show resolved Hide resolved
spec/predicates/human-review/vcs.md Outdated Show resolved Hide resolved
spec/predicates/human-review/vcs.md Outdated Show resolved Hide resolved
spec/predicates/human-review/crev.md Outdated Show resolved Hide resolved
```json
{
"_type": "https://in-toto.io/Statement/v1",
"subject": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be good to discuss what the subject is (that it should point to some code in a VCS) and probably to discuss if the review covers the referenced change or a review of the entire codebase? Maybe that could be another field in the predicate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point, I considered supporting a range of commits that comprise a changeset. I'll take a pass at tweaking it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too complex to have the subject point to the tip and include starting commit for the range in the predicate?

Copy link
Contributor

@TomHennen TomHennen Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think git & commit ranges are hard to reason about. Perhaps:

  1. The predicate could contain multiple subjects, one per gitCommit (if it's attesting to commit level info)
  2. If attesting to the entirety of a repo it could use gitTree (and maybe the head commit)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we leaned into ITE-4 to make the subject of a github-specific attestation a pull request itself? I think it could work but we'd then have to identify how we link the commits in the PR to that being used as a source for the build etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been thinking pretty hard about this on our end, but it really comes down to what you're trying to do.

My guess is that people will want to look at a specific commit/merge and ask if it was reviewed. The attestation should probably provide a link to the review (ITE-4 can probably help there, but maybe it can just be a URI?). Then I suspect that some users will want to perform that same check for each commit in the log.

If that's true then I think the subject should be the commit where the PR was merged. IMO we'd only want the attestation for the final commit because any review or approval probably applies to the sum of the changes and not the individual commits in the PR.

That means that for a branch to be considered reviewed you want each commit the branch pointed to to have a review that applies to it, but you don't necessarily care about the intermediate commits since the branch never pointed to them individually and you likely don't want people to use those commits directly anyways.

What goal do the users of this predicate have in mind "how do I know if the all the code in this repo is safe?", "how can I identify which things haven't been reviewed so I can take a closer look at them?", "I'd like to make sure I trust the folks that provided all reviews on this repo so I need to know who they are", "I'd like to make sure the code in this repo was reviewed for problem X". "I'd like to have confidence that, at some point, the code in this repo was evaluated for problems X, Y, Z, but I don't need to ensure that each individual commit meets that bar".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree. For a v0.1.0, I suggest we leave it as applying to one or more commits, up to the user. In automated systems, I foresee it being issued at the tip of the feature branch, and I think this works from the "was this change reviewed? give me a CR issued for this PR matching the tip of the PR's branch".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we always allow multiple subjects, we should have no trouble supporting more granular use cases such as a per-commit scenario.

@TomHennen
Copy link
Contributor

Have we given any thought tho how this interacts with #124 and this testifysec attestation. cc @colek42

@adityasaky adityasaky requested a review from a team as a code owner May 25, 2023 17:15
@adityasaky adityasaky requested a review from TomHennen May 25, 2023 17:29
@adityasaky
Copy link
Member Author

Have we given any thought tho how this interacts with #124 and #124 (comment). cc @colek42

I think they're complementary. The review predicate can apply to sources recorded in links in a prior step, say from a git clone, and we can extend that with other notions about the repository using testifysec's git attestor.

@adityasaky adityasaky force-pushed the human-review-attestations branch 3 times, most recently from e51c218 to 8971878 Compare May 26, 2023 17:46
Copy link
Contributor

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for getting this started, @adityasaky ! I have a general question, and apologies if this has already been addressed: what's the reasoning for needing three separate predicates for this? The human review and VCS review predicate are practically identical and frankly don't capture much information. I really like the Crev format, and wonder if we can either a) simply use the Crev format in the other use cases, or b) extract the most important properties a consumer wants out of a human review attestation, and develop a single format that will be compatible with all three use cases (Crev, generic, and VCS). If we go with option b, I suspect the predicate will end up looking a lot like Crev, possibly with a few extra fields.

spec/predicates/human-review-vcs.md Outdated Show resolved Hide resolved
spec/predicates/human-review.md Outdated Show resolved Hide resolved
@marcelamelara
Copy link
Contributor

CC'ing @pdxjohnny as this in-toto predicate/topic may be of interest re: code ingestion validation.

Signed-off-by: Aditya Sirish <aditya@saky.in>
* Use RD for VCS review type reviewers
* Show source vs binary subject for crev predicate

Signed-off-by: Aditya Sirish <aditya@saky.in>
Signed-off-by: Aditya Sirish <aditya@saky.in>
spec/predicates/human-review-crev.md Outdated Show resolved Hide resolved

Type URI: (tentative) https://in-toto.io/attestation/human-review/crev/v0.1

Version: Do we use Crev's versioning? Currently -1? What about their URI?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a resolution for this? If we don't know maybe just stick with v0.1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reach out to the crev developers again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a note for now. The crev developers are open to us reimplementing crev without necessarily being backwards compatible. IMO we should start in a place of compatibility (or close-to-compatibility) and see where we go from there. For now, I've added a note about crev's versioning and updated to using in-toto specific URIs.

spec/predicates/human-review-vcs.md Outdated Show resolved Hide resolved
Signed-off-by: Aditya Sirish <aditya@saky.in>
@adityasaky adityasaky requested a review from pxp928 July 28, 2023 19:36
Copy link
Contributor

@TomHennen TomHennen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor changes.

I think that's it besides resolving the question of including the version in the type?

Signed-off-by: Aditya Sirish <aditya@saky.in>
Signed-off-by: Aditya Sirish <aditya@saky.in>
@adityasaky
Copy link
Member Author

I've added a new field to the VCS one called target that records the base branch and its state. This is, IMO, valuable when judging a changeset as it matters where it's being submitted to and when.


`reviewers` _array of ResourceDescriptor_, _required_

Indicates uniquely the identity of one or more reviewers. MUST NOT be used over

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source of the attestation is always the pull request product, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on source? Do you mean the subject? The subject of the attestation identifies the tip of the PR feature branch.

"gitCommit": "330500b54433de4f6f9575676b67738b98ba5e54",
},
},
"reviewLink": "https://github.com/in-toto/in-toto/pull/503#pullrequestreview-1341209941",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec doesn't describe the actual change set that was reviewed, and therefore places a lot of the load on this reviewLink.

For a given PR, the diffs reviewed and voted upon by each reviewer regularly change (unless rules are enforced to dismiss "stale" reviews).

Mechanically this review link is reverencing the critical "provenance" data maintained by the PR product (GitHub in this case).
Pull request data is saved as evidence justifying the existence of the new repo version.

PR data is not quite as nice as signed attestation statements like this though.
PR data can change over time, either by-design (EG: by rerunning checks on the same refs) or maliciously if an attacker gains access to the DB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to infer the diff from the subject (which identifies the tip of the reviewed topic branch) and the base recorded in the predicate. I leaned away from including the full diff inline due to length considerations. reviewLink is included ATM for completeness, any information that isn't considered vital to the review process for the changeset but may still be necessary.

You make a good point about the iterative nature of reviews and that's where a tighter alignment with GitHub mechanics would be helpful to answer questions like when the attestation is signed and generated. If an attestation is generated at merge time, for example, it captures the final states which is probably what we care about?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: target needs to be expanded to handle current tip of the base branch and the merge base for diff inference.

Signed-off-by: Aditya Sirish <aditya@saky.in>
@marcelamelara
Copy link
Contributor

marcelamelara commented Sep 15, 2023

@adityasaky Can you help clarify this? As I re-review these predicates, I have a few questions and one major concern around Crev specifically.

First, I really like the intent of the project and would definitely like to support it in in-toto, and I noticed that they actually support more SW supply chain info beyond the source code review proofs that this predicate introduces. Specifically, they also seem to support package-level reviews as well as "meta-reviews" that they call trust proofs. Do you intend to add predicates for these other types of Crev claims at a later point in time? AFAIU, the package review proof format is similar, if not identical to the code review proof format. So, I wonder if it would be worth creating a separate, more encompassing Crev "dependency review" predicate that can cover both types of dependency reviews?

Second, my major concern. The project do not seem to have an authoritative spec anywhere, i.e. I found a spec for a review proof in a few different places: main repo vs cargo-crev. According to the main Crev project repo, the Rust implementation is the most-well maintained, up-to-date version of Crev, but I have no idea if the other implementations are truly consistent or anything. I am still in support of having a predicate for Crev, but I would feel a lot more comfortable if we can be unambiguously clear on our end about which specific version of the Crev review proof formats the predicate follows.

EDIT: I realized at the end of my review that this predicate does already seem to cover the package reviews as well, though this isn't always clear in the text of the predicate spec. I'd suggest being very clear about this in the predicate spec. My other questions and concerns still stand.

Copy link
Contributor

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes @adityasaky ! This review only covers the Crev predicate, and I'll come back to review the VCS predicate.

protos/README.md Outdated Show resolved Hide resolved
protos/README.md Outdated Show resolved Hide resolved
spec/predicates/README.md Outdated Show resolved Hide resolved
spec/predicates/README.md Outdated Show resolved Hide resolved
spec/predicates/README.md Outdated Show resolved Hide resolved
Comment on lines +69 to +72
Identifies the reviewer. This has some meaning for crev's trust proliferation
aspects, but the identity of the reviewer can also be mapped based on in-toto's
functionary handling. `idType` is used to determine the contents of `reviewer`.
The `url` is a reference to the reviewer's crev-proofs repository.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question and a suggestion. Q: What are possible idType values, and who determines these? Rec: I'd add in here that this field is intended to correspond to the from field in the Code Review Proofs format when the idType matches the ID for Crev.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit of a disconnect here, because the crev code review type includes from but missing from the package review type. Same for date. I suspect date may be omitted as a package-wide review is meant as an ongoing document with updates as new versions / advisories emerge. On the other hand, I'm not sure about the from field. I wonder if in the in-toto context we want to drop it and lean entirely on the signature. That takes us to a broader conversation about how in-toto supports crev, though.

spec/predicates/human-review-crev.md Show resolved Hide resolved
spec/predicates/human-review-crev.md Show resolved Hide resolved
spec/predicates/human-review-crev.md Outdated Show resolved Hide resolved
Comment on lines +91 to +92
Optional field with any other comments a reviewer may have about the
dependency.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which consumers, if any, of this predicate need to be able to parse/care about the contents of the comment field? This would be helpful info.

Copy link
Contributor

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is going. @adityasaky I mostly have clarifying comments for the VCS review predicate.


Version: 0.1.0

## Purpose
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the Crev dependency review predicate is available as well, I think it would be helpful to add a short comment here about the complementary relationship between that and this predicate.

GitHub allows repository owners to create protection rules that require a
threshold of approvals before a pull request can be merged. This predicate can
be used to express meeting such a policy. In this case, GitHub would issue a
signed attestation identifying all the reviewers for a pull request. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that this sort of attestation wouldn't be generated until the threshold is met, right?

Suggested change
signed attestation identifying all the reviewers for a pull request. The
signed attestation identifying all accepting reviewers for a pull request. The

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this may be getting too much into the weeds.... the current GitHub rule for requiring approvals before a PR can be merged reads "When enabled, pull requests targeting a matching branch require a number of approvals and no changes requested before they can be merged" (emphasis mine).

I've personally experienced this: even though I had the threshold number of needed approvals, a third reviewer had requested changes in the past and not re-reviewed the PR. It wasn't until their review was either dismissed or updated that the PR was able to be merged. This may be a GH-specific rule, but what is the expectation for generating this attestation in this sort of case? The current text only describes how the attestation subject is chosen wrt a PR merge rule, but it may be worth clarifying the exact point or operation upon which this attestation should be generated (e.g., after the approved PR has been merged).

Copy link
Member Author

@adityasaky adityasaky Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great questions and have (somewhat) come up in the SLSA Source track doc as well. In my mind, it comes down to when the attestation is created and signed + what it's associated with. I authored this take on the VCS review predicate thinking it'd be generated at merge time. But this obfuscates cases where stale approvals aren't dismissed (which is I think the more common configuration). @zachariahcox has made this point in the Source track doc if I'm not mistaken, and his take on a similar predicate identifies associates each individual review of the changeset with the commit at the tip of the topic branch. Settling when the predicate is generated will help us decide what and how its contents must be presented IMO.

spec/predicates/human-review-vcs.md Outdated Show resolved Hide resolved
Comment on lines +110 to +111
reviewers, the attestation is re-created for each new review and the time
reflects that of the latest review.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to the old attestations that are superseded by the more recent review attestations? I'm wondering if we should add the ability to link to prior attestations related to the same subject PR to maintain full visibility into the review history.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: #151 (comment)

I think it depends on the intended policies. I'm not sure we have policies / use cases that require full review history, but I'm not opposed to including the information. This is currently written with the admittedly simpler rule k of n approve changes in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a v0.1, I think the k out of n assumption is actually fine. I would just make it explicit, if it isn't stated anywhere yet.

Signed-off-by: Aditya Sirish <aditya@saky.in>
Signed-off-by: Aditya Sirish <aditya@saky.in>
Copy link
Contributor

@marcelamelara marcelamelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some minor nits, but LGTM otherwise. Once the final outstanding comments are resolved, I think this is ready for a v0.1 release.


Indicates time of review creation. `timestamp` in the original crev
specification.

`thoroughness` _enum_, _required_

Describes how thorough the reviewer was. Must be set to one of `low`, `medium`,
or `high`.
`high`, or `none`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, what does a none value tell a consumer of the attestation? Could this be interpreted as an "endorsement" of a particular package?

the review, understanding of the source code, and a final rating.
crev enables social review of popular open source software dependencies. A crev
review includes information such as the thoroughness of the review,
understanding of the source code, and a final rating. The ratings for these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being very nitpicky here, though I think this conveys the expectations more clearly.

Suggested change
understanding of the source code, and a final rating. The ratings for these
understanding of a package's source code, and a final rating. The ratings for these

dependency, and can be performed by one or more of several actors in the supply
chain. The developer importing a new dependency can perform the review or a
dedicated security team can be tasked with it.
process of reviewing and verifying the source code of a particular dependency,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
process of reviewing and verifying the source code of a particular dependency,
process of reviewing and verifying the source code of a particular open-source package,

Comment on lines +58 to +64
"predicate": {
"reviewers": ["<ResourceDescriptor>", ...],
"targetTip": "<ResourceDescriptor>",
"mergeBase": "<ResourceDescriptor>",
"reviewLink": "<URI OF REVIEW>",
"reviewTime": "<TIMESTAMP>",
"annotations": {...}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this falls under the annotations (or could better fit into the CREV predicate) but, I wonder if there's some quantitative that would be useful to include here. For example, number of lines of code added/removed/modified. I've seen some projects require additional reviewers for larger feature work, and fewer reviewers for minor bug fixes. It may be hard to craft a policy around a metric like that but, most other options I can think of turn VCS specific pretty quickly (things like commit messages that link to specific issues with a label or tag for [feat] or [bug]).

@marcelamelara
Copy link
Contributor

@adityasaky Where have we landed on this? At the last attestation maintainer's meeting, we decided to close this issue if it's stalled while we wait for the SLSA Source Track to progress.

@marcelamelara
Copy link
Contributor

I propose that we close this PR since we haven't seen any updates in over a year, unless @adityasaky or someone else is able to pick this back up and get it across this finish line. We can always reopen this topic when there's more momentum behind it again.

@TomHennen
Copy link
Contributor

SGTM

@adityasaky
Copy link
Member Author

I think it's reasonable to close this. I think we might see something like the VCS predicate get proposed soon through slsa-framework/slsa#1161, so I'll preserve this branch in case we want to build off that.

@adityasaky adityasaky closed this Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Defining a generalized predicate format for "human reviews" of artifacts
6 participants