Workstream: Source Track #956

Closed
1 task
kpk47 opened this issue Aug 22, 2023 · 9 comments
Assignees
Labels
source-track workstream Major effort comprising multiple sub-issues

Comments

@kpk47
Contributor

kpk47 commented Aug 22, 2023

This is a tracking issue for creating a Source track. The main idea is to cover properties of how the source code was developed. The exact thrust of this track, i.e. the threats it mitigates, is still TBD.

Workstream shepherd: Kris K (@kpk47)

Sub-issues:

  • TODO

SLSA v0.1 set requirements on source code management that we removed from v1.0. We should reintroduce those requirements (or something similar) in a Source Track. We will also need to create a format for any source attestations.

A few questions to start discussion:

  1. What does the Source level attach to? Is it a project, a repo, a commit, or something else?
  2. Which version control systems do we need to consider?
  3. What sort of guarantees should the source track make? Traceability (i.e. the source came from this repo), transparency (i.e. this code was written by this organization, person, etc), quality (i.e. the source is trustworthy because x y z), others?
  4. Will the source track have the same requirements for open and closed source projects? The same standards of evidence for meeting those requirements?
@joshuagl
Member

cc @TomHennen who has done a lot of thinking about source attestations in the past.

@steiza

steiza commented Aug 24, 2023

This is great! I'm going to take a stab at answering these questions.

  1. What does the Source level attach to? Is it a project, a repo, a commit, or something else?

I think the top-level object should be a list of changesets. A changeset can be made up of one or more commits. A changeset should have a 1-1 relationship with a review process (e.g. pull request in GitHub terminology or merge request in GitLab terminology), although a changeset might not have a review if it was pushed directly to a remote. A changeset has a 1-1 relationship with a repository, but the changesets described in the source attestations for an artifact could come from more than one repository.
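
The relationships described above can be sketched as a small data model. This is only an illustration of the stated cardinalities (one repository per changeset, at most one review per changeset, possibly several repositories per artifact); every class and field name here is hypothetical and not part of any SLSA specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Review:
    """A review process, e.g. a pull request or merge request (hypothetical shape)."""
    url: str
    approvers: tuple[str, ...] = ()


@dataclass(frozen=True)
class Changeset:
    """One or more commits that landed together in exactly one repository."""
    repository: str
    commits: tuple[str, ...]
    review: Optional[Review] = None  # None models a direct push with no review

    def __post_init__(self) -> None:
        if not self.commits:
            raise ValueError("a changeset needs at least one commit")


@dataclass
class SourceAttestation:
    """Top-level object: the list of changesets behind one artifact."""
    changesets: list[Changeset] = field(default_factory=list)

    def repositories(self) -> set[str]:
        # Changesets for one artifact may come from more than one repository.
        return {c.repository for c in self.changesets}

    def unreviewed(self) -> list[Changeset]:
        return [c for c in self.changesets if c.review is None]


# Example data; repository names and commit ids are made up.
attestation = SourceAttestation([
    Changeset("example.com/org/app", ("a1", "a2"),
              Review("https://example.com/org/app/pull/1", ("alice",))),
    Changeset("example.com/org/lib", ("b1",)),  # pushed directly, no review
])
```

Keeping `review` optional rather than mandatory lets a verifier see direct pushes explicitly instead of having them silently dropped.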

  2. Which version control systems do we need to consider?

Pragmatically, I would focus on git, or at the very least on graph-based distributed version control systems. I worry that we might not be able to come up with a specification that is meaningful and covers all the version control systems out there.

  3. What sort of guarantees should the source track make?

Generally, I'd lean towards objective statements that don't change over time. Instead of "no vulnerabilities found at X time" do things like list components for later determination of vulnerabilities, or a specific command that was run and the output it produced.

We should frame these in terms of what verification policies we'd want to run, and then figure out how to represent them such that the policy can be evaluated.

A short list of things I'd like to be able to verify:

  • All contributions have one or more strongly authenticated authors
    • "authenticated" here could mean a platform's authentication system, or commit signing (although commit signing raises a bunch of questions about PKI)
  • All contributions went through any review process
  • All contributions were approved by at least X people (this can get complicated when a single contribution has multiple authors)
  • All contributions had no outstanding request for additional changes from a reviewer
  • All contributions passed some automated check I care about (e.g. Developer Certificate of Origin, or https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features)
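
To make "frame these in terms of verification policies" concrete, here is a minimal sketch of evaluating the checks listed above against one contribution. The `Contribution` shape, the function name, and the rule that an author's own approval does not count are assumptions made for the example; the thread itself only notes that multi-author contributions complicate approval counting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """Hypothetical record of one contribution, as a policy engine might see it."""
    authors: frozenset               # strongly authenticated identities
    approvers: frozenset             # reviewers who approved
    changes_requested_by: frozenset  # reviewers with an open change request
    passed_checks: frozenset         # names of automated checks that passed


def violations(c: Contribution, min_approvals: int, required_checks: frozenset) -> list:
    """Return human-readable policy violations; an empty list means the policy passes."""
    problems = []
    if not c.authors:
        problems.append("no authenticated author")
    # Assumption: approvals from the change's own authors are not independent.
    independent = c.approvers - c.authors
    if len(independent) < min_approvals:
        problems.append(
            f"needs {min_approvals} independent approval(s), has {len(independent)}"
        )
    if c.changes_requested_by:
        problems.append("outstanding change request")
    missing = required_checks - c.passed_checks
    if missing:
        problems.append("missing checks: " + ", ".join(sorted(missing)))
    return problems


# Example data; identities and check names are made up.
ok = Contribution(frozenset({"alice"}), frozenset({"bob"}), frozenset(), frozenset({"dco"}))
bad = Contribution(frozenset({"alice"}), frozenset({"alice"}), frozenset({"carol"}), frozenset())
```

Here `violations(ok, 1, frozenset({"dco"}))` returns an empty list, while `bad` fails on approvals, the open change request, and the missing check. The thresholds are parameters because, as noted below, each organization will set its own policy over the same attestation data.
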
  4. Will the source track have the same requirements for open and closed source projects?

I think the source attestation representation will be the same for open and closed source projects. The policies that people evaluate against those source attestations will be different for every organization, including open source and closed source projects.


Some additional questions:

  1. What range of time will source attestations cover?

Changes since the last release? Changes over the past X units of time? Changes going all the way back to the start of a repository? How do you onboard an existing repository to conform to a more strict policy?
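
For the "changes since the last release" option, one possible reading is the set of commits reachable from the new release but not from the previous one, which is what `git rev-list <previous>..<release>` computes. A minimal sketch over an in-memory parent map follows; the function names and the toy graph are hypothetical.

```python
def ancestors(parents: dict, tip: str) -> set:
    """All commits reachable from `tip` (including `tip`) by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def commits_since(parents: dict, previous_release: str, release: str) -> set:
    """Commits in `release` that were not already part of `previous_release`."""
    return ancestors(parents, release) - ancestors(parents, previous_release)


# Toy history: c1 <- c2 <- c3 <- c4, with a feature branch f1 <- f2 merged at c4.
history = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "f1": ["c2"],
    "f2": ["f1"],
    "c4": ["c3", "f2"],
}
new_commits = commits_since(history, "c2", "c4")
```

This only answers the range question for a history that has not been rewritten; it does not address the other options in the list.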

  2. How do we decide what contributions to include?

Conceptually, I want all the contributions associated with any commit that has a line of code in the release.

But there are a bunch of corner cases here. What if I make a contribution that includes 4 commits, and later the branch history is rewritten to not include any of those commits? What if the lines of code in that contribution are all replaced by later contributions before the release is cut?

@zachariahcox
Contributor

I'm super excited to see this spin up! I have a few additional questions and thoughts to add to the discussion, in no particular order.

Attestations are explicitly from the perspective of the code-hosting server

This is a more interesting topic for a distributed version control system like git, where all repos are considered equally valid.
Source attestations need to say "as far as we, the version control server product, can tell, version abc123 of repo X has the following set of attributes: [...]".

This seems to mean:

  • To consume source provenance attestations, your team must extend the circle of trust to include part of the code-hosting service.
  • This clearly elevates the trusted, hosted copy of the repo above others. For any specific team, that extended trust boundary has some consequences as to which parts of the SDLC process become more trustworthy than others.
  • One hosting service may or may not be able to independently verify the contents of another service's attestation. E.g., Azure Repos cannot independently verify that GitHub's pull request process was followed perfectly.
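
A statement of the form "version abc123 of repo X has the following attributes" could be serialized roughly as follows. The envelope fields are loosely modeled on the in-toto Statement layout and should be checked against that specification; the predicate type URL and the predicate contents are purely hypothetical placeholders, since no source attestation format has been agreed on in this thread.

```python
import json


def source_statement(repo: str, commit_sha: str, attributes: dict) -> dict:
    """Build a server-issued claim about one revision of one repository."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": repo, "digest": {"gitCommit": commit_sha}}],
        # Hypothetical predicate type; not a real, registered identifier.
        "predicateType": "https://example.com/hypothetical-source-attributes/v0",
        # Everything in here is "as far as the hosting server can tell".
        "predicate": {"attributes": attributes},
    }


statement = source_statement(
    "https://example.com/org/repo",
    "abc123",
    {"reviewed": True, "pusher_authenticated": True},
)
serialized = json.dumps(statement, indent=2, sort_keys=True)
```

The statement would still need to be signed by the hosting service for a consumer to extend trust to it, which is the trust-boundary point made in the bullets above.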

Who "contributed" a line of code?

Reasonable options:

  1. The identity named in the commit message's "author" or "committer" fields?
  2. The identity from the DCO sign-off statement in the commit message?
  3. The identity used to authenticate the push?
  4. The identity associated with the commit's signature?
  5. The IP address of the machine that uploaded the commit?

From a team's perspective, the answer could be any combination of those.
A team may have a set of certs used to sign all commits that, for them, represents the source of trust.
For OSS repos, the "author" email address may be the source of truth, representing the actor who sent the patch to the mailing list, etc.

From the server's security perspective, the answer seems unequivocally "3": regardless of the contents of the payload, the server "blames" the authenticated user who connected to the server to upload the changes.

For git, there are no security controls to enforce the contents of a commit, but authentication with the server requires control of an account.
This means the security rules you care most about should be conducted server-side.
(As a general statement about non-git VCS, the "commits" are at best created on developer laptops and server-side rules will still play an important role.)

Because attestations are from the server's perspective, it seems the server should be picky that the "authenticated pusher" is the relevant identity.
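
To illustrate why the pusher is the only identity the server can vouch for, here is a minimal sketch that collects the candidate identities for one commit. It assumes the raw commit text has the layout printed by `git cat-file -p <commit>` (header lines, a blank line, then the message); the function name and sample data are hypothetical. Everything parsed from the commit is client-supplied, and only `pusher` comes from the server's own authentication.

```python
def candidate_identities(raw_commit: str, pusher: str) -> dict:
    """Collect identities 1-3 from the list above for a single commit."""
    headers, _, message = raw_commit.partition("\n\n")
    found = {"author": [], "committer": [], "dco_signoff": [], "pusher": [pusher]}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key in ("author", "committer"):
            # Header value is "Name <email> timestamp timezone"; drop the last two fields.
            found[key].append(value.rsplit(" ", 2)[0])
    for line in message.splitlines():
        if line.startswith("Signed-off-by: "):
            found["dco_signoff"].append(line[len("Signed-off-by: "):].strip())
    return found


# Sample commit text; hashes, names, and timestamps are made up.
RAW = (
    "tree 1111\n"
    "parent 2222\n"
    "author Alice <alice@example.com> 1692700000 +0000\n"
    "committer Bob <bob@example.com> 1692700100 +0000\n"
    "\n"
    "Fix parser\n"
    "\n"
    "Signed-off-by: Alice <alice@example.com>\n"
)
identities = candidate_identities(RAW, pusher="carol")
```

In this sample the author, committer, and pusher are three different people, and nothing in the commit itself would reveal the third.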

What kind of commits can be trusted?

Due to the uncontrollable nature of commit contents, it's reasonable for the server to assign more trust to the commits it creates itself (where it controls the process) and less trust to commits it receives from the internet.

For most mature dev teams, code review is a required part of the change process and the most important commits created by the server are the proposed merge commits between the topic branch and the target branch.
On GitHub, these are the commits created by the pull request system and pointed-to by refs/pull/* refs.
It's not a great idea to allow customers / attackers to build the proposed merge commits themselves: reviews of the code shown in the PR are only relevant because we trust the process that creates the diff we display.

These kinds of merge commits are a great candidate to consider for attestations.
They have a known, stable process for their construction, they represent the "net changes" proposed by the topic branch, and they are associated with all the metadata available to the pull request.

Is the "pusher" a contributor?

In Git, a single push may contain many objects and ref updates, creating many new versions of the repo at once.
If your team uses a git flow model where lots of changes are made to topic branches and incorporated into protected branches via pull requests, then the vast majority of these new "versions" are iterations towards the next proposed version: they are never intended to pass the final set of policies required to be deployable on their own.

In such a setup, most human-user pushes contribute to peer-reviewable changesets, and are not necessarily useful on their own.

Accepting a change set causes the server's service principal to produce the approved "contribution" commit.
It's really this commit, associated with the "changeset review context," that represents the thing we want to consume and write policy against.

Important scenarios to validate

I like @steiza 's summary☝️.
For git repos specifically, I'd also recommend adding a non-deployment-policy scenario:

  • All commits were squash-merged into a single new commit representing a peer-reviewed change set.

This rule restricts the kinds of commits that make it out of developer-controlled refs and onto "production" refs.
This is mostly relevant for CI and developer laptop-checkout policies: anywhere you might fetch more than a single sha from a ref. Deploy policies will typically only fetch a single sha.
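
A check for the squash-merge rule might look like the sketch below. It assumes the verifier already has the protected ref's history as `(sha, parents)` pairs plus the set of shas the server produced when accepting reviewed change sets; the function name and both inputs are hypothetical.

```python
def check_protected_ref(history: list, reviewed: set) -> list:
    """Flag commits on a protected ref that are not single-parent, reviewed squashes.

    history:  (sha, parents) pairs for every commit on the ref.
    reviewed: shas of commits created by accepting a peer-reviewed change set.
    """
    problems = []
    for sha, parents in history:
        if len(parents) > 1:
            # A true merge pulls the topic branch's individual commits onto the ref.
            problems.append(f"{sha}: merge commit, not a squash")
        if sha not in reviewed:
            problems.append(f"{sha}: no peer-reviewed change set")
    return problems


# Example data; shas are made up.
clean = [("s3", ["s2"]), ("s2", ["s1"]), ("s1", [])]
dirty = [("m1", ["s2", "t9"]), ("x1", ["s2"]), ("s2", ["s1"]), ("s1", [])]
approved = {"s1", "s2", "s3", "m1"}
```

With this data, `clean` yields no problems, while `dirty` flags `m1` as a merge and `x1` as unreviewed.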

@MarkLodato
Member

Thanks @steiza and @zachariahcox! This is a great start. My team has thought about this topic quite significantly and I'm happy to share those thoughts as well (matches fairly well to what has been said above).

But before I do that, let's figure out the best way to collaborate here. Should we discuss these ideas via GitHub Issues? GitHub Wiki? Google Doc? Something else?

My inclination is to use a google doc or wiki so that people can have threads and iterate on ideas. A comment thread is pretty hard to follow.


Either way, I think we might want to break this down along the following lines:

  1. What high-level, hand-wavy guarantees might we care about, and how do we organize them into a meaningful set of levels? For example (building on ideas from @steiza):

    • All contributions can be traced to one or more strongly authenticated authors
    • All contributions went through multi-party review and approval (there are probably degrees of strength here, including number of reviewers, changes after approval, whether it can be bypassed, etc.)
    • All source code was retained for at least X period of time (for incident response, investigations, auditing, etc.)
    • All contributions passed some automated check I care about (e.g. DCO)

    Eventually we'll want to aggregate these into a single "theme" for the track. But that might come later.

  2. How do we translate those high-level ideas into concrete requirements? Here is where we would answer the nitty gritty questions, such as:

    • What is the subject of the thing that has a level (a commit, a repo, etc.)?
    • What range of time will source attestations cover?
    • How do we decide what contributions to include?
    • Who "contributed" a line of code?
    • What about changes contributed by a robot?
    • Who attests to this information? The code-hosting server?
    • How does this information propagate (attestation formats, storage, and APIs)?

These two pieces will necessarily influence each other, but they can happen in parallel. The reason I think it might be valuable to split them is that it's hard to have conversations at two very different levels of abstraction.

@MarkLodato
Member

I created a doc here to get us started. It's empty now but we can start to fill it in:
https://docs.google.com/document/d/1nVkvRsxFef2OgVm2CumyLDbt-yYX-A5wtg6dF_sthwo/edit

@MarkLodato
Member

I added v0.1 to the doc as one possible starting point to build on. It would be helpful to know (via docs comments) what is problematic, unclear, or missing from that ladder.

Feel free to also start a new section that doesn't build on v0.1, or that changes it significantly.

@melba-lopez
Contributor

Just want folks to remember the original thread on source that @marcelamelara and I were thinking through: #463

@MarkLodato MarkLodato added the workstream Major effort comprising multiple sub-issues label Oct 10, 2023
@MarkLodato MarkLodato changed the title Source Track Tracking issue Project: Source Track Oct 10, 2023
@MarkLodato MarkLodato changed the title Project: Source Track Workstream: Source Track Oct 17, 2023
@kpk47
Contributor Author

kpk47 commented Jan 20, 2024

I've digested the brainstorming doc into a draft specification for the Source Track. Please take a look and comment: https://docs.google.com/document/d/1sKNvZzjdpL4OC5H7VdPLPGG0G3XFJc3i5q144mhOnP8/edit.

I intend to accept comments on the Google Doc for 2 weeks, at which point I will turn the draft into a formal proposal in the slsa-framework/slsa-proposals repo.

@zachariahcox
Contributor

Marking this one as closed for now.
By my read, the biggest unresolved idea left in this issue is the concept of contributor.

  • We have made a lot of progress on the source track since this issue was opened, and are settling on "revision" as being the unit of attestation. This opens the door to SCPs decorating their attestations of those revisions with any amount of information, including high-fidelity contributor info.
  • We have made progress on the roles of bot accounts within the source track -- quick summary: they're just actors like any human from our perspective. If an organization chooses to add or subtract weight from their input, that is up to them and the downstream consumers of their claims.

@github-project-automation github-project-automation bot moved this from Let's close it. to Done in SLSA Source Track Sep 30, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Issue triage Sep 30, 2024

7 participants
@MarkLodato @steiza @kpk47 @joshuagl @zachariahcox @melba-lopez and others