Skip to content

chore: allow independent extraction of repo and commit from provenance #708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

benmss
Copy link
Member

@benmss benmss commented Apr 16, 2024

This PR allows the provenance extractor to return partial results when either the repo URL or commit cannot be found. Tests have been updated to ensure this change works as intended while retaining the previous non-overlapping functionality, i.e. returning both, or neither on critical errors.

This change does not apply to SLSA v1 as these use the repo URL as a key to access the correct commit. To support this in SLSA v1 would require the repo URL be provided through some other means before provenance extraction begins.

Updates #707

Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 16, 2024
@benmss benmss marked this pull request as ready for review April 16, 2024 04:18
@benmss benmss requested review from behnazh-w and tromai as code owners April 16, 2024 04:18
@nicallen nicallen self-requested a review April 16, 2024 05:02
@@ -53,7 +53,7 @@ def extract_repo_and_commit_from_provenance(payload: InTotoPayload) -> tuple[str
logger.debug(error)
Copy link
Member

@tromai tromai Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have changed the implementation of _extract_from_* functions, there are 2 duplicated logic within the public extract_repo_and_comit_from_provenance, which are already handled in _extract_from_* functions:

  1. catching JsonError
    except JsonError as error:
    logger.debug(error)
    raise ProvenanceError("JSON exception while extracting from provenance.") from error
  2. raising ProvenanceError when both repo and commit are not available:
    if not (repo or commit):
    msg = (
    f"Extraction from provenance not supported for versions: "
    f"predicate_type {predicate_type}, in-toto {str(type(payload))}."
    )
    logger.debug(msg)
    raise ProvenanceError(msg)

I think it's okay to remove catching JsonError from extract_repo_and_comit_from_provenance function as we are not treating missing repo url and/or commit as a critical error anymore (for example, a witness provenance can have commit hash but no repo).

At the moment, I can see some differences in handling missing repo url and/or commit hash between _extract_from_* functions:

  • Provenance has repo and no commit hash:
    • _extract_from_slsa_v1 raises ProvenanceError.
    • other _extract_from_* functions will return (<repo_url>, <empty_string_for_commit_hash>)
  • Provenance has repo and commit hash: all of them return (<repo_url>, <commit_hash>)
  • Provenance has no repo and commit hash:
    • _extract_from_slsa_v1 raises ProvenanceError. This is because for slsa provenance v1 we need the repo url for obtaining the commit hash.
    • other _extract_from_* functions will return (<empty_string_for_repo_url>, <commit_hash>)
  • Provenance has no repo and no commit hash: all of them will raise ProvenanceError.

For me, even though most of the differences are implementation details, they could make it difficult for us to maintain because we have to take care of the differences in behaviors within extract_repo_and_comit_from_provenance:

  • An empty string is used to represent repo url or commit hash is missing
  • A raised ProvenanceError is also used to represent repo_url or commit hash is missing

I think to improve it we must decide on the what to use for indicating:

  • A critical error that prevent us from reading provenances for repo url and commit (e.g. the type of provenance is not supported) -> I think we can raise ProvenanceError
  • We support and can read the content of the provenance but repo url and/or commit hash are not available, as long as it's different from the usage of ProvenanceError above.

I don't have any concrete ideas on the implementation details but as long as the issues I mentioned above is resolve, it's fine by me. Please let me know what you think about it.

…t to return None over raising errors.

Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
tromai
tromai previously approved these changes Apr 18, 2024
@tromai tromai dismissed their stale review April 18, 2024 05:58

I had a small extra question after another look at the code.

Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
@benmss benmss merged commit e214326 into staging Apr 18, 2024
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
#708)

Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
@benmss benmss deleted the 707-make-repo-commit-extraction-from-provenance-independent branch April 8, 2025 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants