-
Notifications
You must be signed in to change notification settings - Fork 28
chore: allow independent extraction of repo and commit from provenance #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: allow independent extraction of repo and commit from provenance #708
Conversation
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
@@ -53,7 +53,7 @@ def extract_repo_and_commit_from_provenance(payload: InTotoPayload) -> tuple[str | |||
logger.debug(error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we have changed the implementation of _extract_from_*
functions, there are 2 duplicated logic within the public extract_repo_and_comit_from_provenance
, which are already handled in _extract_from_*
functions:
- catching
JsonError
macaron/src/macaron/repo_finder/provenance_extractor.py
Lines 52 to 54 in 6cae4fc
except JsonError as error: logger.debug(error) raise ProvenanceError("JSON exception while extracting from provenance.") from error - raising
ProvenanceError
when both repo and commit are not available:
macaron/src/macaron/repo_finder/provenance_extractor.py
Lines 56 to 62 in 6cae4fc
if not (repo or commit): msg = ( f"Extraction from provenance not supported for versions: " f"predicate_type {predicate_type}, in-toto {str(type(payload))}." ) logger.debug(msg) raise ProvenanceError(msg)
I think it's okay to remove catching JsonError
from extract_repo_and_comit_from_provenance
function as we are not treating missing repo url and/or commit as a critical error anymore (for example, a witness provenance can have commit hash but no repo).
At the moment, I can see some differences in handling missing repo url and/or commit hash between _extract_from_*
functions:
- Provenance has repo and no commit hash:
_extract_from_slsa_v1
raisesProvenanceError
.- other
_extract_from_*
functions will return(<repo_url>, <empty_string_for_commit_hash>)
- Provenance has repo and commit hash: all of them return
(<repo_url>, <commit_hash>)
- Provenance has no repo and commit hash:
_extract_from_slsa_v1
raisesProvenanceError
. This is because for slsa provenance v1 we need the repo url for obtaining the commit hash.- other
_extract_from_*
functions will return(<empty_string_for_repo_url>, <commit_hash>)
- Provenance has no repo and no commit hash: all of them will raise
ProvenanceError
.
For me, even though most of the differences are implementation details, they could make it difficult for us to maintain because we have to take care of the differences in behaviors within extract_repo_and_comit_from_provenance
:
- An empty string is used to represent repo url or commit hash is missing
- A raised
ProvenanceError
is also used to represent repo_url or commit hash is missing
I think to improve it we must decide on the what to use for indicating:
- A critical error that prevent us from reading provenances for repo url and commit (e.g. the type of provenance is not supported) -> I think we can raise
ProvenanceError
- We support and can read the content of the provenance but repo url and/or commit hash are not available, as long as it's different from the usage of
ProvenanceError
above.
I don't have any concrete ideas on the implementation details but as long as the issues I mentioned above is resolve, it's fine by me. Please let me know what you think about it.
…t to return None over raising errors. Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
I had a small extra question after another look at the code.
Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
#708) Signed-off-by: Ben Selwyn-Smith <benselwynsmith@googlemail.com>
This PR allows the provenance extractor to return partial results when either the repo URL or commit cannot be found. Tests have been updated to ensure this change works as intended while retaining the previous non-overlapping functionality, i.e. returning both, or neither on critical errors.
This change does not apply to SLSA v1 as these use the repo URL as a key to access the correct commit. To support this in SLSA v1 would require the repo URL be provided through some other means before provenance extraction begins.
Updates #707