vendor/ verification #121
Description
dep needs a way of verifying that the contents of vendor/ are what's expected. This serves three purposes:
- Performance: without a verification mechanism, the only safe choice dep can make is to drop and recreate vendor in its entirety on every operation. This is wasteful and slow; being able to determine whether vendor needs updating, and so avoid unnecessary disk writes, would make operations faster and would be a huge boon.
- Transparency: dep status is very limited in what it can report about vendor/ if it can't verify that vendor is actually what's expected.
- Security: without a verification mechanism, we can't know that the code we're building with is the code we expect it to be. We haven't articulated a security model yet, but this is obviously a significant part of it.
The simple, obvious solution is to deterministically hash (SHA256, I imagine) the file trees of dependencies, then store the resulting digest in the lock. However, the need to perform vendor stripping (#120) complicates this, as that creates perfectly valid situations in which the code under vendor/ might differ from the upstream source. The more choice around stripping modes we create, the more complex and error-prone this challenge becomes.
Or, we might simply rely on the hashing performed by the underlying VCS - indeed, we already record "revisions" in the lock, so we're already sorta doing this. Relying on it is triply disqualified, though:
- It suffers from the same issues with vendor pruning as above.
- On the security front, git, hg, bzr, and svn all rely on the broken SHA1 for basic integrity checks. (Even when git was first released, its authors made it explicitly clear that their security model is not based on the internal use of SHA1)
- It may bind us to VCS being the underlying source for code, which is not a corner we want to paint ourselves into.
The only path forward I've come up with so far is to compute and record hashes for individual files contained in dependencies (perhaps before stripping, perhaps after). This would allow us to be more adaptive about assessing package integrity - e.g., we can know that it's OK if certain files are missing.
The downside is that these lists of file hash digests would be potentially huge (e.g., if you import k8s), though that might be mitigated by placing them under vendor/ rather than directly in the lock - sdboyer/gps#69. Also, without the larger security model in place, I don't know if disaggregating hashing to the individual file level might compromise some broader notion of integrity that will prove important.
We might also record just the project unit-level hash against some future day where we've reinvented GOPATH and are no longer relying on vendor.