deterministic source archives #2948

Open
@Ericson2314

Tarballs contain more information than we need (e.g. users, groups, fine-grained permissions, timestamps), and also allow the same information to be represented in multiple ways (e.g. order of directory contents, files defined twice). The basic problem this creates is that files cannot be deterministically assembled into an archive. In practice this means:

  • Directory registries cannot be verified against lockfiles as well
  • Packages may accidentally depend on permissions only supported on some platforms
  • Sources besides registries cannot be mirrored (distinct from what sorts of sources can serve as mirrors)
  • Users may unintentionally leak information about their current system when publishing packages.

None of these is terribly pressing on its own, but hopefully they are worthy of a solution in aggregate.

The solution is to first carefully decide which metadata we wish to support (that is, the information our archives will contain), and then pick a canonical form for every possible archive containing that information. A thornier question is whether existing uploads should be normalized according to the chosen schema.

For backwards compatibility, it is probably best to stick with some subset of tar. This is what Debian does. Where an extraneous field cannot be elided, it should be constrained to some fixed value. Either the most expressive POSIX tar variant could be used, or the most minimal format that supports the information in question.
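To make the "subset of tar with fixed values" idea concrete, here is a minimal sketch in Python using only the standard `tarfile` module. It is an illustration, not a proposed implementation: the function name `canonical_tar`, the input shape (a mapping from path to bytes), the choice of the USTAR variant, and the pinned values (mtime 0, mode 0o644, uid/gid 0, empty user and group names) are all assumptions made for the example.

```python
import io
import tarfile
from typing import Mapping


def canonical_tar(files: Mapping[str, bytes]) -> bytes:
    """Build a tar archive whose bytes depend only on paths and contents.

    `files` maps archive-relative paths to file contents (a hypothetical
    input shape chosen for this sketch).
    """
    buf = io.BytesIO()
    # USTAR is picked here as a minimal POSIX variant; this is an assumption,
    # not a decision made in the proposal. The encoding is pinned so that
    # path bytes do not depend on the host's locale.
    with tarfile.open(
        fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT, encoding="utf-8"
    ) as tar:
        # Sorting fixes the entry order; a mapping cannot hold the same
        # path twice, so "files defined twice" cannot occur.
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.type = tarfile.REGTYPE
            # Fields that cannot be omitted are pinned to fixed values.
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two calls with the same paths and contents, inserted in any order, should then yield byte-identical archives, which is the property needed for checking an archive against a hash in a lockfile. The sketch ignores directories, symlinks, and the executable bit; whether any of those belong in the supported metadata is exactly the open question above.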

Other options might be git's tree objects or Nix's NAR. The Merkle DAG used by the former can lead to better error messages and free deduplication, but SHA-1 is dubiously secure. The latter can be hashed however we like, but still runs into backwards-compatibility problems.
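As a rough illustration of the Merkle DAG idea, the sketch below hashes an in-memory directory tree bottom-up with SHA-256. It is not git's actual object format and not NAR: the function name `merkle_hash`, the `blob`/`tree` prefixes, and the entry encoding are all invented for this example.

```python
import hashlib
from typing import Any, Mapping, Union

# A node is either file contents (bytes) or a directory (name -> node).
Node = Union[bytes, Mapping[str, Any]]


def merkle_hash(node: Node) -> str:
    """Return a hex SHA-256 digest identifying a file or directory tree."""
    if isinstance(node, bytes):
        # Files are hashed with a type prefix so that a file can never
        # collide with a directory that has the same serialized bytes.
        return hashlib.sha256(b"blob\0" + node).hexdigest()
    h = hashlib.sha256(b"tree\0")
    # Sorted names give a single canonical order for directory entries.
    for name in sorted(node):
        child = node[name]
        kind = b"blob" if isinstance(child, bytes) else b"tree"
        digest = bytes.fromhex(merkle_hash(child))
        # Entry layout (hypothetical): kind, space, name, NUL, 32-byte digest.
        h.update(kind + b" " + name.encode("utf-8") + b"\0" + digest)
    return h.hexdigest()
```

Because each directory's hash is built from its children's hashes, a mismatch can be narrowed down to the specific subtree that differs, and identical subtrees share one digest, which is where the better error messages and deduplication would come from.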

CC @eternaleye

    Labels

    C-feature-request (Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted`)
    Command-package
    S-triage (Status: This issue is waiting on initial triage.)
