RFC: cargo-sbom #3553

Open, wants to merge 4 commits into base: master
121 changes: 121 additions & 0 deletions text/3553-cargo-sbom.md
@@ -0,0 +1,121 @@
- Feature Name: `cargo-sbom`
- Start Date: 2023-11-01
- RFC PR: [rust-lang/rfcs#3553](https://github.com/rust-lang/rfcs/pull/3553)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC adds an option to Cargo that emits information for building a Software Bill of Materials (SBOM) precursor file alongside compiled artifacts. Similar to how Cargo emits "dep-info" (`.d`) files, this change emits SBOM data in a Cargo-specific format alongside outputs in the `target` directory. External tools or Cargo subcommands can consume this file and transform it into an SBOM format such as SPDX or CycloneDX.

# Motivation
[motivation]: #motivation

An SBOM (software bill of materials) is a list of all components and dependencies used to build a piece of software. The two leading SBOM formats being adopted by industry are SPDX and CycloneDX. Both are still evolving and have multiple specification versions and data formats (JSON, XML).

New government initiatives aimed at improving the security of the software supply chain such as the US "Executive Order 14028: Improving the Nation's Cybersecurity" or the EU "Cyber Resilience Act" require a Software Bill of Materials. Generating accurate SBOMs with Cargo is currently difficult because, depending on target selection or activated features, the dependencies may be different.

For workspaces that generate multiple compiled artifacts, each artifact may reference different dependencies. Existing tools (see the prior art section) attempt to approximate the correct dependency set; however, precise dependency information for each compiled artifact is difficult to obtain without built-in Cargo support. Generating the SBOM information at the same time as the compiled artifact allows precise dependency information to be emitted for each compiled artifact.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The generation of SBOM information is controlled by Cargo's configuration. To enable SBOM generation, set the following configuration:

Review comment (Member):

I might suggest changing the name SBOM. The current text is a bit misleading, for example “enable SBOM generation”. Though, I don't have a better name in mind :(


```toml
[build]
sbom = true
```

Or use the environment variable `CARGO_BUILD_SBOM=true`.
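
As a minimal sketch of driving this from a tool, the snippet below prepares a `cargo build` invocation with the proposed environment variable set. The helper name `cargo_build_with_sbom` is made up for illustration, and the variable only has an effect in a Cargo that implements this RFC.

```rust
use std::process::Command;

/// Prepares (but does not run) `cargo build` with SBOM output requested
/// through the environment variable proposed in this RFC.
fn cargo_build_with_sbom() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").env("CARGO_BUILD_SBOM", "true");
    cmd
}
```

Calling `.status()` on the returned command would start the build; setting the variable per invocation leaves the project's configuration files untouched.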

If enabled, an SBOM file will be placed next to each compiled artifact for the `bin`, `staticlib`, and `cdylib` crate types in the `target` directory, with the name `<artifact>.cargo-sbom.json`. The SBOM will contain information about the dependencies used to build the compiled artifact.
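
A post-processing tool needs to locate the SBOM file for a given artifact. The sketch below assumes that `<artifact>` means the artifact's full file name (including any extension) and that the suffix is simply appended; the RFC does not pin this down, and `sbom_path_for` is a hypothetical helper name.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Derives the assumed SBOM path by appending `.cargo-sbom.json` to the
/// artifact's file name. Returns `None` if the path has no file name.
fn sbom_path_for(artifact: &Path) -> Option<PathBuf> {
    let file_name = artifact.file_name()?;
    let mut sbom_name = OsString::from(file_name);
    sbom_name.push(".cargo-sbom.json");
    // Keep the SBOM in the same directory as the artifact.
    Some(artifact.with_file_name(sbom_name))
}
```

Under this assumption, `target/debug/app` maps to `target/debug/app.cargo-sbom.json`, and a static library such as `libfoo.a` maps to `libfoo.a.cargo-sbom.json`.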

Review comment (Member):

We might need to deal with duplicate artifact names. Cargo doesn't really handle this issue at the moment. For an SBOM it is even less acceptable and must be resolved. See rust-lang/cargo#13709 (comment)

Review comment (Member):

The generated SBOM could link back to the artifact that it corresponds to in some way.

To be useful for the cargo auditable use case, it needs to be generated before the final binary, so things like a hash of the binary aren't workable. I think a field in the JSON indicating which file name it corresponds to is best.


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The SBOM file generated by Cargo is *not* intended as a final SBOM artifact, but rather a precursor. Post-processing tooling can use the information produced here as part of building a final SBOM.

The SBOM file will be written to disk before `rustc` is executed for each artifact. This enables [`RUSTC_WORKSPACE_WRAPPER`](https://doc.rust-lang.org/cargo/reference/config.html#buildrustc-workspace-wrapper) to point at a program that can utilize the SBOM file to embed information into the binary if desired. The environment variable `CARGO_SBOM_PATH` will be set to the full path to the SBOM file.
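
The following is a rough sketch of what such a wrapper could look like, not a definitive implementation. It relies on the documented convention that Cargo passes the path to the real `rustc` as the wrapper's first argument, and on the `CARGO_SBOM_PATH` variable proposed here. The function names are illustrative, and the sketch only reports the SBOM's size where a real tool would embed or transform it.

```rust
use std::ffi::OsString;
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Splits the wrapper's argv into (rustc path, rustc arguments).
/// Cargo invokes a rustc wrapper as `<wrapper> <rustc> <args...>`.
fn split_wrapper_args(argv: Vec<OsString>) -> Option<(OsString, Vec<OsString>)> {
    let mut iter = argv.into_iter().skip(1); // skip the wrapper's own name
    let rustc = iter.next()?;
    Some((rustc, iter.collect()))
}

/// Interprets the value of `CARGO_SBOM_PATH`; unset or empty means "no SBOM".
fn sbom_path(value: Option<OsString>) -> Option<PathBuf> {
    value.filter(|v| !v.is_empty()).map(PathBuf::from)
}

/// Looks at the SBOM (if any), then runs the real rustc and returns its exit code.
fn run_wrapper(argv: Vec<OsString>, sbom_env: Option<OsString>) -> io::Result<i32> {
    let (rustc, args) = split_wrapper_args(argv)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "missing rustc path"))?;
    if let Some(path) = sbom_path(sbom_env) {
        // The file is written before rustc runs, so it can be read here.
        let sbom = std::fs::read_to_string(&path)?;
        eprintln!("SBOM precursor at {} ({} bytes)", path.display(), sbom.len());
    }
    let status = Command::new(rustc).args(args).status()?;
    Ok(status.code().unwrap_or(1))
}
```

A wrapper binary's `main` would call `run_wrapper(std::env::args_os().collect(), std::env::var_os("CARGO_SBOM_PATH"))` and exit with the returned code.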

## Format
The format will use JSON, but the exact format is not specified in this RFC. Additional fields can be added as needed. The JSON will include a `format-version` in case breaking changes are necessary.

### Resolved Dependency Tree
The SBOM file will include the following information for each crate. Note that a crate refers to a [single build unit (library or executable)](https://doc.rust-lang.org/cargo/appendix/glossary.html#crate).
A [package](https://doc.rust-lang.org/cargo/appendix/glossary.html#package) may contain multiple crates.
- Package ID
- Name
- Version
- Source (registry / git / path etc.)
- Checksum (if available)
- Dependencies

Review comment (Member):

Note that there are two ways of looking at dependencies: what each package needs, and the final resolved graph.

For example, if one package depends on rand with features = ["std", "getrandom"], and another with features = ["std", "simd_support"], the final resolved features will be ["std", "getrandom", "simd_support"]. Depending on the use case you may need either or both representations (direct package dependencies and the resolved graph).

cargo metadata exposes both (under "packages" and "resolve" fields), but inaccurately:

I think it would be best for the SBOM to also expose both, accurately this time.

- Type (normal, build)

Review comment (Member):

Why do we need to distinguish build dependencies from normal ones?

Review comment (Contributor Author):

The intent here is so that dependencies that were only used to build build scripts could be easily filtered out. However, it's also possible that this field could be removed if it's easy enough to build dependencies based on crate-type.

Review comment (Member):

It is important to know whether e.g. OpenSSL was used at build time to compute something once, or is actually included in the generated binary. That determines whether you need to patch it or even take it offline ASAP because of a new critical CVE or not.

- Activated features

If a crate is used as both a normal dependency and a build dependency that is separately compiled, then separate entries will exist in the dependency tree with the correct activated features listed for each instance.
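
To illustrate the per-instance feature sets, here is a small sketch that unions the features requested for one package while keeping normal and build instances apart. It reuses the `rand` example from the review discussion; the `DepKind` enum and `resolve_features` function are illustrative names, and the real resolver is considerably more involved.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Which build the crate instance belongs to (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DepKind {
    Normal,
    Build,
}

/// Unions the requested features of one package, separately per kind,
/// so each compiled instance gets its own resolved feature set.
fn resolve_features<'a>(
    requests: &[(DepKind, &[&'a str])],
) -> BTreeMap<DepKind, BTreeSet<&'a str>> {
    let mut resolved: BTreeMap<DepKind, BTreeSet<&'a str>> = BTreeMap::new();
    for (kind, features) in requests {
        resolved
            .entry(*kind)
            .or_default()
            .extend(features.iter().copied());
    }
    resolved
}
```

With two normal requests for `["std", "getrandom"]` and `["std", "simd_support"]`, the normal instance resolves to all three features, while a build-only request for `["std"]` stays separate.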

Checksum is an optional field, since only crates from registries have checksums. If a checksum is needed for a crate that comes from a path dependency for example, it will be up to the post-processing tool to produce an appropriate value.
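
One possible in-memory model of these per-crate fields is sketched below. Since the RFC leaves the JSON format unspecified, every field name and type here is an assumption, as is treating "Type" as an attribute of each entry. The sample data (`app`, `helper`, the sources, and the checksum) is made up.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DepKind {
    Normal,
    Build,
}

/// One entry of the resolved dependency tree (illustrative field names).
#[derive(Debug, Clone, PartialEq, Eq)]
struct CrateEntry {
    package_id: String,
    name: String,
    version: String,
    source: String,
    /// Only crates from registries carry a checksum.
    checksum: Option<String>,
    /// Indices of this crate's dependencies in the surrounding list.
    dependencies: Vec<usize>,
    kind: DepKind,
    features: Vec<String>,
}

fn entry(name: &str, source: &str, checksum: Option<&str>, kind: DepKind,
         features: &[&str], dependencies: Vec<usize>) -> CrateEntry {
    CrateEntry {
        package_id: format!("{source}#{name}@1.0.0"),
        name: name.to_string(),
        version: "1.0.0".to_string(),
        source: source.to_string(),
        checksum: checksum.map(str::to_string),
        dependencies,
        kind,
        features: features.iter().map(|f| f.to_string()).collect(),
    }
}

/// Entries compiled for the artifact itself, skipping build-only instances.
fn normal_entries(entries: &[CrateEntry]) -> Vec<&CrateEntry> {
    entries.iter().filter(|e| e.kind == DepKind::Normal).collect()
}

/// Entries for which a post-processing tool must compute its own checksum.
fn missing_checksums(entries: &[CrateEntry]) -> Vec<&CrateEntry> {
    entries.iter().filter(|e| e.checksum.is_none()).collect()
}

/// Made-up sample: `helper` appears twice, once per kind, with different features.
fn sample_entries() -> Vec<CrateEntry> {
    let registry = "registry+https://example.invalid/index";
    vec![
        entry("app", "path+file:///work/app", None, DepKind::Normal, &[], vec![1, 2]),
        entry("helper", registry, Some("abc123"), DepKind::Normal, &["std"], vec![]),
        entry("helper", registry, Some("abc123"), DepKind::Build, &["std", "codegen"], vec![]),
    ]
}
```

Keeping the two `helper` instances as separate entries lets a consumer drop build-only crates with a simple filter, and the optional checksum makes path dependencies easy to spot.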

If further information is needed (such as license), then the post-processing tool can use `cargo metadata` or another mechanism to find it.

Review comment (Member):

While not required for the MVP, I think it would be best if the SBOM file could contain all the info from cargo metadata eventually. It is desirable for SBOMs to list the licenses of the dependencies. And there are situations where you can run the build and get the SBOM file, but cannot run cargo metadata: rust-secure-code/cargo-auditable#128

Alternatively this could be addressed by evolving cargo metadata, but having to only deal with one input file and one data format would be easier on the post-processing tools.


### Resolved build configuration
- Rust toolchain version
- `RUSTFLAGS`
- Current build profile name
- Selected profile values

Review comment (Member), on lines +66 to +67:

Suggested change: keep `- Selected profile values` and add the following item after it:
- Two resolved dependency trees: one for normal dependencies, another for build dependencies, matching the behavior of [cargo features v2](https://rust-lang.github.io/rfcs/2957-cargo-features2.html)

## Security

Cargo's SBOM file provides a report of the components and dependencies used by Cargo to build a software artifact.

Cargo does not defend against malicious components or dependencies changing the SBOM, or accidentally or maliciously concealing themselves from the SBOM. In particular, components or dependencies added by build scripts or external tools might not be accurately represented in the SBOM file produced by Cargo. Ideally, tools should provide their own SBOMs, and build scripts should modify the SBOM via supported Cargo interfaces (see future possibilities).

# Drawbacks
[drawbacks]: #drawbacks

It introduces yet another SBOM format. However, the format is specifically designed to be used as an intermediate, to be converted to an industry-standard format by external tooling.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Since there is no consensus on a single SBOM format within the software industry, and existing formats are still evolving, Cargo should not pick an existing SBOM format. If Cargo were to use existing SBOM formats, multiple formats (and multiple versions of each format) would need to be supported. The task of generating a specific SBOM format is best left to applications outside Cargo or to Cargo extensions. Cargo is not in a position to produce an SBOM that is compliant with internal and external regulations.

Unfortunately, it's difficult to extract accurate SBOM information with existing options. Using the `Cargo.lock` file or `cargo metadata` overincludes dependencies. Additionally, since Cargo has many different commands that produce compiled artifacts (build, test, bench, etc.) and each of these commands takes arguments that can affect the dependency list, it's difficult to ensure that the correct dependency list is used.

Adding an option to `cargo metadata` to support resolver v2 would help with overinclusion of dependencies, but still makes it difficult to ensure the exact set of features, command-line arguments, and other options match the command used to produce the final artifact. The CLI would need substantial changes to allow users to control package and build target selection. Additionally, since build scripts (`build.rs`) may impact the output, `cargo metadata` may need to execute them.

Another alternative is to extract information by setting the `RUSTC_WRAPPER` environment variable and capturing feature flags and dependencies via a wrapper tool. This would require the wrapper tool to parse the rustc command-line arguments to capture the set of feature flags and referenced dependencies. This approach would prevent other uses of `RUSTC_WRAPPER` and would be potentially fragile.
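
To make the fragility concrete, the sketch below shows the kind of argument parsing such a wrapper would need. It only recognizes the space-separated `--cfg feature="..."` and `--extern name=path` forms; the `RustcInvocation` type and `parse_rustc_args` function are hypothetical, and other spellings (for example `--cfg=...`, or `--extern` values with option prefixes) would be silently missed.

```rust
/// Features and extern crate names recovered from a rustc command line.
#[derive(Debug, Default, PartialEq)]
struct RustcInvocation {
    features: Vec<String>,
    externs: Vec<String>,
}

/// Scans rustc arguments for activated features and referenced crates.
fn parse_rustc_args(args: &[&str]) -> RustcInvocation {
    let mut out = RustcInvocation::default();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--cfg" => {
                // Only `feature="name"` cfgs are of interest; others are skipped.
                if let Some(cfg) = iter.next() {
                    let name = cfg
                        .strip_prefix("feature=\"")
                        .and_then(|rest| rest.strip_suffix('"'));
                    if let Some(name) = name {
                        out.features.push(name.to_string());
                    }
                }
            }
            "--extern" => {
                // `name=path` or just `name`; keep the crate name only.
                if let Some(spec) = iter.next() {
                    let name = spec.split_once('=').map_or(*spec, |(name, _)| name);
                    out.externs.push(name.to_string());
                }
            }
            _ => {}
        }
    }
    out
}
```

Even this small parser has to guess at conventions that are internal to how Cargo calls `rustc`, which is the maintenance burden the paragraph above refers to.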

# Prior art
[prior-art]: #prior-art

* [RFC2801](https://github.com/rust-lang/rfcs/pull/2801): Proposes embedding dependency information directly into the binary. Implemented as the `cargo auditable` extension.
* [cargo-auditable](https://github.com/rust-secure-code/cargo-auditable): Cargo extension that embeds a subset of the information described in this RFC directly into the binary. The JSON format used by this RFC could be based on the cargo-auditable format.
* [cargo-cyclonedx](https://github.com/CycloneDX/cyclonedx-rust-cargo): Cargo extension to generate a CycloneDX SBOM.
* [cargo-bom](https://github.com/sensorfu/cargo-bom): Cargo extension to generate a BOM in an ASCII format including license information.
* [cargo build-plan (#5579)](https://github.com/rust-lang/cargo/issues/5579): Provides an option to emit a JSON representation of the commands to execute, without actually running them. This option has poor integration with `build.rs` and was [planned for deletion](https://github.com/rust-lang/cargo/issues/7614) in 2018.
* [cargo unit graph (#8002)](https://github.com/rust-lang/cargo/issues/8002): Very similar to what this RFC intends to write to disk. However, since unit-graph runs without a build, it cannot take `build.rs` output into account.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

The exact specifics about what will be included in the SBOM and the specific JSON format are subject to change during the implementation of the RFC.

# Future possibilities
[future-possibilities]: #future-possibilities

Additional fields can be added to the SBOM without a breaking change as new requirements are identified, such as:
* Environment variables read
* Additional profile flags
* External tool versions (linker, C compiler, OS)

## Build scripts
Build scripts could communicate back to Cargo to inject additional dependencies into the SBOM. For example, if a crate builds `C` code and then links with it, it would emit a build script instruction that causes Cargo to read in a file describing the `C` dependency.
```
cargo::sbom=<PATH>
```
Cargo would then include the additional dependency information in the SBOM graph.

Review comment (Member):

Given Cargo is not able to handle multiple values under the same instruction key, should we explicitly call out that multiple paths must be joined via `std::env::join_paths`?
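
As a sketch of how a build script might produce this instruction, the helper below formats one `cargo::sbom=` line. The instruction is only a future possibility and does not exist in Cargo today. Joining several paths with `std::env::join_paths` follows a suggestion from the review discussion and is an assumption, as are the helper name `sbom_instruction` and the example file name.

```rust
use std::env;
use std::path::Path;

/// Formats the hypothetical `cargo::sbom=<PATH>` build script instruction.
/// Multiple paths are joined with the platform's path-list separator.
fn sbom_instruction<P: AsRef<Path>>(paths: &[P]) -> Result<String, env::JoinPathsError> {
    let joined = env::join_paths(paths.iter().map(|p| p.as_ref()))?;
    Ok(format!("cargo::sbom={}", joined.to_string_lossy()))
}
```

A `build.rs` would print the result, for example `println!("{}", sbom_instruction(&["c-deps.sbom.json"])?);`. `join_paths` returns an error if a path contains the separator character, so that case surfaces at build time rather than producing an ambiguous value.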

## Embedding dependency information into binaries
The implementation of [RFC2801](https://github.com/rust-lang/rfcs/pull/2801) could be based on the information provided by this RFC.