# Data Schemas

Cargo reads and writes user- and machine-facing data formats, including:
- `Cargo.toml`, read (and written by `cargo package`)
- `Cargo.lock`, read and written
- `.cargo/config.toml`, read-only
- `cargo metadata` output
- `cargo build --message-format` output

## Schema Design

Generally,
- Fields should be kebab-case
  - `#[serde(rename_all = "kebab-case")]` should be applied defensively
- Fields should only be present when needed, saving space and parse time
  - This also keeps options open: we can always switch to outputting a field unconditionally later, but it is harder to stop outputting one
  - `#[serde(skip_serializing_if = ...)]` should be applied liberally, with a predicate such as `Option::is_none` or `Vec::is_empty` (the path must be callable as `fn(&T) -> bool`, so `Default::default` itself does not work)
- For output, prefer [jsonlines](https://jsonlines.org/), as it allows streaming output and the flexibility to mix content (e.g. adding diagnostics to output that didn't previously have any)
- `#[serde(deny_unknown_fields)]` should not be used, so that formats can evolve, including through feature gating

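The guidelines above are normally applied through `serde` attributes. As a dependency-free sketch of what the resulting output looks like, the code below hand-writes [jsonlines](https://jsonlines.org/) output: one JSON object per line, kebab-case keys, and an optional `code` field that is left out entirely when it is `None`. The `Message` type, its fields, and the `reason` values are hypothetical and do not mirror Cargo's real message types; in practice `serde_json` would do the formatting and escaping.

```rust
use std::io::{self, Write};

/// Hypothetical message types, for illustration only (not Cargo's real types).
enum Message<'a> {
    Artifact { package_id: &'a str, fresh: bool },
    Diagnostic { level: &'a str, message: &'a str, code: Option<&'a str> },
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Message<'_> {
    /// Renders one message as a single-line JSON object with kebab-case keys,
    /// leaving out `code` entirely when it is `None`.
    fn to_json_line(&self) -> String {
        match self {
            Message::Artifact { package_id, fresh } => format!(
                r#"{{"reason":"artifact","package-id":{},"fresh":{}}}"#,
                json_str(package_id),
                fresh
            ),
            Message::Diagnostic { level, message, code } => {
                let mut line = format!(
                    r#"{{"reason":"diagnostic","level":{},"message":{}"#,
                    json_str(level),
                    json_str(message)
                );
                // Optional field: only emitted when present.
                if let Some(code) = code {
                    line.push_str(&format!(r#","code":{}"#, json_str(code)));
                }
                line.push('}');
                line
            }
        }
    }
}

/// Writes each message as soon as it is available, one JSON object per line,
/// so consumers can process the stream incrementally.
fn write_json_lines<W: Write>(out: &mut W, messages: &[Message<'_>]) -> io::Result<()> {
    for message in messages {
        writeln!(out, "{}", message.to_json_line())?;
    }
    Ok(())
}
```

Because every line is a complete object, a consumer can process the stream incrementally (e.g. with `BufRead::lines`) and skip any `reason` it does not recognize.
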
## Schema Evolution Strategies

When changing a schema for data that is read, some options include:
- Adding new fields is relatively safe
  - If the field must not be ignored when present,
    either have a transition period where using it is an error on stable Cargo before stabilizing it, or
    error if it is used with a schema version that predates support for it
    (e.g. `edition` requires a minimum `package.rust-version`, if present)
- Adding new values to a field is relatively safe
  - Unstable values should fail on stable Cargo
- Version the structure and interpretation of the data (e.g. the `edition` field, or `package.resolver`, which falls back to a default based on `edition`)

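A std-only sketch of these read-side strategies, assuming a simplified manifest: an unstable value is rejected unless explicitly allowed, `edition` is checked against `rust-version` only when the latter is present, and an absent `package.resolver` falls back to an edition-based default. The `Edition` type, the hypothetical unstable value `"future"`, and the function names are illustrative; the minimum versions (1.31, 1.56, 1.85) and resolver defaults are intended to match Cargo's documented behavior but should be checked against the Cargo reference.

```rust
/// Editions understood by this sketch; `Future` stands in for a hypothetical
/// value that is not yet stable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Edition {
    E2015,
    E2018,
    E2021,
    E2024,
    Future,
}

impl Edition {
    /// Parses an `edition` value; unstable values are rejected unless the caller
    /// allows them (e.g. nightly with a feature gate enabled).
    fn parse(value: &str, unstable_allowed: bool) -> Result<Edition, String> {
        match value {
            "2015" => Ok(Edition::E2015),
            "2018" => Ok(Edition::E2018),
            "2021" => Ok(Edition::E2021),
            "2024" => Ok(Edition::E2024),
            "future" if unstable_allowed => Ok(Edition::Future),
            "future" => Err("edition `future` is unstable".to_string()),
            other => Err(format!("unsupported edition `{}`", other)),
        }
    }

    /// First Rust release supporting this edition, as (major, minor).
    fn first_version(self) -> Option<(u32, u32)> {
        match self {
            Edition::E2015 | Edition::Future => None,
            Edition::E2018 => Some((1, 31)),
            Edition::E2021 => Some((1, 56)),
            Edition::E2024 => Some((1, 85)),
        }
    }

    /// Resolver to use when `package.resolver` is absent.
    fn default_resolver(self) -> &'static str {
        match self {
            Edition::E2015 | Edition::E2018 => "1",
            Edition::E2021 => "2",
            Edition::E2024 | Edition::Future => "3",
        }
    }
}

/// Parses "1.56" or "1.56.0" into (major, minor); the patch level is ignored.
fn parse_rust_version(s: &str) -> Option<(u32, u32)> {
    let mut parts = s.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Validates `edition`, checking it against `rust-version` only if one is set.
fn check_edition(edition: &str, rust_version: Option<&str>, unstable_allowed: bool) -> Result<Edition, String> {
    let edition = Edition::parse(edition, unstable_allowed)?;
    if let (Some(rv), Some(min)) = (rust_version, edition.first_version()) {
        let parsed = parse_rust_version(rv).ok_or_else(|| format!("invalid rust-version `{}`", rv))?;
        if parsed < min {
            return Err(format!(
                "rust-version {} is older than {}.{}, the first version supporting this edition",
                rv, min.0, min.1
            ));
        }
    }
    Ok(edition)
}

/// An explicit `package.resolver` wins; otherwise fall back to the edition's default.
fn effective_resolver(explicit: Option<&str>, edition: Edition) -> &str {
    explicit.unwrap_or(edition.default_resolver())
}
```
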
Note: some formats that are read are also written back out
(e.g. `cargo package` generating a `Cargo.toml` file),
so the strategies for written data need to be considered as well.

When changing a schema for data that is written, some options include:
- Add new fields if their presence can be ignored
- Infer permission from the user's use of the new schema (e.g. a new alias for an `enum` variant)
- Version the structure and interpretation of the format
  - Default to the latest version, with a warning that behavior may change (e.g. `cargo metadata --format-version`, `edition` in cargo script)
  - Default to the first version, eventually warning the user about the implicit stale behavior (e.g. `package.edition` in `Cargo.toml`)
  - Provide no default (e.g. `package.rust-version`, or a command-line flag like `--format-version`)

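A sketch of the three defaulting policies for a versioned output format. `DefaultPolicy`, `select_version`, the supported version numbers, and the warning text are all hypothetical; this is not how Cargo implements `--format-version`.

```rust
/// Versions of a hypothetical output format, oldest first.
const SUPPORTED: [u32; 2] = [1, 2];

/// What to do when the user does not ask for a specific version.
enum DefaultPolicy {
    /// Use the newest version, warning that the output may change.
    LatestWithWarning,
    /// Use the oldest version (a later release might start warning here).
    FirstVersion,
    /// Refuse to guess.
    Required,
}

/// Returns the version to write plus an optional warning for the user.
fn select_version(requested: Option<u32>, policy: DefaultPolicy) -> Result<(u32, Option<String>), String> {
    let first = SUPPORTED[0];
    let latest = SUPPORTED[SUPPORTED.len() - 1];
    match (requested, policy) {
        // An explicit, supported request always wins, whatever the policy.
        (Some(v), _) if SUPPORTED.contains(&v) => Ok((v, None)),
        (Some(v), _) => Err(format!("unsupported format version {}; supported: {:?}", v, SUPPORTED)),
        (None, DefaultPolicy::LatestWithWarning) => Ok((
            latest,
            Some(format!(
                "no format version given; defaulting to {}, which may change in a future release",
                latest
            )),
        )),
        (None, DefaultPolicy::FirstVersion) => Ok((first, None)),
        (None, DefaultPolicy::Required) => Err("a format version must be specified".to_string()),
    }
}
```
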
Note: While `serde` makes it easy to support data formats that add new fields,
new data types or supported values for a field are more difficult to future-proof
against.
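
To illustrate why new values are harder from a consumer's point of view: mapping a field onto a closed `enum` rejects any value a newer producer adds, while a catch-all variant lets the consumer carry on. The sketch below uses plain string matching rather than `serde` (with `serde`, a catch-all like this typically needs extra attributes or a hand-written `Deserialize` impl); the `Level` type and its values are hypothetical.

```rust
/// Hypothetical consumer-side view of a `level` field.
#[derive(Debug, PartialEq)]
enum Level {
    Error,
    Warning,
    /// Any value this version of the consumer does not recognize yet.
    Other(String),
}

/// Strict parsing: a value added by a newer producer is a hard error.
fn parse_level_strict(s: &str) -> Result<Level, String> {
    match s {
        "error" => Ok(Level::Error),
        "warning" => Ok(Level::Warning),
        other => Err(format!("unknown level `{}`", other)),
    }
}

/// Tolerant parsing: unrecognized values are preserved instead of rejected,
/// leaving the caller to decide how to treat them.
fn parse_level_tolerant(s: &str) -> Level {
    parse_level_strict(s).unwrap_or_else(|_| Level::Other(s.to_string()))
}
```
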