Commit fdaccd4

docs(contrib): Start guidelines for schema design (#15037)
### What does this PR try to resolve?

This was inspired by a recent Cargo team discussion on whether we should generally elide default values.

This will also help with https://rust-lang.github.io/rust-project-goals/2025h1/cargo-plumbing.html

Case studies in schema design:
- #14506
- #10543

### How should we test and review this PR?

### Additional information

2 parents a4c0d39 + b9cad3f commit fdaccd4

2 files changed

+48
-0
lines changed

src/doc/contrib/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 - [Architecture](./implementation/architecture.md)
 - [New packages](./implementation/packages.md)
 - [New subcommands](./implementation/subcommands.md)
+- [Data Schemas](./implementation/schemas.md)
 - [Console Output](./implementation/console.md)
 - [Filesystem](./implementation/filesystem.md)
 - [Formatting](./implementation/formatting.md)
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Data Schemas

Cargo reads and writes user- and machine-facing data formats, like
- `Cargo.toml`, which is read and is also written by `cargo package`
- `Cargo.lock`, read and written
- `.cargo/config.toml`, read-only
- `cargo metadata` output
- `cargo build --message-format` output

## Schema Design

Generally,
- Fields should be kebab case
  - `#[serde(rename_all = "kebab-case")]` should be applied defensively
- Fields should only be present when needed, saving space and parse time
  - Also, we can always switch to always outputting the fields, but it's harder to stop outputting them
  - `#[serde(skip_serializing_if = "Default::default")]` should be applied liberally
- For output, prefer [jsonlines](https://jsonlines.org/) as it allows streaming output and flexibility to mix content (e.g. adding diagnostics to output that didn't previously have it)
- `#[serde(deny_unknown_fields)]` should not be used, so that formats can evolve, including through feature gating
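
As a rough illustration of the guidelines above, the following std-only Rust sketch hand-writes what the serde attributes would otherwise do. Keys are kebab case, fields holding their default value are omitted, and each record goes on its own line in the JSON Lines style. The `Event` type, its fields, and both functions are hypothetical and not part of Cargo.

```rust
use std::io::{self, Write};

/// Hypothetical record type for illustration; not a real Cargo structure.
#[derive(Debug, Default)]
pub struct Event {
    pub kind: String,
    pub source_path: Option<String>,
    pub cached: bool,
}

/// Escapes only quotes, backslashes, and newlines.
/// A real serializer must handle every control character.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

impl Event {
    /// Renders one JSON object with kebab-case keys.
    /// Fields that hold their default value (`None`, `false`) are left out.
    pub fn to_json_line(&self) -> String {
        let mut fields = vec![format!("\"kind\":\"{}\"", escape(&self.kind))];
        if let Some(path) = &self.source_path {
            fields.push(format!("\"source-path\":\"{}\"", escape(path)));
        }
        if self.cached {
            fields.push("\"cached\":true".to_string());
        }
        format!("{{{}}}", fields.join(","))
    }
}

/// Writes one record per line so a consumer can process the stream incrementally.
pub fn write_json_lines<W: Write>(mut out: W, events: &[Event]) -> io::Result<()> {
    for event in events {
        writeln!(out, "{}", event.to_json_line())?;
    }
    Ok(())
}
```

Because absent fields are treated as defaults, a reader written against this output keeps working if a field later becomes always-present, which is the direction the guideline says is easy to move in.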

## Schema Evolution Strategies

When changing a schema for data that is read, some options include:
- Adding new fields is relatively safe
  - If the field must not be ignored when present,
    have a transition period where it is invalid to use on stable Cargo before stabilizing it or
    error if it's used before it is supported within the schema version
    (e.g. `edition` requires a minimum `package.rust-version`, if present)
- Adding new values to a field is relatively safe
  - Unstable values should fail on stable Cargo
- Version the structure and interpretation of the data (e.g. the `edition` field or `package.resolver` which has an `edition` fallback)
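
The read-side options above can be sketched in std-only Rust, loosely modeled on the `package.resolver` example. This is a simplified illustration, not Cargo's implementation: the `Edition` and `Resolver` enums, the `resolver_for` function, the `allow_unstable` flag, and the unstable value `next` are all hypothetical, and the edition-to-resolver mapping is an assumption made for the example.

```rust
/// Hypothetical edition enum for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Edition {
    E2015,
    E2018,
    E2021,
    E2024,
}

/// Hypothetical resolver enum; `Next` stands in for a not-yet-stable value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolver {
    V1,
    V2,
    V3,
    Next,
}

/// Interprets an optional `resolver` value.
/// - An explicit stable value is accepted as written.
/// - The unstable value `next` is rejected unless `allow_unstable` is set.
/// - When the field is absent, the edition decides the interpretation.
pub fn resolver_for(
    explicit: Option<&str>,
    edition: Edition,
    allow_unstable: bool,
) -> Result<Resolver, String> {
    match explicit {
        Some("1") => Ok(Resolver::V1),
        Some("2") => Ok(Resolver::V2),
        Some("3") => Ok(Resolver::V3),
        Some("next") if allow_unstable => Ok(Resolver::Next),
        Some("next") => Err("resolver `next` is unstable".to_string()),
        Some(other) => Err(format!("unknown resolver `{}`", other)),
        // Fallback: the edition versions the meaning of a missing field.
        None => Ok(match edition {
            Edition::E2015 | Edition::E2018 => Resolver::V1,
            Edition::E2021 => Resolver::V2,
            Edition::E2024 => Resolver::V3,
        }),
    }
}
```

Rejecting the unstable value outright, rather than silently ignoring it, means a manifest that relies on it cannot be misread by a stable toolchain.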

Note: some formats that are read are also written back out
(e.g. `cargo package` generating a `Cargo.toml` file),
so the strategies for written data need to be considered as well.

When changing a schema for data that is written, some options include:
- Add new fields if the presence can be ignored
- Infer permission from the user's use of the new schema (e.g. a new alias for an `enum` variant)
- Version the structure and interpretation of the format
  - Defaulting to the latest version with a warning that behavior may change (e.g. `cargo metadata --format-version`, `edition` in cargo script)
  - Defaulting to the first version, eventually warning the user of the implicit stale behavior (e.g. `package.edition` in `Cargo.toml`)
  - Without a default (e.g. `package.rust-version`, or a command-line flag like `--format-version`)
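
The three defaulting choices for a versioned output format might look like the following std-only Rust sketch. Everything here is hypothetical: the `FIRST` and `LATEST` constants, the `DefaultPolicy` and `Selection` types, the `select_version` function, and the warning texts are invented for illustration and do not mirror Cargo's code.

```rust
/// Hypothetical range of supported format versions.
pub const FIRST: u32 = 1;
pub const LATEST: u32 = 2;

/// What to do when the caller does not name a version.
#[derive(Debug, Clone, Copy)]
pub enum DefaultPolicy {
    /// Use the newest version and warn that behavior may change.
    Latest,
    /// Use the oldest version and warn about the implicit stale behavior.
    First,
    /// Refuse to guess; the caller must specify a version.
    Required,
}

/// The chosen version plus an optional warning to show the user.
#[derive(Debug, PartialEq, Eq)]
pub struct Selection {
    pub version: u32,
    pub warning: Option<String>,
}

pub fn select_version(requested: Option<u32>, policy: DefaultPolicy) -> Result<Selection, String> {
    match (requested, policy) {
        // An explicit, supported version is always honored without a warning.
        (Some(v), _) if (FIRST..=LATEST).contains(&v) => Ok(Selection { version: v, warning: None }),
        (Some(v), _) => Err(format!("unsupported format version {}", v)),
        (None, DefaultPolicy::Latest) => Ok(Selection {
            version: LATEST,
            warning: Some("no version given; using the latest, which may change".to_string()),
        }),
        (None, DefaultPolicy::First) => Ok(Selection {
            version: FIRST,
            warning: Some("no version given; using the first, which may be stale".to_string()),
        }),
        (None, DefaultPolicy::Required) => Err("a format version must be specified".to_string()),
    }
}
```

Which policy fits depends on who consumes the output: a warning on an implicit default nudges scripts toward pinning a version, while requiring the version rules out accidental dependence on a default.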

Note: While `serde` makes it easy to support data formats that add new fields,
new data types or supported values for a field are more difficult to future-proof
against.
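
One common way to soften the problem with new values, shown here as a std-only Rust sketch, is a catch-all variant that preserves any value the reader does not recognize. This is not something the guidelines above prescribe; the `Kind` enum and its values are hypothetical.

```rust
/// Hypothetical field type with a catch-all for values added by newer writers.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Kind {
    Bin,
    Lib,
    /// Any value this reader does not know about, kept verbatim.
    Other(String),
}

impl Kind {
    /// Never fails: unknown values are carried along instead of rejected.
    pub fn parse(raw: &str) -> Self {
        match raw {
            "bin" => Kind::Bin,
            "lib" => Kind::Lib,
            other => Kind::Other(other.to_string()),
        }
    }

    /// Returns the original textual form, so unknown values round-trip.
    pub fn as_str(&self) -> &str {
        match self {
            Kind::Bin => "bin",
            Kind::Lib => "lib",
            Kind::Other(raw) => raw.as_str(),
        }
    }
}
```

The trade-off is that callers must decide what to do with `Other`, and a value that must not be ignored still needs one of the stricter strategies listed earlier.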
