rustdoc: adjust static file names for better cache configuration #98413

Closed
@jsha

rustdoc has a number of static files that should really be long-cached by the browser for best loading performance: the fonts, the CSS, storage.js, main.js, and so on. We should add a hash to their names so that services can be more confident in setting long cache headers for them.

Right now we categorize things into Unversioned, ToolchainSpecific, and InvocationSpecific:

    enum SharedResource<'a> {
        /// This file will never change, no matter what toolchain is used to build it.
        ///
        /// It does not have a resource suffix.
        Unversioned { name: &'static str },
        /// This file may change depending on the toolchain.
        ///
        /// It has a resource suffix.
        ToolchainSpecific { basename: &'static str },
        /// This file may change for any crate within a build, or based on the CLI arguments.
        ///
        /// This differs from normal invocation-specific files because it has a resource suffix.
        InvocationSpecific { basename: &'a str },
    }

Unversioned is used just for the font files. ToolchainSpecific is used for the CSS, the images, and most of the JS. InvocationSpecific is used for search-indexN.NN.N.js, source-filesN.NN.N.js, cratesN.NN.N.js, the JS that contains the list of implementors on trait pages, and the JS that contains the list of additional sidebar items (siblings in a module).

Unversioned files get no suffix. ToolchainSpecific files get a version suffix inserted before the extension, so main.js becomes main1.63.0.js. InvocationSpecific files get the same version suffix.
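To make the naming rule concrete, here is a minimal sketch of how a version suffix could be spliced into a file name. The helper `with_resource_suffix` is hypothetical and is not rustdoc's actual code; it only illustrates the main.js to main1.63.0.js transformation described above.

```rust
/// Insert `suffix` between a file's stem and its extension.
/// Hypothetical helper for illustration; not taken from rustdoc.
fn with_resource_suffix(name: &str, suffix: &str) -> String {
    match name.rfind('.') {
        // Split at the last dot: "main.js" becomes "main" + suffix + ".js".
        Some(dot) => format!("{}{}{}", &name[..dot], suffix, &name[dot..]),
        // No extension: append the suffix to the whole name.
        None => format!("{}{}", name, suffix),
    }
}
```

Because the suffix is only the toolchain version, two different nightlies that both report 1.63.0 produce the same name for potentially different contents, which is the caching problem discussed next.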

Unversioned and ToolchainSpecific files should be infinitely cacheable. Today that's not the case for ToolchainSpecific files, because multiple toolchains share the same version suffix. For instance, every nightly build currently creates a main1.63.0.js, but its contents are potentially different each night. That means https://doc.rust-lang.org/nightly/main1.63.0.js potentially changes every night, and can't be long-cached. Since docs.rs uses the nightly toolchain, the main1.63.0.js it produces for a crate today may be different from the one it produces for a crate it builds tomorrow.

docs.rs has special code to recognize the ToolchainSpecific files and rename them to contain a date and a hash, like https://docs.rs/main-20220517-1.63.0-nightly-4c5f6e627.js. But doc.rust-lang.org doesn't have that code, and as a result is less able to cache things that should be cached. And anyone who self-hosts docs is on their own.

I propose that we change our file naming scheme. All Unversioned and ToolchainSpecific files should be emitted to a subdirectory s/<hash>/, where <hash> is calculated over the contents of all of those files together. This makes it easy to configure a web server to set Cache-Control headers for everything under that subdirectory.
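A rough sketch of the shared-hash idea follows. Everything here is an assumption made for illustration: the function names `static_dir_hash` and `static_path` are hypothetical, and FNV-1a is used only because it is short and deterministic with no dependencies. The proposal does not specify a hash function, and a real implementation might well prefer a cryptographic one.

```rust
/// FNV-1a 64-bit offset basis and prime.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feed `bytes` into a running FNV-1a state and return the new state.
fn fnv1a_update(state: u64, bytes: &[u8]) -> u64 {
    let mut hash = state;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Compute one hash over the names and contents of all static files.
/// Files are sorted by name first, so the result does not depend on
/// the order in which the caller supplies them.
fn static_dir_hash(files: &[(&str, &[u8])]) -> String {
    let mut sorted = files.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut hash = FNV_OFFSET;
    for (name, contents) in sorted {
        hash = fnv1a_update(hash, name.as_bytes());
        // A zero byte and the content length separate one entry from the next.
        hash = fnv1a_update(hash, &[0]);
        hash = fnv1a_update(hash, &(contents.len() as u64).to_le_bytes());
        hash = fnv1a_update(hash, contents);
    }
    format!("{:016x}", hash)
}

/// Path of a static file inside the hashed subdirectory, e.g. `s/<hash>/main.js`.
fn static_path(dir_hash: &str, name: &str) -> String {
    format!("s/{}/{}", dir_hash, name)
}
```

With this layout a page only needs to know one value, the directory hash, to build the URL of any static file, and a server can apply a long-lived Cache-Control policy to everything under s/.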

Advantage: this makes calculating URLs for such resources easy, especially when the calculation is done in JS. Disadvantage: if one file changes, the whole hash changes, potentially requiring the user to load more files when navigating between crates generated with different rustdoc versions.

Alternatively, we could add a hash of each individual file to that file's name. That makes calculating URLs harder, but means better reuse of cached data across different nightly versions.
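The per-file alternative could look roughly like the sketch below. As before, the names `hashed_file_name` and `build_manifest` are hypothetical, the `name-<hash>.ext` pattern is an assumed naming choice, and FNV-1a stands in for whatever hash would really be used.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a over `bytes`; chosen only for brevity and determinism.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Build a name such as `main-<hash>.js` from `main.js` and its contents.
fn hashed_file_name(name: &str, contents: &[u8]) -> String {
    let hash = format!("{:016x}", fnv1a_64(contents));
    match name.rfind('.') {
        Some(dot) => format!("{}-{}{}", &name[..dot], hash, &name[dot..]),
        None => format!("{}-{}", name, hash),
    }
}

/// Map each logical name to its hashed name. Pages (or JS) would need
/// a table like this to find the files, which is the extra URL
/// calculation cost mentioned above.
fn build_manifest(files: &[(&str, &[u8])]) -> BTreeMap<String, String> {
    files
        .iter()
        .map(|(name, contents)| (name.to_string(), hashed_file_name(name, contents)))
        .collect()
}
```

Here a file whose contents are unchanged between two nightlies keeps the same name, so a browser cache entry stays valid even when other static files change.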

/cc @rust-lang/rustdoc @rust-lang/docs-rs

Labels: T-docs-rs, T-infra, T-rustdoc
