I've recently gone through our publishing logic and would like to document my findings and propose some changes. This was originally raised in the context of a cargo publish --dry-run option in #1517, but I think some refactoring here would also help with proposed enhancements such as background jobs (#1466) and direct client uploads to S3.
Currently (edit: updated 2019-11-13)
Our publishing logic follows this sequence:
- Check metadata length against the global max crate size
- Decode metadata
- Check for non-empty: description, license, authors
- Verify user is authenticated
- Obtain database connection and enter transaction
- Ensure user has a verified email address
- Validate URLs if present: homepage, documentation, repository (NewCrate::validate)
- Ensure name is not reserved (NewCrate::ensure_name_not_reserved)
- If crate is not present, insert it and add the user as an owner (NewCrate::save_new_crate)
- If this is a brand new crate, check the rate limit (NewCrate::create_or_update)
- If crate already existed, update it (NewCrate::create_or_update)
- Check that the user has publish rights on the crate
- Check that the new name is identical to the existing name (sans-canonicalization)
- Check that Content-Length header exists and doesn't exceed the crate specific max
- Validate license if specified (NewVersion::validate_license via NewVersion::new)
- Check if the version already exists (NewVersion::save)
- Insert version and add authors to the version (NewVersion::save)
- Iterate over deps (models::dependency::add_dependencies)
  - Check that the dep is not an alternate registry dependency
  - Check that crate exists in the database
  - Enforce "no wildcard" constraint
  - Handle package renames
  - Insert deps into database
  - Return vec of git::Dependency
- Update keywords (Keyword::update_crate)
- Update categories, returning a list of ignored categories for warning (Category::update_crate)
- Update badges, returning a list for warnings (Badge::update_crate)
  - Validate deserialization to our enum, collecting invalid ones
  - Update database
- Use database to obtain max_version of the crate (for response)
- If readme provided, enqueue rendering and upload as a background job
- Proposed --dry-run check
- Upload crate (uploaders::upload_crate)
  - Read remaining request body
  - Verify tarball
  - Upload crate
  - Calculate crate tarball hash
- Enqueue index update
- Encode response
- Commit database transaction
Background job: Render and upload README
Defined in render::render_and_upload_readme
- Render README (render::readme_to_html)
- Obtain connection
- Record README rendered_at for version (Version::record_readme_rendering)
- Upload the rendered README (uploaders::upload_readme)
Background job: Update Index
Defined in git::add_crate
- Determine file path from crate name
- Append line of JSON data to file in registry
- Commit and push
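For reference, the first two steps of the index job can be sketched as below. This is a minimal sketch that follows the directory layout documented for Cargo registry indexes (1, 2, 3/{first letter}, then {first two}/{next two}); it is not the code in git::add_crate. The names index_path and append_entry are hypothetical, and the commit and push step is left out.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Relative path of a crate's file inside the index repository.
/// Returns None for names this sketch does not handle (empty or non-ASCII).
fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    // Paths in the index use the lowercased crate name.
    let name = name.to_ascii_lowercase();
    Some(match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[0..2], &name[2..4], name),
    })
}

/// Appends one line of JSON to the crate's index file, creating the
/// file and its parent directories if they do not exist yet.
/// (Hypothetical helper; committing and pushing is not shown.)
fn append_entry(index_root: &Path, name: &str, json_line: &str) -> io::Result<()> {
    let rel = index_path(name)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid crate name"))?;
    let file = index_root.join(rel);
    if let Some(dir) = file.parent() {
        fs::create_dir_all(dir)?;
    }
    let mut f = OpenOptions::new().create(true).append(true).open(file)?;
    writeln!(f, "{}", json_line)
}
```

With this layout, a crate named serde ends up in se/rd/serde and a three-letter crate such as abc in 3/a/abc.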
Proposed
Notes
- We enforce a 50MB max in nginx
- We should add a configuration entry for the global max size of the metadata (we currently use the max tarball size in several places)
- A few guidelines I tried to follow:
  - Identify and reject invalid requests as quickly as possible.
  - Minimize the work done while holding a database connection, especially after entering the main transaction.
  - The final main transaction may need to repeat some queries to ensure it doesn't rely on data obtained outside of the transaction.
Verify Headers
- Verify user is authenticated
- Check that Content-Length header exists and doesn't exceed global max tarball + global max metadata + 2 size fields
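A rough sketch of the Content-Length check, to make the arithmetic concrete. It assumes each of the 2 size fields is a 32-bit integer (4 bytes), as in Cargo's documented publish request format. The function name, the error type, and the way the limits are passed in are all hypothetical.

```rust
/// Width of one length prefix in the publish body (assumed 32-bit).
const SIZE_FIELD_BYTES: u64 = 4;

#[derive(Debug, PartialEq)]
enum HeaderError {
    MissingContentLength,
    InvalidContentLength,
    TooLarge { limit: u64 },
}

/// Rejects the request before any of the body is read.
/// `header` is the raw Content-Length value, if the header was sent.
fn check_content_length(
    header: Option<&str>,
    max_metadata: u64,
    max_tarball: u64,
) -> Result<u64, HeaderError> {
    let raw = header.ok_or(HeaderError::MissingContentLength)?;
    let len: u64 = raw
        .trim()
        .parse()
        .map_err(|_| HeaderError::InvalidContentLength)?;
    // global max tarball + global max metadata + 2 size fields
    let limit = max_metadata + max_tarball + 2 * SIZE_FIELD_BYTES;
    if len > limit {
        Err(HeaderError::TooLarge { limit })
    } else {
        Ok(len)
    }
}
```

The returned length can then be carried forward so the body checks can compare it against the sizes actually declared in the request.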
Verify Request Body
- Check metadata length against the global max metadata size
- Read in metadata
- Read in tarball size, verify tarball size + metadata size + 2 size fields == Content-Length
- Decode metadata
- Check for non-empty: description, license, authors
- Validate URLs if present: homepage, documentation, repository
- Validate license if specified
- Iterate over deps
  - Enforce "no wildcard" constraint on deps
  - Check that the dep is not an alternate registry dependency
- Validate deserialization of badges into enum, collect invalid ones
- Read remaining request body
- Verify tarball
- Calculate crate tarball hash (for registry)
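The size checks at the start of this phase could look roughly like the sketch below. It assumes the body layout from Cargo's documented publish API: a 32-bit little-endian metadata length, the JSON metadata, a 32-bit little-endian tarball length, then the tarball. For brevity it works on a fully buffered body, whereas the real handler would read from the request incrementally. split_body, encode_body, and the error type are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum BodyError {
    Truncated,
    MetadataTooLarge,
    LengthMismatch,
}

#[derive(Debug)]
struct PublishBody<'a> {
    metadata: &'a [u8],
    tarball: &'a [u8],
}

/// Reads a little-endian u32 at offset `at`, if the buffer is long enough.
fn read_u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Splits a buffered publish body into metadata and tarball slices.
fn split_body(
    body: &[u8],
    content_length: u64,
    max_metadata: u32,
) -> Result<PublishBody<'_>, BodyError> {
    // Check the metadata length before touching the metadata itself.
    let meta_len = read_u32_le(body, 0).ok_or(BodyError::Truncated)?;
    if meta_len > max_metadata {
        return Err(BodyError::MetadataTooLarge);
    }
    let meta_start = 4usize;
    let meta_end = meta_start
        .checked_add(meta_len as usize)
        .ok_or(BodyError::Truncated)?;
    let tar_len = read_u32_le(body, meta_end).ok_or(BodyError::Truncated)?;
    // tarball size + metadata size + 2 size fields == Content-Length
    if meta_len as u64 + tar_len as u64 + 8 != content_length {
        return Err(BodyError::LengthMismatch);
    }
    let tar_start = meta_end + 4;
    let tar_end = tar_start
        .checked_add(tar_len as usize)
        .ok_or(BodyError::Truncated)?;
    let metadata = body.get(meta_start..meta_end).ok_or(BodyError::Truncated)?;
    let tarball = body.get(tar_start..tar_end).ok_or(BodyError::Truncated)?;
    Ok(PublishBody { metadata, tarball })
}

/// Builds a body in the same layout, mainly useful for tests.
fn encode_body(metadata: &[u8], tarball: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + metadata.len() + tarball.len());
    out.extend_from_slice(&(metadata.len() as u32).to_le_bytes());
    out.extend_from_slice(metadata);
    out.extend_from_slice(&(tarball.len() as u32).to_le_bytes());
    out.extend_from_slice(tarball);
    out
}
```

Decoding the metadata, the field validations, and the tarball verification would then run on the two slices without needing a database connection.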
With database, outside of main transaction
- Obtain database connection
- Ensure user has a verified email address
- Ensure name is not reserved
- Obtain a list of valid and invalid categories
- Ensure that all deps exist
- If crate exists
  - Check that the new name is identical to the existing name (sans-canonicalization)
  - Verify tarball doesn't exceed the crate specific max
  - Check that the user has publish rights on the crate
- If crate didn't exist
  - Check the rate limit
  - Verify tarball doesn't exceed default max
- Check if the version already exists
- --dry-run check
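To illustrate the name and size checks in this phase, here is a minimal sketch over an in-memory list instead of the database. It assumes canonicalization means lowercasing and treating - and _ as equivalent, and that a crate without its own limit falls back to the default max. ExistingCrate, find_existing, and precheck are hypothetical; publish rights and the rate limit are not shown.

```rust
/// Hypothetical view of the crate row fields these checks need.
struct ExistingCrate {
    name: String,
    max_upload_size: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum PrecheckError {
    NameMismatch { existing: String },
    TarballTooLarge { limit: u64 },
}

/// Canonical form assumed here: ASCII lowercase, with '-' mapped to '_'.
fn canonical(name: &str) -> String {
    name.to_ascii_lowercase().replace('-', "_")
}

/// Looks a crate up by canonical name, the way a uniqueness check would.
fn find_existing<'a>(crates: &'a [ExistingCrate], new_name: &str) -> Option<&'a ExistingCrate> {
    let wanted = canonical(new_name);
    crates.iter().find(|c| canonical(&c.name) == wanted)
}

/// Read-only checks that can run before the main transaction starts.
fn precheck(
    new_name: &str,
    tarball_len: u64,
    existing: Option<&ExistingCrate>,
    default_max: u64,
) -> Result<(), PrecheckError> {
    let limit = match existing {
        Some(krate) => {
            // The lookup ignores case and '-' vs '_', but the published
            // name must match the stored name exactly.
            if krate.name != new_name {
                return Err(PrecheckError::NameMismatch {
                    existing: krate.name.clone(),
                });
            }
            krate.max_upload_size.unwrap_or(default_max)
        }
        None => default_max,
    };
    if tarball_len > limit {
        Err(PrecheckError::TarballTooLarge { limit })
    } else {
        Ok(())
    }
}
```

Because these checks run outside the transaction, the write phase would still need to repeat whichever of them it depends on.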
Start writing within the transaction
- Enter database transaction
- If crate didn't exist then insert and add the user as an owner
- If crate was present, update it (TODO: review which fields on the crate we update under which circumstances. How do we deal with prereleases (Incorrect metadata coming from last published version #1389) and backports?)
- Insert version (abort if exists) and add authors to the version
- Record README rendered_at for version
- Iterate over deps
  - Handle package renames
  - Insert deps into database
- Update keywords
- Update categories
- Update badges
- Use database to obtain max_version of the crate (for response)
- Iterate over deps to get a vec of git::Dependency
- Upload crate
- Background jobs
  - If readme provided, enqueue rendering and upload as a background job
  - Enqueue index update
- Commit database transaction
- Encode response
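Putting the phases together, the overall shape I have in mind is roughly the following. This is only a toy sketch of the ordering: every type and field here (Request, FakeDb, Outcome) is a hypothetical stand-in, the boolean flags stand for whole groups of checks, and the "database" just records which writes happened.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    DryRunOk,
    Published,
}

/// Hypothetical stand-in for the database; it only records writes.
#[derive(Default)]
struct FakeDb {
    writes: Vec<&'static str>,
}

/// Hypothetical request summary; each flag stands for a group of checks.
struct Request {
    authenticated: bool,
    metadata_valid: bool,
    version_exists: bool,
    dry_run: bool,
}

fn publish(req: &Request, db: &mut FakeDb) -> Result<Outcome, &'static str> {
    // Phase 1: verify headers (no body read, no database).
    if !req.authenticated {
        return Err("not authenticated");
    }
    // Phase 2: verify request body (still no database).
    if !req.metadata_valid {
        return Err("invalid metadata");
    }
    // Phase 3: read-only database checks, outside the main transaction.
    if req.version_exists {
        return Err("version already exists");
    }
    // A dry run stops here, before anything is written.
    if req.dry_run {
        return Ok(Outcome::DryRunOk);
    }
    // Phase 4: writes inside the transaction, then background jobs.
    db.writes.push("insert version");
    db.writes.push("enqueue index update");
    Ok(Outcome::Published)
}
```

The point of this ordering is that a --dry-run request exercises every validation step and still returns before the first write.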