
General outline of proposed updates to publishing logic #1522

Closed
@jtgeibel

I've recently gone through our publishing logic and would like to document my findings and propose some changes. This was originally raised in the context of a cargo publish --dry-run option in #1517, but I think some refactoring here would help with proposed enhancements such as background jobs (#1466) and direct client uploads to S3.

Currently (edit: updated 2019-11-13)

Our publishing logic currently runs the following steps, in order:

  • Check metadata length against the global max crate size
  • Decode metadata
  • Check for non-empty: description, license, authors
  • Verify user is authenticated
  • Obtain database connection and enter transaction
  • Ensure user has a verified email address
  • Validate URLs if present: homepage, documentation, repository (NewCrate::validate)
  • Ensure name is not reserved (NewCrate::ensure_name_not_reserved)
  • If crate is not present, insert it and add the user as an owner (NewCrate::save_new_crate)
  • If this is a brand new crate, check the rate limit (NewCrate::create_or_update)
  • If crate already existed, update it (NewCrate::create_or_update)
  • Check that the user has publish rights on the crate
  • Check that the new name is identical to the existing name (sans-canonicalization)
  • Check that Content-Length header exists and doesn't exceed the crate specific max
  • Validate license if specified (NewVersion::validate_license via NewVersion::new)
  • Check if the version already exists (NewVersion::save)
  • Insert version and add authors to the version (NewVersion::save)
  • Iterate over deps (models::dependency::add_dependencies)
    • Check that the dependency is not from an alternate registry
    • Check that crate exists in the database
    • Enforce "no wildcard" constraint
    • Handle package renames
    • Insert deps into database
    • Return vec of git::Dependency
  • Update keywords (Keyword::update_crate)
  • Update categories, returning a list of ignored categories for warning (Category::update_crate)
  • Update badges, returning a list for warnings (Badge::update_crate)
    • Validate deserialization to our enum, collecting invalid ones
    • Update database
  • Use database to obtain max_version of the crate (for response)
  • If readme provided, enqueue rendering and upload as a background job
  • Proposed --dry-run check
  • Upload crate (uploaders::upload_crate)
    • Read remaining request body
    • Verify tarball
    • Upload crate
    • Calculate crate tarball hash
  • Enqueue index update
  • Encode response
  • Commit database transaction

Background job: Render and upload README

Defined in render::render_and_upload_readme

  • Render README (render::readme_to_html)
  • Obtain connection
  • Record README rendered_at for version (Version::record_readme_rendering)
  • Upload the rendered README (uploaders::upload_readme)

Background job: Update Index

Defined in git::add_crate

  • Determine file path from crate name
  • Append line of JSON data to file in registry
  • Commit and push
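
As a rough illustration of the "determine file path from crate name" step, here is a minimal sketch. It assumes the directory layout described in Cargo's registry index documentation (1- and 2-character names under `1/` and `2/`, 3-character names under `3/<first char>/`, everything else under `<first two>/<next two>/`) and lowercased ASCII names. The function name `index_path` is hypothetical and is not taken from `git::add_crate`.

```rust
/// Relative path of a crate's file inside the index repository.
/// Assumption: layout as in Cargo's registry index documentation,
/// with the crate name lowercased. Returns `None` for names this
/// sketch does not handle (empty or non-ASCII).
fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    let name = name.to_lowercase();
    let path = match name.len() {
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // Three-letter names are grouped by their first character.
        3 => format!("3/{}/{name}", &name[..1]),
        // Longer names are grouped by their first two and next two characters.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    };
    Some(path)
}
```

For example, `index_path("serde")` yields `se/rd/serde`; the background job would then append one line of JSON to that file before committing and pushing.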

Proposed

Notes

  • We enforce a 50MB max in nginx
  • We should add a configuration entry for the global max size of the metadata (we currently use the max tarball size in several places)
  • A few guidelines I tried to follow:
    • Identify and reject invalid requests as quickly as possible.
    • Minimize the work done while holding a database connection, especially after entering the main transaction.
    • The final main transaction may need to repeat some queries to ensure it doesn't rely on data obtained outside of the transaction.

Verify Headers

  • Verify user is authenticated
  • Check that Content-Length header exists and doesn't exceed global max tarball + global max metadata + 2 size fields
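
The header check above could look roughly like the sketch below. The function name, the error strings, and the assumption that each size field is a 4-byte integer are illustrative and not taken from the existing code; the two limits would come from configuration.

```rust
/// Assumption: each of the two size fields in the body is a 32-bit integer.
const SIZE_FIELD_BYTES: u64 = 4;

/// Rejects a request before any of the body is read.
/// `header` is the raw Content-Length value, if the header was sent.
fn check_content_length(
    header: Option<&str>,
    max_metadata: u64,
    max_tarball: u64,
) -> Result<u64, String> {
    let raw = header.ok_or_else(|| "missing header: Content-Length".to_string())?;
    let len: u64 = raw
        .trim()
        .parse()
        .map_err(|_| format!("invalid Content-Length: {raw}"))?;

    // Global upper bound: metadata + tarball + the two size fields.
    let limit = max_metadata
        .saturating_add(max_tarball)
        .saturating_add(2 * SIZE_FIELD_BYTES);
    if len > limit {
        return Err(format!("max upload size is: {limit}"));
    }
    Ok(len)
}
```

Only the global limits are used here because no database connection is held at this point; the crate-specific limit is checked later, once the crate row is available.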

Verify Request Body

  • Check metadata length against the global max metadata size
  • Read in metadata
  • Read in tarball size, verify tarball size + metadata size + 2 size fields == Content-Length
  • Decode metadata
  • Check for non-empty: description, license, authors
  • Validate URLs if present: homepage, documentation, repository
  • Validate license if specified
  • Iterate over deps
    • Enforce "no wildcard" constraint on deps
    • Check that the dependency is not from an alternate registry
  • Validate deserialization of badges into enum, collect invalid ones
  • Read remaining request body
  • Verify tarball
  • Calculate crate tarball hash (for registry)
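
To make the size bookkeeping in this phase concrete, here is a minimal sketch of splitting a fully buffered request body. It assumes the layout described in Cargo's registry web API documentation (a 32-bit little-endian metadata length, the JSON metadata, a 32-bit little-endian tarball length, then the tarball) and uses `body.len()` as a stand-in for Content-Length. The names `PublishBody` and `split_body` are hypothetical.

```rust
/// Borrowed views into the two payloads of a publish request.
#[derive(Debug)]
struct PublishBody<'a> {
    metadata: &'a [u8],
    tarball: &'a [u8],
}

/// Reads a little-endian u32 at offset `at`, or `None` if the buffer is too short.
fn read_u32_le(buf: &[u8], at: usize) -> Option<usize> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

fn split_body(body: &[u8], max_metadata: usize) -> Result<PublishBody<'_>, String> {
    // Check the metadata length before reading the metadata itself.
    let meta_len = read_u32_le(body, 0).ok_or("truncated metadata length")?;
    if meta_len > max_metadata {
        return Err(format!("metadata exceeds max size of {max_metadata} bytes"));
    }
    let meta_end = 4 + meta_len;
    let metadata = body.get(4..meta_end).ok_or("truncated metadata")?;

    // tarball size + metadata size + 2 size fields must equal the total length.
    let tar_len = read_u32_le(body, meta_end).ok_or("truncated tarball length")?;
    let tar_start = meta_end + 4;
    if tar_start.checked_add(tar_len) != Some(body.len()) {
        return Err("declared sizes do not match the request length".to_string());
    }
    Ok(PublishBody { metadata, tarball: &body[tar_start..] })
}
```

Decoding the metadata, the per-field validation, and tarball verification would then operate on `metadata` and `tarball` without touching the database.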

With database, outside of main transaction

  • Obtain database connection
  • Ensure user has a verified email address
  • Ensure name is not reserved
  • Obtain a list of valid and invalid categories
  • Ensure that all deps exist
  • If crate exists
    • Check that the new name is identical to the existing name (sans-canonicalization)
    • Verify tarball doesn't exceed the crate specific max
    • Check that the user has publish rights on the crate
  • If crate didn't exist
    • Check the rate limit
    • Verify tarball doesn't exceed default max
  • Check if the version already exists
  • --dry-run check
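
One way to picture where the --dry-run check sits in the proposed ordering: every validation step runs first, and the request returns before anything is written. The sketch below uses two hypothetical closures to stand in for the groups of steps; it is an illustration of the ordering only, not the actual handler.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// All checks passed; nothing was written.
    DryRunOk,
    /// Checks passed and the write phase completed.
    Published,
}

/// `validate` stands in for the header, body, and read-only database checks.
/// `write` stands in for the transaction, upload, and background jobs.
fn publish<V, W>(dry_run: bool, validate: V, write: W) -> Result<Outcome, String>
where
    V: FnOnce() -> Result<(), String>,
    W: FnOnce() -> Result<(), String>,
{
    validate()?;
    if dry_run {
        // Stop before entering the main transaction.
        return Ok(Outcome::DryRunOk);
    }
    write()?;
    Ok(Outcome::Published)
}
```

Because no write has happened by the time of the check, a dry run needs no rollback and never uploads a crate or enqueues a job.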

Start writing within the transaction

  • Enter database transaction
  • If crate didn't exist then insert and add the user as an owner
  • If crate was present, update it (TODO: review which fields on the crate we update under which circumstances. How do we deal with prereleases (Incorrect metadata coming from last published version #1389) and backports?)
  • Insert version (abort if exists) and add authors to the version
  • Record README rendered_at for version
  • Iterate over deps
    • Handle package renames
    • Insert deps into database
  • Update keywords
  • Update categories
  • Update badges
  • Use database to obtain max_version of the crate (for response)
  • Iterate over deps to get a vec of git::Dependency
  • Upload crate
  • Background jobs
    • If readme provided, enqueue rendering and upload as a background job
    • Enqueue index update
  • Commit database transaction
  • Encode response

Labels: A-backend ⚙️, A-publish, C-internal 🔧 (Category: Nonessential work that would make the codebase more consistent or clear)