Genericize OAuth profiles#13477
Conversation
|
☔ The latest upstream changes made this pull request unmergeable. Please resolve the merge conflicts. |
|
This is a draft PR for work toward resolving two RFCs:
This is primarily here for asynchronous review of what I'm proposing as the resolution of the RFCs. The work is still in progress. |
| -- was created. Batched by id range, idempotent via ON CONFLICT DO NOTHING. | ||
|
|
||
| SET LOCAL lock_timeout = '10s'; | ||
| SET LOCAL statement_timeout = '120s'; |
There was a problem hiding this comment.
Could you elaborate on why this commit is needed? After we merged #12792, we ran the SQL in https://github.com/rust-lang/crates.io/blob/150d165e4d5ecda24b60f146589db4def7fd34bf/migrations/data_oauth_github.sql outside of the migration framework that runs on deploy, so the backfill has already been handled.
| //! `users.gh_*` to `oauth_github`. That requires every user with | ||
| //! `gh_id > 0` to have a matching `oauth_github` row. The backfill migration | ||
| //! copies the data, and this module verifies the invariant on startup -- | ||
| //! if any user is missing a row the app refuses to start. |
There was a problem hiding this comment.
Yeah this is not the kind of thing we'd want to fail to start the app for
| let github = RealGitHubClient::new(client); | ||
| let github = Box::new(github); | ||
| let github: std::sync::Arc<dyn crates_io_github::GitHubClient> = | ||
| std::sync::Arc::new(RealGitHubClient::new(client.clone())); |
There was a problem hiding this comment.
IMO this commit should be pulled out into a separate PR from anything having to do with usernames-- we want to introduce the crates.io username first, separately from adding any other oauth provider.
| -- it back would require distinguishing backfilled rows from rows inserted | ||
| -- by login, which we cannot do. The `oauth_github` table itself is dropped | ||
| -- by login, which we can't do. The `oauth_github` table itself is dropped | ||
| -- by the earlier migration 2026-01-20-162913_oauth_github_table/down.sql. |
There was a problem hiding this comment.
Yeah any comment corrections should be made in a separate commit, possibly a separate PR
| //! | ||
| //! Then run just these tests: | ||
| //! | ||
| //! cargo test --test integration docker_integration |
There was a problem hiding this comment.
I don't use docker; could you elaborate on why separate tests are needed?
| # Image for running the crates.io server in integration tests. | ||
| # Debug build for fast compilation and full debug output. | ||
|
|
||
| ARG RUST_VERSION=1.94.1 |
There was a problem hiding this comment.
Again I don't use docker, but this looks significantly different than the existing docker file at https://github.com/rust-lang/crates.io/blob/main/backend.Dockerfile, could you explain why this one is different?
Identity handling is hardwired to GitHub via app.github and app.github_oauth. This adds a provider abstraction so adding a new provider is just a new impl. Two new types: - OAuthProvider trait: async methods for the OAuth dance (authorize_url, exchange_code, fetch_user_info) and a stable name() used for routing oauth_<name> table lookups. - UserInfo: provider-agnostic user profile returned by fetch_user_info. account_id is String because providers disagree on ID shape (GitHub: i64, Bitbucket: UUID, GitLab: numeric but can exceed i64 on some instances). GitHubProvider wraps the existing BasicClient and GitHubClient so no identity plumbing below it changes. Team-membership code still calls GitHubClient directly. App.github changes from Box to Arc so providers and App can share the same client instance. build_github_oauth_client() is extracted to deduplicate the OAuth endpoint URLs between App.github_oauth and a future GitHubProvider construction site. The provider isn't wired up to a registry or routed through yet -- that lands in the next commit.
Adds a ProviderRegistry attached to App.oauth_providers, mapping a provider name to its OAuthProvider impl. The session controller (in the next commit) resolves providers by name, with unknown names returning 404. GitHubProvider is registered at server startup as the only entry for now. The registry's name() lookup matches OAuthProvider::name(), which is also the table suffix (oauth_<name>) used by callers that need to read or write provider-specific identity data. Test setup constructs an empty registry; tests that exercise the session controller will add provider mocks in subsequent commits.
session::begin and session::authorize now dispatch through app.oauth_providers instead of hardcoded app.github_oauth and app.github. The ?provider= query param selects a registered provider (defaults to "github" so the frontend keeps working). Provider name is carried in server-side session state, not as a callback query param, so it can't be spoofed. User lookups on the identity path now read from oauth_github instead of users.gh_*. Writes stay dual because the gh_* columns are NOT NULL; stopping writes is a follow-up after a schema change makes them nullable. Config rename: gh_token_encryption -> oauth_token_encryption. The old env var (GITHUB_TOKEN_ENCRYPTION_KEY) is accepted as a deprecated alias with a WARN log. Team membership still goes through GitHubClient directly -- stays GitHub-bound until Tier 4. Adds regression test confirming sync_admins still matches users via oauth_github.account_id after the identity read cutover.
Add Dockerfile.integration and a docker-compose `integration` service that builds the server in debug mode and runs it against the cargo_registry_test database on port 9888. RUST_LOG=debug is on by default so container logs are useful for debugging. src/tests/docker_integration.rs has tests that hit the live container: session begin/authorize flows, provider selection (default and explicit github, unknown provider 404), CSRF rejection, and seeded-user identity checks via /api/v1/users/:login and /api/v1/me with a forged session cookie. Tests skip automatically when the container isn't up. Also tightens a few doc comments and contractions caught during review.
The genericize-profiles work needs a way to identify which OAuth
provider a user or credential row is associated with. A native
Postgres ENUM gives readable values in raw SQL ('github' vs. an
opaque integer) and lets the database enforce the closed set of
supported providers.
Starts with a single variant 'github'; future providers add
variants via ALTER TYPE ... ADD VALUE.
Each user needs a way to declare which OAuth provider they consider their primary identity — the one whose username/avatar appear in public profile views and whose credentials are used for API token encryption. With only GitHub today every user defaults to 'github', but the column is the prerequisite for multi-provider support. Migration: NOT NULL DEFAULT 'github', which PG 11+ applies as a metadata-only operation (no table rewrite). Rust side adds the OAuthProviderId enum with FromSql/ToSql for the native PG enum, FromStr for string-based lookups, and a round-trip integration test that exercises both the DB default path and explicit ToSql.
9c6eadc to
30855bb
Compare
|
Re: the Docker tests — these run against the real running server (HTTP layer, real session cookies, real DB round-trips for the OAuth flow), which the existing mock-based tests can't cover. They skip automatically when the container isn't up, so Re: |
The differences I'm referring to specifically are things like the renovate comments to keep versions in the Dockerfile up to date. Why isn't the new Dockerfile here following that pattern established in this repo? |
No description provided.