Skip to content

Conversation

@carols10cents
Copy link
Member

No description provided.

@carols10cents carols10cents force-pushed the more-logins branch 2 times, most recently from 7d91bcc to e072a0c Compare February 18, 2025 02:32
@carols10cents carols10cents force-pushed the more-logins branch 6 times, most recently from a31b6fa to 6c9d8ff Compare March 7, 2025 21:29
That has a `provider` column that will (for now) always be set to 0,
which corresponds to `AccountProvider::Github`.

The table's primary key is (provider, account_id), which corresponds to
(0, gh_id). This constraint will mean a particular GitHub/GitLab/etc
account, identified from the provider by an ID, may only be associated
with one crates.io user record, but a crates.io user record could
(eventually) have *both* a GitHub *and* a GitLab account associated with
it (or two GitHub accounts, even!)

This is the first step of many to eventually allow for crates.io
accounts linked to other OAuth providers in addition/instead of GitHub.

No code aside from one test is reading from the linked accounts table at
this time.

No backfill has been done yet.

No handling of creating/associating multiple OAuth accounts with one
crates.io account has been done yet.
Right now, this will always have the same value as gh_login on the users
table and login on the linked accounts table. All of these values will
get updated when we get a new gh_login value.

Eventually, we're going to have to decouple the concept of crates.io
"username" from the logins of the various account(s), which you may or
may not want to update when you update your GitHub or other login, and
which may or may not conflict with other users' crates.io usernames.

But for now, set up for that future and defer the harder decisions until
later by making this field always get the same value as gh_login.

This adds a `username` field to the JSON views returned from the API but
does not use that field anywhere yet.

Question: should team owner JSON also have a field named `username` as
done in this commit? it's a little weird for a team to have a username
because it's not a user. but it's consistent. something more generic
like `name` might also be confusing. something more specific like
`crates_io_identifier` is more verbose but less confusable. shrug
Copy link

@blm768 blm768 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few thoughts on the finer details of the database migration, but the approach seems sound overall.

pub struct LinkedAccount {
pub user_id: i32,
pub provider: AccountProvider,
pub account_id: i32, // corresponds to user.gh_id
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could very well be a problem to deal with later, but I could imagine having a provider that represents its IDs some other way, e.g. strings or UUIDs. We probably don't want to commit to such a design yet, but perhaps we'll eventually need some kind of "table-per-provider" approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really good point. Hmm. I'm trying to think through whether I'd rather start with an oauth_github table and then add, say, oauth_gitlab, etc tables, or if i'd rather start with what's here and then need to migrate to a table-per-provider later....

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it'll probably be hard to know what we need until we need it. The good news is that we shouldn't be boxing ourselves into a hole that's impossible to escape regardless of which approach we choose.

@@ -0,0 +1,7 @@
INSERT INTO linked_accounts (user_id, provider, account_id, access_token, login, avatar)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll likely want a benchmark on this to see if it takes long enough that it needs to be broken into smaller batches, but my instinct says this approach is probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm starting to think I want a more general sort of harness for running SQL in a loop of batches of N until it returns 0 results, because a few of these migrations are going to need to do that sort of thing...

@@ -0,0 +1,6 @@
ALTER TABLE users
-- The column needs to be nullable for this migration to be fast; can be changed to non-nullable
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we probably could get away with making this non-null with a default of '', but Postgres can handle changes in nullability without a table rewrite, so I don't know that it would really matter in practice.

-- after backfill of all records.
ADD COLUMN username VARCHAR;

CREATE INDEX lower_username ON users (lower(username));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need a benchmark, but I suspect we'd save some work by creating this after the column has been populated.

@carols10cents
Copy link
Member Author

Thank you for the review @blm768 !

@blm768
Copy link

blm768 commented Nov 20, 2025

Glad to help! Most of my experience with large database migrations is with MySQL, not PostgreSQL, so I very well could have missed some things, but as far as I can tell, most of what I learned with MySQL seems to port over well enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants