WIP sekrit project #10610
base: main
Conversation
Force-pushed from 7d91bcc to e072a0c
Force-pushed from a31b6fa to 6c9d8ff
The new `linked_accounts` table has a `provider` column that will (for now) always be set to 0, which corresponds to `AccountProvider::Github`. The table's primary key is (provider, account_id), which for GitHub corresponds to (0, gh_id). This constraint means a particular GitHub/GitLab/etc. account, identified by the ID the provider assigns it, may only be associated with one crates.io user record, but a crates.io user record could (eventually) have *both* a GitHub *and* a GitLab account associated with it (or even two GitHub accounts!).

This is the first of many steps toward eventually allowing crates.io accounts to be linked to other OAuth providers in addition to (or instead of) GitHub. No code aside from one test reads from the linked accounts table at this time, no backfill has been done yet, and no handling of creating/associating multiple OAuth accounts with one crates.io account has been implemented yet.
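A minimal sketch of what that table might look like, assuming the column names visible in the backfill migration quoted later in this review; the types and nullability here are guesses, not the PR's actual migration:

```sql
-- Hypothetical sketch only; the real migration may differ.
CREATE TABLE linked_accounts (
    user_id INTEGER NOT NULL REFERENCES users (id),
    provider INTEGER NOT NULL,   -- 0 = AccountProvider::Github for now
    account_id INTEGER NOT NULL, -- the provider's ID for the account (gh_id for GitHub)
    access_token VARCHAR NOT NULL,
    login VARCHAR NOT NULL,
    avatar VARCHAR,
    -- One provider account maps to at most one crates.io user,
    -- but nothing stops one user from having several rows here.
    PRIMARY KEY (provider, account_id)
);
```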
Right now, the new `username` field will always have the same value as gh_login on the users table and login on the linked accounts table, and all of these values will get updated whenever we receive a new gh_login value. Eventually, we're going to have to decouple the concept of a crates.io "username" from the logins of the various associated account(s), which you may or may not want to update when you change your GitHub (or other) login, and which may or may not conflict with other users' crates.io usernames. But for now, this sets up for that future and defers the harder decisions by making the field always mirror gh_login.

This adds a `username` field to the JSON views returned from the API but does not use that field anywhere yet.

Question: should team owner JSON also have a field named `username`, as done in this commit? It's a little weird for a team to have a username, because a team isn't a user, but it's consistent. Something more generic like `name` might also be confusing; something more specific like `crates_io_identifier` is more verbose but less confusable. Shrug.
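Under those assumptions, the eventual backfill for the new column could be as simple as the following sketch (not part of this PR):

```sql
-- Hypothetical: mirror gh_login into the new column for existing rows.
UPDATE users SET username = gh_login WHERE username IS NULL;
```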
Force-pushed from 6c9d8ff to 1e9f6a3
Force-pushed from 1e9f6a3 to 83fac29
blm768
left a comment
I had a few thoughts on the finer details of the database migration, but the approach seems sound overall.
```rust
pub struct LinkedAccount {
    pub user_id: i32,
    pub provider: AccountProvider,
    pub account_id: i32, // corresponds to user.gh_id
```
This could very well be a problem to deal with later, but I could imagine having a provider that represents its IDs some other way, e.g. strings or UUIDs. We probably don't want to commit to such a design yet, but perhaps we'll eventually need some kind of "table-per-provider" approach.
That's a really good point. Hmm. I'm trying to think through whether I'd rather start with an oauth_github table and then add, say, oauth_gitlab, etc. tables, or if I'd rather start with what's here and then need to migrate to a table-per-provider design later....
Unfortunately, it'll probably be hard to know what we need until we need it. The good news is that we shouldn't be boxing ourselves into a corner that's impossible to escape, regardless of which approach we choose.
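For concreteness, the table-per-provider alternative being weighed here might look something like the sketch below. This is purely illustrative: neither table exists in this PR, and the second provider's UUID key is an invented example of a provider with non-integer IDs.

```sql
-- Purely illustrative: each provider gets its own table and can use its
-- native ID type, at the cost of N tables and more complicated queries.
CREATE TABLE oauth_github (
    user_id INTEGER NOT NULL REFERENCES users (id),
    gh_id INTEGER PRIMARY KEY, -- GitHub account IDs are integers
    login VARCHAR NOT NULL
);

CREATE TABLE oauth_example_provider (
    user_id INTEGER NOT NULL REFERENCES users (id),
    external_id UUID PRIMARY KEY, -- a hypothetical provider that uses UUIDs
    login VARCHAR NOT NULL
);
```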
```sql
INSERT INTO linked_accounts (user_id, provider, account_id, access_token, login, avatar)
```
We'll likely want a benchmark on this to see if it takes long enough that it needs to be broken into smaller batches, but my instinct says this approach is probably fine.
Yeah, I'm starting to think I want a more general sort of harness for running SQL in a loop of batches of N until it returns 0 results, because a few of these migrations are going to need to do that sort of thing...
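A single batch for that sort of harness might look roughly like the sketch below; the harness would re-run the statement, committing between batches, until it reports 0 rows inserted. The source column names (gh_access_token, gh_avatar) are assumptions, not taken from this PR.

```sql
-- Hypothetical single batch of the backfill; re-run until the row count is 0.
INSERT INTO linked_accounts (user_id, provider, account_id, access_token, login, avatar)
SELECT u.id, 0, u.gh_id, u.gh_access_token, u.gh_login, u.gh_avatar
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM linked_accounts la
    WHERE la.provider = 0 AND la.account_id = u.gh_id
)
LIMIT 1000;
```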
```sql
ALTER TABLE users
-- The column needs to be nullable for this migration to be fast; can be changed to non-nullable
```
Technically, we probably could get away with making this non-null with a default of '', but Postgres can handle changes in nullability without a table rewrite, so I don't know that it would really matter in practice.
```sql
-- after backfill of all records.
ADD COLUMN username VARCHAR;

CREATE INDEX lower_username ON users (lower(username));
```
This might need a benchmark, but I suspect we'd save some work by creating this after the column has been populated.
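Taken together, the follow-up migration these two comments point at might look roughly like this sketch: SET NOT NULL validates with a table scan but doesn't rewrite the table, and building the index CONCURRENTLY avoids blocking writes on a large production table.

```sql
-- Hypothetical follow-up, to run after the username backfill completes.
ALTER TABLE users ALTER COLUMN username SET NOT NULL;

-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY lower_username ON users (lower(username));
```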
Thank you for the review @blm768 !

Glad to help! Most of my experience with large database migrations is with MySQL, not PostgreSQL, so I very well could have missed some things, but as far as I can tell, most of what I learned with MySQL seems to port over well enough.