-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add migration type for modifying columns #49
Add migration type for modifying columns #49
Conversation
163d3c1
to
72bcbd3
Compare
72bcbd3
to
2c24ce7
Compare
ModifyColumnMigration cursorSql updateSql batchSize -> do | ||
logMigration | ||
ts <- getTransactionSettings | ||
setTransactionSettings $ ts { tsIsolationLevel = Serializable } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This alone is not sufficient. There also needs to be a tsRestartPredicate
set that restarts the transaction when a db exception with a SerializationFailure
error code is received.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure if we need serializable here. @marco44 what do you think?
The deal is that we want to modify a column by:
- Adding a new column.
- Adding a trigger that will copy values (with appropriate conversion) from the old column to the new column on insert/update.
Then, in a separate migration:
- Copy data (with appropriate conversion) for existing rows from the old column to the new one, in batches.
And the question is, should the data be copied in serializable transactions or is the default isolation level sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serializable only works when all transactions working on the same data are set as serializable. If you really want to be sure no one touches the records while you're reading them, select for update is the way to go I think: you'll end up always reading "the latest version" of a record, and preventing anyone from touching them while you haven't committed. So I don't think it's what you're hoping for. For more details : https://wiki.postgresql.org/wiki/SSI (or just ask me)
But here, i'm not sure I understand the algorithm. We create a cursor using cursorSql, which is a query to get the PKs of all candidate records, right ?
If yes, then we just don't care about isolation: the cursor, when you start it, will get a "snapshot" of the DB like it is at the beginning of the query, what you're going to fetch is all records that match this query at the precise moment you run this query. So if we have new records in the meantime, those are going to be managed by the trigger. The only corner case I see, is if an update changes a PK value while we fix the records.
ed9d40f
to
94a5ef0
Compare
Ok, I think that I've addressed everything now but one thing. I remember that we talked about adding vacuum in between batches. Let me know if that is still something that we should do. |
75c6fd4
to
b7e09b4
Compare
b7e09b4
to
300b678
Compare
-- | ||
-- SQL that will be used for the cursor. | ||
-- | ||
-- Function that takes a list of primary keys provided by the cursor SQL and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, it doesn't have to be primary keys right? As long as you can uniquely identify a column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unique not null, i suppose (just to be pedantic, but that's important)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-reading my comment I have to make a correction: I mean ROW, not COLUMN.
But I'm not sure how not null
applies here. You either uniquely identify a record or don't, there's no space for NULL
s in that. Maybe you meant that column that one would/could reference has UNIQUE NOT NULL
constraint, but I believe that is what uniquely identify a row/record implies in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unique not null is just the sql constraint. unique is a dangerous term when talking about sql databases, because null = null
returns null, which is neither true nor false, except if you really are in a predicate context, where it finally becomes false. so i supposed you were talking about the constraint, which has to be unique not null if you really want to enforce the column to be unique (in the usual sense of unique). So as I said, that's me being pedantic, but i just wanted to be sure the constraint, in the end, is unique not null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this what people mean when they say a candidate key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah more or less, i didn't want to confuse people. that could also work with superkeys, but those are relational algebra terms, like candidate keys. Anything that can identify uniquely a record. I just said unique not null, just because "unique" is tricky in SQL, I wanted to be sure that everybody was on the same page.
@jsynacek are we ok to merge this? I'd really like to unblock https://github.com/scrive/kontrakcja/pull/4311 |
- min batch size of 1000 - no logging of batch size (debug logging?) - reformatted comment - no redundant commit
@matobet Done. |
No description provided.