Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add migration type for modifying columns #49

Merged
merged 8 commits into from
Aug 2, 2022

Conversation

jsynacek
Copy link
Contributor

No description provided.

@jsynacek jsynacek marked this pull request as draft April 26, 2021 15:13
@jsynacek jsynacek force-pushed the dev-jsynacek-core3249-modify-column-migration branch from 163d3c1 to 72bcbd3 Compare April 27, 2021 12:11
@jsynacek jsynacek requested a review from arybczak April 27, 2021 12:12
@jsynacek jsynacek marked this pull request as ready for review April 27, 2021 12:12
@jsynacek jsynacek marked this pull request as draft April 27, 2021 12:12
@jsynacek jsynacek force-pushed the dev-jsynacek-core3249-modify-column-migration branch from 72bcbd3 to 2c24ce7 Compare April 27, 2021 12:55
@jsynacek jsynacek marked this pull request as ready for review April 27, 2021 12:55
test/Main.hs Outdated Show resolved Hide resolved
ModifyColumnMigration cursorSql updateSql batchSize -> do
logMigration
ts <- getTransactionSettings
setTransactionSettings $ ts { tsIsolationLevel = Serializable }
Copy link
Collaborator

@arybczak arybczak Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alone is not sufficient. There also needs to be a tsRestartPredicate set that restarts the transaction when a db exception with a SerializationFailure error code is received.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure if we need serializable here. @marco44 what do you think?

The deal is that we want to modify a column by:

  1. Adding a new column.
  2. Adding a trigger that will copy values (with appropriate conversion) from the old column to the new column on insert/update.

Then, in a separate migration:

  1. Copy data (with appropriate conversion) for existing rows from the old column to the new one, in batches.

And the question is, should the data be copied in serializable transactions or is the default isolation level sufficient?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serializable only works when all transactions working on the same data are set as serializable. If you really want to be sure no one touches the records while you're reading them, select for update is the way to go I think: you'll end up always reading "the latest version" of a record, and preventing anyone from touching them while you haven't committed. So I don't think it's what you're hoping for. For more details : https://wiki.postgresql.org/wiki/SSI (or just ask me)

But here, i'm not sure I understand the algorithm. We create a cursor using cursorSql, which is a query to get the PKs of all candidate records, right ?

If yes, then we just don't care about isolation: the cursor, when you start it, will get a "snapshot" of the DB like it is at the beginning of the query, what you're going to fetch is all records that match this query at the precise moment you run this query. So if we have new records in the meantime, those are going to be managed by the trigger. The only corner case I see, is if an update changes a PK value while we fix the records.

src/Database/PostgreSQL/PQTypes/Model/Migration.hs Outdated Show resolved Hide resolved
@jsynacek jsynacek changed the title CORE-3249: Add migration type for modifying columns Add migration type for modifying columns Jul 11, 2022
@jsynacek jsynacek force-pushed the dev-jsynacek-core3249-modify-column-migration branch from ed9d40f to 94a5ef0 Compare July 11, 2022 08:08
@jsynacek
Copy link
Contributor Author

Ok, I think that I've addressed everything now but one thing. I remember that we talked about adding vacuum in between batches. Let me know if that is still something that we should do.

@jsynacek jsynacek requested a review from arybczak July 11, 2022 08:09
@jsynacek jsynacek force-pushed the dev-jsynacek-core3249-modify-column-migration branch from 75c6fd4 to b7e09b4 Compare July 12, 2022 09:17
@jsynacek jsynacek force-pushed the dev-jsynacek-core3249-modify-column-migration branch from b7e09b4 to 300b678 Compare July 12, 2022 09:19
@zlondrej zlondrej self-requested a review July 15, 2022 14:51
--
-- SQL that will be used for the cursor.
--
-- Function that takes a list of primary keys provided by the cursor SQL and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it doesn't have to be primary keys right? As long as you can uniquely identify a column.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unique not null, i suppose (just to be pedantic, but that's important)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-reading my comment I have to make a correction: I mean ROW, not COLUMN.

But I'm not sure how not null applies here. You either uniquely identify a record or don't, there's no space for NULLs in that. Maybe you meant that column that one would/could reference has UNIQUE NOT NULL constraint, but I believe that is what uniquely identify a row/record implies in that case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unique not null is just the sql constraint. unique is a dangerous term when talking about sql databases, because null = null returns null, which is neither true nor false, except if you really are in a predicate context, where it finally becomes false. so i supposed you were talking about the constraint, which has to be unique not null if you really want to enforce the column to be unique (in the usual sense of unique). So as I said, that's me being pedantic, but i just wanted to be sure the constraint, in the end, is unique not null

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this what people mean when they say a candidate key

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah more or less, i didn't want to confuse people. that could also work with superkeys, but those are relational algebra terms, like candidate keys. Anything that can identify uniquely a record. I just said unique not null, just because "unique" is tricky in SQL, I wanted to be sure that everybody was on the same page.

@matobet
Copy link

matobet commented Jul 27, 2022

@jsynacek are we ok to merge this? I'd really like to unblock https://github.com/scrive/kontrakcja/pull/4311

- min batch size of 1000
- no logging of batch size (debug logging?)
- reformatted comment
- no redundant commit
@arybczak arybczak merged commit ddb7b71 into master Aug 2, 2022
@arybczak arybczak deleted the dev-jsynacek-core3249-modify-column-migration branch August 2, 2022 07:27
@jsynacek
Copy link
Contributor Author

jsynacek commented Aug 2, 2022

@matobet Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants