Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API, Core: Multi-Table transactions #6757

Closed
wants to merge 1 commit into from

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Feb 6, 2023

No description provided.

@nastra nastra marked this pull request as draft February 6, 2023 11:53
@nastra nastra force-pushed the multi-table-tx-with-jdbc branch 3 times, most recently from 230f411 to 118bbaf Compare February 6, 2023 15:38
@jackye1995 jackye1995 self-requested a review February 6, 2023 18:01
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome to see @nastra I'll hold off on reviewing the code itself until it's out of draft. I did have more fundamental questions on behavior, especially with multi-table transactions in the presence of branches, and how the isolation level checks are performed.

Comment on lines 23 to 53
enum IsolationLevel {

/**
* Table can change underneath but still have to pass normal table validation. Loading table
* data when it's referenced.
*/
SNAPSHOT,

/**
* All tables participating in the transaction must be in the same state when committing
* compared to when the transaction started. Loading table data when it is referenced.
*/
SERIALIZABLE_WITH_LOADING_READS,

/**
* All tables participating in the transaction must be in the same state when committing
* compared to when the transaction started. Loading table data as of TX start time.
*/
SERIALIZABLE_WITH_FIXED_READS,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still draft, but I wanted to verify for multi-table transactions shouldn't the isolation level enforcement be done at the branch level and not just the table level?

For example, for serializable with fixed reads I would expect the validation that the branch is in the same state even if another branch is written to.

    CatalogTransaction catalogTx = catalog.startTransaction(isolationLevel);
    Catalog txCatalog = catalogTx.asCatalog();
    catalog.loadTable(table1).newFastAppend().appendFile(FILE_B).toBranch("branch1").commit();
    catalog.loadTable(table2).newFastAppend().appendFile(FILE_C).toBranch("branch2").commit();

Let's say the append to table 1 above happens concurrently with an UPDATE/DELETE which matches but the update/delete happens on "branch3", then the multi-table transaction should still succeed.

Copy link
Contributor Author

@nastra nastra Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should actually be most likely at the branch level (but I'd have to revisit and think about this). However, I started to work on this before writing to branches (#5234) was committed, hence it's not being reflected here.

Comment on lines 68 to 70
TableMetadata tableMetadata =
((BaseTable) origin().loadTable(affectedTable)).operations().current();
if (tableMetadata.lastUpdatedMillis() >= txStartTime().toEpochMilli()) {
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to my point above, I think checking tableMetadata last updated is too broad when performing the isolation level checks, I think it needs to be performed at the branch level. Unless we're establishing that multi-table transactions can only happen on each table's main branch but that feels too restrictive. Also not sure if we can just do a timestamp check (clock skew between the last updated time set and the start time of the transaction).

Copy link
Contributor Author

@nastra nastra Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, I started working on this before writing to branches was possible (#5234), hence I didn't consider this in this initial version, simply because I missed that this feature made it into master

@rdblue rdblue changed the title API,Core: Multi-Table transactions API, Core: Multi-Table transactions Feb 6, 2023
@rdblue
Copy link
Contributor

rdblue commented Feb 6, 2023

@nastra, can you write up a design doc?

@nastra nastra force-pushed the multi-table-tx-with-jdbc branch 2 times, most recently from 88156cf to 8b6e944 Compare February 27, 2023 11:22
@nastra nastra force-pushed the multi-table-tx-with-jdbc branch from 8b6e944 to 371ae2f Compare February 27, 2023 11:24
@nastra
Copy link
Contributor Author

nastra commented Feb 27, 2023

Closing this in favor of #6948

@nastra nastra closed this Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants