feat: update transaction part in lance partitioning spec #296
base: main
Conversation
Discussed a bit offline; my current biggest concern is that once we enter this mode, users can never read a partition table directly and always have to go through the namespace. I thought maybe the implementation of the spec would have the responsibility to keep those datasets up to date asynchronously, but that is also a lot of complexity, so maybe this approach is inevitable. Another direction I think we have not really explored is partition-level locking. What if we only add the locking mechanism in ...
Actually, I think there could be a better way. Today we already have the concept of an external manifest store in a Lance dataset, because S3 writes were not atomic and we used Dynamo. Here we are essentially using another Lance table as the external manifest store, and with that we gain the ability to commit to multiple tables as well.

Now, the key idea is that, instead of using CDF, we will change ... Now we have achieved both and avoided using CDF: we will be able to atomically write to multiple partitions by atomically updating ... What do you think?
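To make the shape of that idea concrete, here is a minimal in-memory sketch (the dict-based model and all names are assumptions for illustration, not the spec's API): detached commits stage new partition versions without making them visible, and a single update to the partition table, acting as the external manifest store, publishes them all at once.

```python
# Each partition keeps its own chain of versions; versions staged via
# detached commits (v21, v32) exist but are not yet visible to readers.
partitions = {
    "table0": {20: "rows@v20", 21: "rows@v21"},   # v21 staged, detached
    "table1": {30: "rows@v30", 32: "rows@v32"},   # v32 staged, detached
}

# The partition table is the single source of truth for which version of
# each partition is visible (in the real design it is itself a Lance table).
partition_table = {"table0": 20, "table1": 30}

def read(table_id):
    # Readers resolve versions through the partition table, never by
    # opening the partition's latest manifest directly.
    return partitions[table_id][partition_table[table_id]]

def publish(new_versions):
    # One atomic commit to the partition table flips every staged version
    # at once -- this is the multi-partition commit point.
    partition_table.update(new_versions)

print(read("table0"), read("table1"))   # rows@v20 rows@v30
publish({"table0": 21, "table1": 32})
print(read("table0"), read("table1"))   # rows@v21 rows@v32
```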
First Question
Second Question
Add serial commit to lance lance-format/lance#5662
19ba3f9 to b73d587
b73d587 to da9d0a6
I updated the spec based on this design, with only one minor difference: changing "new entry for each partitioned table commit" to "make ...
The SPEC changes describe the process of cross-partition transaction commit. In the following, I will use an example to explain why serial commit is necessary for this cross-partition commit process.

Assume the initial state of the Partition Table is S0. At this point, two concurrent transactions (t1 and t2) are issued, both of which observe S0. t1 modifies table0 via a detached commit; t2 modifies both table0 and table1 via detached commits. (Here, 21, 31, and 32 are all detached versions.)

Next, t1 and t2 are to commit, i.e., to insert records into the ... Since the ... If we replace the read_version with a list to record the read_version's timeline, the result remains the same. Depending on the commit order, the final possible states are the following 2 cases.

The root cause is: our commits in the ...
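A tiny in-memory simulation of that race (the append-only model and numbers follow the example above; everything else is an assumption for illustration): both transactions read S0, nothing checks the snapshot at commit time, so the partition table ends up with two competing entries for table0 and neither transaction is asked to retry.

```python
# Partition table rows at S0: one committed version per table.
partition_table = [
    {"table_id": "table0", "version": 20},
    {"table_id": "table1", "version": 30},
]

def commit(rows):
    # Nothing verifies that the snapshot the transaction read (S0) is still
    # the latest state of the partition table, so the append always succeeds.
    partition_table.extend(rows)

# t1 and t2 both observed S0 and now commit their detached versions.
commit([{"table_id": "table0", "version": 21}])                    # t1
commit([{"table_id": "table0", "version": 31},                     # t2
        {"table_id": "table1", "version": 32}])

# table0 now has two competing entries (21 and 31); neither contains the
# other's edits, and neither transaction was told to abort and retry.
print([r for r in partition_table if r["table_id"] == "table0"])
```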
I think this should be solved by the primary key dedupe feature that Vino and I added previously. In this case, the primary key is ...
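For reference, a minimal sketch of what that dedupe could look like with pylance's merge-insert builder (the toy schema, the `table_id` key column, and the temp path are assumptions for illustration, since the actual primary key is elided above):

```python
import tempfile

import lance
import pyarrow as pa

uri = tempfile.mkdtemp()

# Toy stand-in for the partition table: one committed entry for table0.
initial = pa.table({"table_id": ["table0"], "version": [20]})
lance.write_dataset(initial, uri, mode="overwrite")

# A later commit tries to register table0 again (duplicate key) plus table1.
source = pa.table({"table_id": ["table0", "table1"], "version": [31, 32]})

# Merge-insert keyed on the primary key: rows whose key already exists are
# skipped, only genuinely new keys are inserted, so no duplicates appear.
ds = lance.dataset(uri)
ds.merge_insert("table_id").when_not_matched_insert_all().execute(source)

print(lance.dataset(uri).to_table().to_pydict())
# table0 stays at version 20; only the table1 row is added.
```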
Even if duplicate rows are not inserted (using primary key merge_into to prevent this), there are still semantic issues. Returning to the previous example: assume the commit order is S0 -> t1 -> t2. Primary key merge_into ensures that t2 cannot insert into table_0 during its commit, but table_1 can still be inserted successfully, resulting in a final state where table_0 carries t1's commit while table_1 carries t2's.

But what we need is for t2 to fail completely and trigger a transaction retry, rather than having one insertion succeed and the other fail, because t2's commit must perform a conflict check against t1's commit (a conflict check on every modified partitioned table). Serial commit can guarantee that t2 fails entirely.
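A minimal sketch of that serial-commit behaviour (an assumed model, not the spec's exact mechanism): the commit checks every partitioned table it modified against what has been committed since its read snapshot, and on any conflict the whole commit fails so the transaction retries with nothing partially published.

```python
class PartitionTable:
    def __init__(self, state):
        self.version = 0
        self.state = dict(state)                    # table_id -> version
        self.last_changed = {t: 0 for t in state}   # table_id -> commit no.

    def serial_commit(self, read_version, writes):
        # Conflict check on every modified partitioned table: if any of them
        # changed after this transaction's read snapshot, fail the whole
        # commit so the caller retries -- no partial insertion.
        for table_id in writes:
            if self.last_changed.get(table_id, 0) > read_version:
                raise RuntimeError(f"conflict on {table_id}: retry transaction")
        self.version += 1
        for table_id, version in writes.items():
            self.state[table_id] = version
            self.last_changed[table_id] = self.version

pt = PartitionTable({"table0": 20, "table1": 30})
s0 = pt.version

pt.serial_commit(s0, {"table0": 21})                       # t1 succeeds
try:
    pt.serial_commit(s0, {"table0": 31, "table1": 32})     # t2 fails entirely
except RuntimeError as err:
    print(err)                                             # table1 untouched

print(pt.state)   # {'table0': 21, 'table1': 30}
```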
I don't think so. When you run merge-insert, your source is treated as a whole. If ...
@jackye1995 Thanks for your explanation, you are right. We can use a ... The ... It is a smart solution; let me update the SPEC with it.
No description provided.