Transaction V2 #3724

@wjones127

Issues with Transactions

  • Transactions need to serve as diffs between manifests. In some respects they do: an Append operation shows the new fragments added. But other operations are less helpful: an Update operation provides only the new fragments, so it's unclear which data files were modified.
  • Transactions don't provide a clear, user-friendly view of what happened.
  • Creating new transaction types requires creating new proto messages, or else repurposing existing ones.
  • Conflict resolution requires handling O(n**2) cases, where n is the number of operation types.

Proposed design

Provide a new CompositeOperation type that is meant to be a generic operation type. It will have three levels:

  • CompositeOperation represents the full operation that is applied to create the next version of the manifest.
  • It has multiple UserOperations, each of which represents a logical user-facing operation (for example, deleting multiple rows).
  • Each UserOperation contains multiple Actions, each of which represents a single change to the manifest.

To apply a CompositeOperation to create a new manifest, simply apply the flattened list of Actions in order.
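
As a rough illustration of the three levels and the "flatten, then apply in order" rule, here is a minimal Python sketch. It is not the prototype's code: the class fields, the `apply_composite` helper, and the modelling of a manifest as a dict from fragment id to fragment are all assumptions made for this example. The action type names `add_fragments` and `updated_fragments` are borrowed from the example in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    """A single change to the manifest (hypothetical shape)."""
    type: str              # "add_fragments" or "updated_fragments"
    fragments: List[dict]  # each fragment is a dict with at least an "id"


@dataclass
class UserOperation:
    """A logical, user-facing operation made of one or more actions."""
    description: str
    uuid: str
    read_version: int
    actions: List[Action] = field(default_factory=list)


@dataclass
class CompositeOperation:
    """The full operation that produces the next manifest version."""
    user_operations: List[UserOperation] = field(default_factory=list)

    def flattened_actions(self) -> List[Action]:
        # Preserve order: operations in sequence, actions within each.
        return [a for op in self.user_operations for a in op.actions]


def apply_composite(manifest: Dict[int, dict],
                    op: CompositeOperation) -> Dict[int, dict]:
    """Return a new manifest (fragment id -> fragment) with `op` applied.

    The input manifest is left untouched.
    """
    new_manifest = dict(manifest)
    for action in op.flattened_actions():
        if action.type not in ("add_fragments", "updated_fragments"):
            raise ValueError(f"unknown action type: {action.type}")
        for frag in action.fragments:
            exists = frag["id"] in new_manifest
            if action.type == "add_fragments" and exists:
                raise ValueError(f"fragment {frag['id']} already exists")
            if action.type == "updated_fragments" and not exists:
                raise ValueError(f"fragment {frag['id']} does not exist")
            new_manifest[frag["id"]] = frag
    return new_manifest
```

Because every action is applied the same way regardless of which user operation it came from, a new kind of user operation would not need its own apply logic in this sketch, only a new combination of actions.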

For example, consider the SQL transaction:

```sql
BEGIN TRANSACTION;
INSERT INTO t VALUES (1);
DELETE FROM t WHERE id = 32;
UPDATE t SET a = 2 WHERE id = 1;
COMMIT;
```

It could be represented as a single CompositeOperation:

```yaml
- description: "INSERT INTO t VALUES (1);"
  uuid: 123e4567-e89b-12d3-a456-426655440001
  read_version: 0
  actions:
    - type: add_fragments
      fragments:
        - id: 10
          files:
            - path: data/123e4567-e89b-12d3-a456-426655440000.lance
              fields: [0]
- description: "DELETE FROM t WHERE id = 32;"
  uuid: 123e4567-e89b-12d3-a456-426655440002
  read_version: 0
  actions:
    - type: updated_fragments
      fragments:
        - id: 0
          files:
            - path: data/dfdsfdsd-e89b-12d3-a456-426655440000.lance
              fields: [0]
              deletion_file:
                - file_type: ARROW_ARRAY
                  read_version: 0
                  id: 10
                  num_deleted_rows: 1
- description: "UPDATE t SET a = 2 WHERE id = 1;"
  uuid: 123e4567-e89b-12d3-a456-426655440003
  read_version: 0
  actions:
    - type: updated_fragments
      fragments:
        - id: 0
          files:
            - path: data/123e4567-sdfs-12d3-sdfs-426655440000.lance
              fields: [0]
              deletion_file:
                - file_type: ARROW_ARRAY
                  read_version: 0
                  id: 10
                  num_deleted_rows: 1
```
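
One thing this structure could make easier is a readable summary of what a commit did. The sketch below is only an illustration: `summarize` is a hypothetical helper, and `composite` is a hand-transcribed subset of the example above as plain Python dicts, keeping only the descriptions, action types, and fragment ids.

```python
from typing import List

# Hand-transcribed subset of the example (file entries omitted).
composite = [
    {
        "description": "INSERT INTO t VALUES (1);",
        "actions": [{"type": "add_fragments", "fragments": [{"id": 10}]}],
    },
    {
        "description": "DELETE FROM t WHERE id = 32;",
        "actions": [{"type": "updated_fragments", "fragments": [{"id": 0}]}],
    },
    {
        "description": "UPDATE t SET a = 2 WHERE id = 1;",
        "actions": [{"type": "updated_fragments", "fragments": [{"id": 0}]}],
    },
]


def summarize(user_operations: List[dict]) -> List[str]:
    """Build one line per user operation, then one line per action."""
    lines = []
    for user_op in user_operations:
        lines.append(user_op["description"])
        for action in user_op["actions"]:
            ids = ", ".join(str(f["id"]) for f in action["fragments"])
            lines.append(f"  {action['type']}: fragments [{ids}]")
    return lines


print("\n".join(summarize(composite)))
```

Each user-facing description is followed by the fragments its actions touched, which is the kind of diff-like view the current transaction format does not readily give.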

This design is prototyped in #3204

Labels: enhancement (New feature or request)