Description
For further progress on the writer, we'll need to implement features that require a query engine. These include:
- Higher writer protocols:
  - V2: column invariants (Write `enforce_invariant()` function #592)
  - V3: CHECK constraints
  - V4: generated columns
- Write types that require rewriting files:
  - DELETE
  - UPDATE
  - MERGE (Implement merge command #850)
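DELETE, UPDATE, and MERGE need a query engine because Parquet files are immutable: every file containing an affected row has to be read, filtered or transformed, rewritten, and swapped out in the commit. The sketch below illustrates this copy-on-write pattern for DELETE. It is a minimal illustration under stated assumptions, not the delta-rs design: files are modeled as an in-memory map from path to `i64` rows, and `delete_where`, `Action`, and the `rewritten-` naming scheme are all hypothetical.

```rust
// Hypothetical copy-on-write DELETE. Names and types are illustrative only,
// not delta-rs APIs; "files" are in-memory rows keyed by path.
use std::collections::BTreeMap;

/// Simplified log actions produced by a rewrite.
#[derive(Debug, PartialEq)]
enum Action {
    Remove(String),
    Add(String),
}

/// Deletes every row matching `predicate`. Each file that contains a match is
/// removed and, if any rows survive, replaced by a rewritten file.
fn delete_where(
    files: &mut BTreeMap<String, Vec<i64>>,
    predicate: impl Fn(i64) -> bool,
) -> Vec<Action> {
    let mut actions = Vec::new();
    // Snapshot the paths so newly added files are not revisited.
    let paths: Vec<String> = files.keys().cloned().collect();
    for path in paths {
        let rows = &files[&path];
        // Files with no matching rows are left untouched (no rewrite needed).
        if !rows.iter().any(|r| predicate(*r)) {
            continue;
        }
        let kept: Vec<i64> = rows.iter().copied().filter(|r| !predicate(*r)).collect();
        files.remove(&path);
        actions.push(Action::Remove(path.clone()));
        // A file whose rows were all deleted is removed without a replacement.
        if !kept.is_empty() {
            let new_path = format!("rewritten-{path}");
            files.insert(new_path.clone(), kept);
            actions.push(Action::Add(new_path));
        }
    }
    actions
}

fn main() {
    let mut files = BTreeMap::from([
        ("a.parquet".to_string(), vec![1, 2, 3]),
        ("b.parquet".to_string(), vec![4, 5]),
    ]);
    let actions = delete_where(&mut files, |v| v == 2);
    println!("{actions:?}");
}
```

Only `a.parquet` is rewritten here; `b.parquet` has no matching rows and stays in place, which is why a real implementation would push the predicate down to skip unaffected files.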
We can provide a default query engine integration based on DataFusion, but some users will want to plug in their own query engine. Others may want to use their own Parquet writer (for example, in distributed engines). So we will likely want to refactor the writer into three distinct layers:
- A transaction layer, for those who want to use their own Parquet writer to handle data writes (you write data; we write the transaction);
- A parametrized writer layer, for those who want to use their own query engine but rely on the built-in data writer (you verify data; we write data and the transaction);
- A DataFusion-based writer that handles everything (verification, data writing, transaction).
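A minimal Rust sketch of how the three proposed layers could nest. All names (`TransactionLog`, `AddFile`, `Validator`, `ParametrizedWriter`, `BuiltinValidator`, `builtin_writer`) are hypothetical and are not delta-rs APIs. Data files are simulated as generated paths, and `BuiltinValidator` is a stand-in for DataFusion that checks a single made-up invariant, `value >= 0`.

```rust
// Hypothetical sketch of the proposed three-layer writer. Not delta-rs APIs;
// rows are plain i64 values and "files" are just generated paths.

/// A data file that has already been written, by the caller or by us.
#[derive(Debug, Clone, PartialEq)]
pub struct AddFile {
    pub path: String,
    pub num_records: usize,
}

/// Layer 1: transaction layer. The caller writes data; we record the commit.
#[derive(Debug, Default)]
pub struct TransactionLog {
    commits: Vec<Vec<AddFile>>,
}

impl TransactionLog {
    /// Appends one commit and returns its 0-based version number.
    pub fn commit(&mut self, actions: Vec<AddFile>) -> u64 {
        self.commits.push(actions);
        (self.commits.len() - 1) as u64
    }

    /// Latest committed version, or None if nothing has been committed.
    pub fn version(&self) -> Option<u64> {
        self.commits.len().checked_sub(1).map(|v| v as u64)
    }
}

/// Layer 2 hook: a pluggable query engine that verifies the data.
pub trait Validator {
    fn validate(&self, rows: &[i64]) -> Result<(), String>;
}

/// Layer 2: the caller supplies the validator; we write data and the commit.
pub struct ParametrizedWriter<V: Validator> {
    validator: V,
    next_file: usize,
}

impl<V: Validator> ParametrizedWriter<V> {
    pub fn new(validator: V) -> Self {
        Self { validator, next_file: 0 }
    }

    /// Validates, "writes" a file, and commits it. Nothing is committed on failure.
    pub fn write(&mut self, log: &mut TransactionLog, rows: &[i64]) -> Result<u64, String> {
        self.validator.validate(rows)?;
        // Stand-in for the built-in Parquet writer: only a file name is produced.
        let file = AddFile {
            path: format!("part-{:05}.parquet", self.next_file),
            num_records: rows.len(),
        };
        self.next_file += 1;
        Ok(log.commit(vec![file]))
    }
}

/// Stand-in for the DataFusion engine: enforces the invariant `value >= 0`.
pub struct BuiltinValidator;

impl Validator for BuiltinValidator {
    fn validate(&self, rows: &[i64]) -> Result<(), String> {
        match rows.iter().find(|v| **v < 0) {
            Some(v) => Err(format!("invariant `value >= 0` violated by {v}")),
            None => Ok(()),
        }
    }
}

/// Layer 3: the all-in-one writer is layer 2 wired to the built-in engine.
pub fn builtin_writer() -> ParametrizedWriter<BuiltinValidator> {
    ParametrizedWriter::new(BuiltinValidator)
}

fn main() {
    let mut log = TransactionLog::default();

    // Layer 1: data written elsewhere (e.g. by a distributed engine).
    let v0 = log.commit(vec![AddFile { path: "external-0.parquet".into(), num_records: 10 }]);
    println!("layer 1 committed version {v0}");

    // Layer 3: built-in validation, data writing, and commit.
    let mut writer = builtin_writer();
    println!("{:?}", writer.write(&mut log, &[1, 2, 3]));
    println!("{:?}", writer.write(&mut log, &[4, -5]));
}
```

In this sketch each layer reuses the one below it: the all-in-one writer is only the parametrized writer with a fixed engine, and both end in the same transaction layer. Whether that composition holds up once DataFusion plans and Parquet writing are involved is the open question here.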
I'm not sure how viable this is yet, and would welcome feedback from others.