Jira Link: [DB-310](https://yugabyte.atlassian.net/browse/DB-310)
Product-Level Description and Requirements for the First Iteration
The Use Cases Supported
For the use case where only one cluster is taking writes at a given time, we will support the following transaction semantics on the target cluster:
- Multi-shard atomic reads: A transaction spanning tablets A and B will either be fully readable or not readable at all.
- Ordered multi-shard reads within a single session: If timestamp(txn1) < timestamp(txn2) on the source universe, then the user will never be able to read txn2 before txn1 in a single session.
- Consistent cutover on disaster recovery. When the target is promoted, it will have a clean cut of the system and preserve atomic and ordered reads.
For the above, multi-shard can refer to single-table, multi-table, or table-index operations. So this encapsulates guarantees for transactional updates to a main table and its index, as well as two separate transactions to a parent and then a child foreign-key table. A minimal sketch of the visibility rule follows.
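To make the atomicity guarantee concrete, here is a minimal illustrative sketch (hypothetical names, not YugabyteDB source): the target reads at the minimum safe time across tablets, so a multi-shard transaction is either fully visible or not visible at all.

```cpp
// Minimal illustrative sketch (hypothetical names, not YugabyteDB source):
// why reading at the minimum safe time across tablets yields atomic
// multi-shard reads on the target.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

using HybridTime = uint64_t;  // stand-in for a hybrid timestamp

// Safe time reported by each replicated tablet on the target.
HybridTime ConsistentReadTime(const std::vector<HybridTime>& tablet_safe_times) {
  // The read point is bounded by the laggiest tablet, so a transaction that
  // committed at commit_ht is visible only if commit_ht <= read point on
  // every tablet it touched: it is either fully visible or not at all.
  return *std::min_element(tablet_safe_times.begin(), tablet_safe_times.end());
}

int main() {
  // Tablet A has replicated up to ht=100, tablet B only up to ht=95.
  std::vector<HybridTime> safe_times = {100, 95};
  const HybridTime read_ht = ConsistentReadTime(safe_times);  // 95

  // A transaction committed at ht=97 wrote to both A and B. Its rows on A
  // have already been applied, but reading at ht=95 hides them everywhere,
  // so a reader never observes a partially applied transaction.
  const HybridTime commit_ht = 97;
  std::cout << "read_ht=" << read_ht
            << " txn_visible=" << (commit_ht <= read_ht) << "\n";
  return 0;
}
```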
Product-level Changes and Semantics
- Reads on the target will occur at a timestamp bounded by the laggiest tablet in the system. If the lag is 1s, the read timestamp for any read will be at least 1s in the past (see the sketch after this list).
- This read time will be applied on all tables on the target, whether or not they are under replication.
- Each server will periodically get an updated version of the read time from the master leader. So if the master process is down or there is a master-tserver partition, the read time will not advance and reads will become increasingly stale.
- In the case of DR cutover, all tablets will cut over to the timestamp of the laggiest tablet. That is, if the laggiest tablet is behind by 1s, every tablet will lose 1s worth of data.
- A toggle will be introduced between reading at a consistent but stale timestamp and reading at the current time. The tradeoff is the atomic and ordered guarantees of an older timestamp versus the lower latency and reduced data loss on cutover that come with real-time reads.
- When target-side read staleness exceeds the history retention cutoff (currently 15m), the target will reject reads to avoid serving incorrect data.
- Transaction status WAL retention will need to be increased on the source, which will increase space utilization.
- The user will need to do a full backup/restore of all tables in case the transaction status replication stream can't catch up. This is because the transaction status table has no B/R mechanism, so the only way to resolve transactions on the target for user tables is to do a B/R of those tables.
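The staleness and rejection behavior above can be summarized in a small sketch (hypothetical types and names; the real read path uses hybrid timestamps rather than a monotonic clock): the read point only advances on a successful master refresh, and reads older than the history retention cutoff are rejected.

```cpp
// Minimal sketch (hypothetical names, not the actual master/tserver API):
// the read time only advances when the master refresh succeeds, and reads
// are rejected once staleness exceeds the history retention cutoff.
#include <chrono>
#include <iostream>
#include <optional>

using Clock = std::chrono::steady_clock;

struct TargetReadPoint {
  std::optional<Clock::time_point> last_refresh;  // last successful master refresh
  Clock::duration read_lag{};                     // lag of the laggiest tablet at refresh
};

// History retention cutoff; the issue cites 15 minutes as the current default.
constexpr auto kHistoryRetention = std::chrono::minutes(15);

// Returns the timestamp to read at, or nullopt if the read must be rejected.
std::optional<Clock::time_point> PickReadTime(const TargetReadPoint& p) {
  if (!p.last_refresh) return std::nullopt;  // never heard from the master
  const auto read_ht = *p.last_refresh - p.read_lag;
  // If the master is down or partitioned away, last_refresh stops advancing
  // and the effective staleness (now - read_ht) grows without bound.
  if (Clock::now() - read_ht > kHistoryRetention) {
    return std::nullopt;  // older than history retention: reject rather than
                          // risk reading compacted-away versions
  }
  return read_ht;
}

int main() {
  TargetReadPoint fresh{Clock::now(), std::chrono::seconds(1)};
  std::cout << "fresh read allowed=" << PickReadTime(fresh).has_value() << "\n";  // 1

  TargetReadPoint stale{Clock::now() - std::chrono::minutes(20),
                        std::chrono::seconds(1)};
  std::cout << "stale read allowed=" << PickReadTime(stale).has_value() << "\n";  // 0
  return 0;
}
```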
Situations not supported
- Active-active use cases (both sides taking independent writes) will not preserve transactional guarantees. For this case, we will continue to use last-writer-wins semantics.
- Ordered multi-shard reads across sessions: If timestamp(txn1) < timestamp(txn2) on the source universe, then the user may still be able to read txn2 before txn1 across multiple sessions.
- 1:N, N:1, or triangle formations
- Local transaction status tables on the source cluster
Subtasks
- [xCluster] Replicate txn status table #11199
- [xCluster] Only apply committed records to regular rocksdb if the tablet is caught up to commit_ht #11200
- [xCluster] Send safe time of tablet to target as part of GetChanges response #11201
- [xCluster] Keep min_safe_time calculation periodically refreshed on the master #11202
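Subtasks #11201 and #11202 together amount to the aggregation below, shown as a minimal sketch with hypothetical types (the actual flow is via GetChanges responses and master-side refresh): each tablet reports the safe time it has replicated up to, and the master periodically publishes the minimum as the cluster-wide read point.

```cpp
// Minimal sketch of the safe-time aggregation the subtasks describe
// (hypothetical types; not the real GetChanges/master RPC interfaces).
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <limits>
#include <map>
#include <string>

using HybridTime = uint64_t;

class SafeTimeTracker {
 public:
  // Called when a GetChanges response delivers a tablet's latest safe time.
  void ReportTabletSafeTime(const std::string& tablet_id, HybridTime ht) {
    // Safe times are monotonic per tablet; keep the max seen so far.
    auto& cur = per_tablet_[tablet_id];
    cur = std::max(cur, ht);
  }

  // Periodically invoked refresh: the published min_safe_time only moves
  // forward, so readers never observe it going backwards.
  HybridTime RefreshMinSafeTime() {
    HybridTime min_ht = std::numeric_limits<HybridTime>::max();
    for (const auto& entry : per_tablet_) min_ht = std::min(min_ht, entry.second);
    if (!per_tablet_.empty() && min_ht > published_) published_ = min_ht;
    return published_;
  }

 private:
  std::map<std::string, HybridTime> per_tablet_;
  HybridTime published_ = 0;
};

int main() {
  SafeTimeTracker tracker;
  tracker.ReportTabletSafeTime("tablet-a", 100);
  tracker.ReportTabletSafeTime("tablet-b", 95);
  std::cout << "min_safe_time=" << tracker.RefreshMinSafeTime() << "\n";  // 95
  return 0;
}
```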
Open Problems
- What do we do when we need to bootstrap the txn status table? Since there is no rocksdb for this table and all committed txns are eventually gc'ed, we may need to bootstrap the entire cluster over again to resolve lost transactions.
- How do we ensure that two versions of ht_read on two different servers are bounded? On a single cluster, we use clock skew to bound the read time on any two servers. However, ht_read doesn't have any bound between two consecutive versions.
- Do we need to change history retention at all? Given we're reading at a timestamp in the past, if ht_read falls below history retention, we may no longer be able to serve reads.