
HIVE-29459: [DR][HiveACIDReplication] Add clearDanglingTxnTask at the end #6334

Open
harshal-16 wants to merge 1 commit into apache:master from harshal-16:HIVE-29459


harshal-16 (Contributor) commented on Feb 24, 2026

What changes were proposed in this pull request?

  • Add clearDanglingTxnTask only when the load has no more work pending, so that it runs once at the end of the load cycle.

Why are the changes needed?

  • Currently, clearDanglingTxnTask is added at the end of replLoadTask. That works in the normal scenario:
```java
if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)) {
  ClearDanglingTxnWork clearDanglingTxnWork = new ClearDanglingTxnWork(work.getDumpDirectory(), targetDb.getName());
  Task<ClearDanglingTxnWork> clearDanglingTxnTaskTask = TaskFactory.get(clearDanglingTxnWork, conf);
  if (childTasks.isEmpty()) {
    // Nothing else is scheduled, so the cleanup task is the only child.
    childTasks.add(clearDanglingTxnTaskTask);
  } else {
    // Attach the cleanup task to the leaves of the task DAG.
    DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(Collections.singletonList(clearDanglingTxnTaskTask)));
  }
}
return 0;
```
  • But if the number of events for an incremental load exceeds hive.repl.approx.max.load.tasks, the load operation can split the tasks into batches of approximately hive.repl.approx.max.load.tasks (not a hard limit).
  • In this case, repl_txn_map can be cleaned prematurely and transactions aborted in the middle of replication, because clearDanglingTxnTask runs between batches rather than once at the end of each load cycle.
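
To make the failure mode concrete, here is a minimal toy model of a batched load cycle. It is not Hive code: the class `BatchedLoadSketch`, the method `planLoadCycle`, the `hasMoreWork` flag and the task names are all hypothetical. It also treats the batch size as exact, whereas hive.repl.approx.max.load.tasks is only approximate. It only illustrates how often the cleanup step gets scheduled with and without a "no more work" guard.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a batched incremental load; hypothetical, not Hive code. */
public class BatchedLoadSketch {

    /**
     * Returns the names of the tasks scheduled over one whole load cycle.
     * cleanupOnlyAtEnd=false models cleanup being added after every batch;
     * cleanupOnlyAtEnd=true models cleanup being added only after the last batch.
     */
    static List<String> planLoadCycle(int totalEvents, int maxTasksPerBatch, boolean cleanupOnlyAtEnd) {
        if (maxTasksPerBatch <= 0) {
            throw new IllegalArgumentException("maxTasksPerBatch must be positive");
        }
        List<String> scheduled = new ArrayList<>();
        int nextEvent = 0;
        do {
            // Schedule one batch of at most maxTasksPerBatch events.
            int batchEnd = Math.min(nextEvent + maxTasksPerBatch, totalEvents);
            for (int e = nextEvent; e < batchEnd; e++) {
                scheduled.add("event-" + e);
            }
            nextEvent = batchEnd;

            // Assumed guard: cleanup is added only when no batch remains.
            boolean hasMoreWork = nextEvent < totalEvents;
            if (!cleanupOnlyAtEnd || !hasMoreWork) {
                scheduled.add("clearDanglingTxn");
            }
        } while (nextEvent < totalEvents);
        return scheduled;
    }

    public static void main(String[] args) {
        System.out.println("cleanup per batch: " + planLoadCycle(5, 2, false));
        System.out.println("cleanup at end:    " + planLoadCycle(5, 2, true));
    }
}
```

With five events and a batch size of two, the unguarded variant schedules the cleanup three times, twice of them while events are still pending. The guarded variant schedules it once, after the final event.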

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Tested on a live cluster
  • Added a test case

harshal-16 changed the title from "CDPD-97300: [DR][HiveACIDReplication] Add clearDanglingTxnTaskTask at the end" to "CDPD-97300: [DR][HiveACIDReplication] Add clearDanglingTxnTask at the end" on Feb 24, 2026
… end

Details:
	* Currently, clearDanglingTxnTask is added at the end of replLoadTask. That works in the normal scenario:
```java
if (conf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_CLEAR_DANGLING_TXNS_ON_TARGET)) {
  ClearDanglingTxnWork clearDanglingTxnWork = new ClearDanglingTxnWork(work.getDumpDirectory(), targetDb.getName());
  Task<ClearDanglingTxnWork> clearDanglingTxnTask = TaskFactory.get(clearDanglingTxnWork, conf);
  if (childTasks.isEmpty()) {
    // Nothing else is scheduled, so the cleanup task is the only child.
    childTasks.add(clearDanglingTxnTask);
  } else {
    // Attach the cleanup task to the leaves of the task DAG.
    DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(Collections.singletonList(clearDanglingTxnTask)));
  }
}
return 0;
```

	* But if the number of events for an incremental load exceeds hive.repl.approx.max.load.tasks,
		the load operation can split the tasks into batches of approximately hive.repl.approx.max.load.tasks (not a hard limit).
	* In this case, repl_txn_map can be cleaned prematurely and transactions aborted in the middle of replication,
		because clearDanglingTxnTask runs between batches rather than once at the end of each load cycle.
Fix:
	* Add clearDanglingTxnTask only when the load has no more work pending

Testing:
	* Tested on a live cluster
	* Added a test case
harshal-16 changed the title from "CDPD-97300: [DR][HiveACIDReplication] Add clearDanglingTxnTask at the end" to "HIVE-29459: [DR][HiveACIDReplication] Add clearDanglingTxnTask at the end" on Feb 24, 2026