Follow-up from #16237 (per @AnatolyPopov and @laskoviymishka review).
Currently, any full-commit failure immediately terminates the coordinator. For transient errors (e.g., `CommitFailedException` from catalog contention), bounded retries would avoid unnecessary operator intervention.
Proposal: add a configurable consecutive-failure threshold (e.g., 3). Only terminate the coordinator after N consecutive failures of the same commit cycle.
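A minimal sketch of the counting logic, to make the proposal concrete. The class name `CommitFailureTracker` and its methods are hypothetical and are not existing coordinator code. The threshold is assumed to come from a new config property that has not been named yet.

```java
/**
 * Hypothetical helper (not part of the existing coordinator): tracks
 * consecutive full-commit failures and reports when the configured
 * threshold has been reached.
 */
final class CommitFailureTracker {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    CommitFailureTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Records a failed commit attempt; returns true once the threshold is reached. */
    boolean recordFailure() {
        consecutiveFailures++;
        return consecutiveFailures >= maxConsecutiveFailures;
    }

    /** A successful commit ends the failure streak. */
    void recordSuccess() {
        consecutiveFailures = 0;
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With this shape, the coordinator would call `recordFailure()` where it currently terminates on a full-commit failure, and terminate only when that call returns `true`. A threshold of 1 reproduces the current behavior.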
Willingness to contribute