Coalesce DMLs across concurrent queueables to reduce row-lock contention

### Problem
Today many independent queueables fire concurrently, each ending in its own DML against children of a shared parent (lookup or master-detail). Salesforce takes a lock on the parent record for these child writes, so concurrent jobs collide and fail with `UNABLE_TO_LOCK_ROW`. Lookup skew and ownership skew amplify the problem. Retry (Issue #N) is a safety net but doesn't address the root cause: N small transactions race for the same parent lock instead of one batched commit taking it once.

### Proposal
Treat this as a **design spike** — not a green-lit implementation. Explore whether the Async-lib can intercept DMLs at the end of independent queueables and coalesce them into a single batched commit within a time window (or other grouping signal).

The producer side should require **zero changes** to existing queueables — devs already wrote `MyJob` ending in `update records;`. The framework should be able to opt that job into "deferred DML" mode without rewriting it.

```apex
// Sketch — devs don't redesign their jobs, they just opt in
Async.queueable(new MyJob(args))
    .deferDmlVia(AccountDmlCoalescer.class)
    .enqueue();
```

The coalescer collects pending DML requests from N producer jobs within a configurable window, sorts them by parent ID, deduplicates them, and commits once.
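
To make the "deduplicates" step concrete, here is a minimal sketch of how pending updates from several producers might be merged by record Id before the single commit. The class name `DmlCoalesceSketch`, the method `mergeById`, and the last-write-wins rule are all assumptions for illustration; none of them exist in Async-lib today, and the real conflict policy is part of the spike. It also assumes producers hand over records that carry only writable scalar fields set in memory (no queried relationship or read-only fields).

```apex
// Hypothetical sketch, not an existing Async-lib API.
public with sharing class DmlCoalesceSketch {

    // Merges pending updates so each record Id appears once.
    // Assumption: later entries in the list override earlier field values.
    public static List<SObject> mergeById(List<SObject> pending) {
        Map<Id, SObject> merged = new Map<Id, SObject>();
        for (SObject incoming : pending) {
            if (incoming.Id == null) {
                continue; // this sketch only coalesces updates, not inserts
            }
            SObject existing = merged.get(incoming.Id);
            if (existing == null) {
                // Clone with the Id preserved so the producer's instance is not mutated.
                merged.put(incoming.Id, incoming.clone(true, true, false, false));
                continue;
            }
            // Copy every populated field from the later write onto the merged record.
            Map<String, Object> populated = incoming.getPopulatedFieldsAsMap();
            for (String fieldName : populated.keySet()) {
                if (fieldName == 'Id') {
                    continue;
                }
                existing.put(fieldName, populated.get(fieldName));
            }
        }
        return merged.values();
    }
}
```

With this shape, two producers that touch different fields on the same child end up as one row in the final `update`, which is the effect the coalescer is meant to have. Whether field-level merging is safe for every object is one of the things the spike should decide.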

### Mitigations to evaluate FIRST (cheaper, ship before spike)
Before building coalescing infra, prove these aren't enough:
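1. **Sort by parent ID before DML**: keeps children of the same parent together so each parent lock is taken once per transaction rather than repeatedly; expected to remove a large share of the contention, to be confirmed by measurement. Could be a `.sortByParent(Field)` helper in DML-lib.
2. **Partial commit + retry failed subset**: `Database.update(records, false)`, then retry only the rows that failed with a lock error. Pairs naturally with Issue #N.
3. **`.dedupe()` option on individual queueables**: drop duplicate work within the same job.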

If 1+2+3 still leave measurable contention, the coalescer is justified.
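
Mitigations 1 and 2 can be sketched together as follows. This is an illustration only: `ParentOrderedDml`, `sortByParent`, and `updateAndCollectLocked` are hypothetical names rather than existing DML-lib methods, and the caller is assumed to decide how and when the returned locked rows are retried (for example via the retry work in Issue #N).

```apex
// Hypothetical helper; names are illustrative, not an existing DML-lib API.
public with sharing class ParentOrderedDml {

    // Mitigation 1: order records so children of the same parent are adjacent.
    public static List<SObject> sortByParent(List<SObject> records, Schema.SObjectField parentField) {
        Map<Id, List<SObject>> byParent = new Map<Id, List<SObject>>();
        List<SObject> orphans = new List<SObject>();
        for (SObject record : records) {
            Id parentId = (Id) record.get(parentField);
            if (parentId == null) {
                orphans.add(record); // no parent, so no parent lock to contend for
                continue;
            }
            if (!byParent.containsKey(parentId)) {
                byParent.put(parentId, new List<SObject>());
            }
            byParent.get(parentId).add(record);
        }
        List<Id> parentIds = new List<Id>(byParent.keySet());
        parentIds.sort();
        List<SObject> ordered = new List<SObject>();
        for (Id parentId : parentIds) {
            ordered.addAll(byParent.get(parentId));
        }
        ordered.addAll(orphans);
        return ordered;
    }

    // Mitigation 2: partial commit, returning only the rows that failed on a row lock.
    public static List<SObject> updateAndCollectLocked(List<SObject> records) {
        List<SObject> locked = new List<SObject>();
        List<Database.SaveResult> results = Database.update(records, false);
        // SaveResult entries are returned in the same order as the input list.
        for (Integer i = 0; i < results.size(); i++) {
            if (results[i].isSuccess()) {
                continue;
            }
            for (Database.Error err : results[i].getErrors()) {
                if (err.getStatusCode() == StatusCode.UNABLE_TO_LOCK_ROW) {
                    locked.add(records[i]);
                    break;
                }
            }
        }
        return locked;
    }
}
```

A producer job would call `updateAndCollectLocked(sortByParent(records, Contact.AccountId))` and hand the returned list to whatever retry path is chosen; `Contact.AccountId` is just an example field. Rows that fail for reasons other than a lock are deliberately not returned here and would still need normal error handling.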

### Open Questions
- Cross-transaction queue mechanism — Async-lib internal state, Platform Event, custom object queue, Platform Cache? Open to options; PE was one idea, not a requirement.
- Grouping key — per `SObjectType`, per parent ID, per coalescer class, per business window?
- Time/size window — fixed delay, max batch size, both?
- Single-runner enforcement — how do we guarantee only one coalescer flushes a given group at a time?
- Failure semantics — if coalesced DML partially fails, how do we attribute per-producer logs back?
- Interplay with retry framework — does retry happen at producer level or coalescer level?

### Acceptance Criteria (for the spike)
- [ ] Measurement: real production lock-row incident profile (frequency, hot objects, parent skew)
- [ ] Quantified test of mitigations 1+2+3 against representative load
- [ ] Decision doc: ship cheap mitigations only, or proceed to coalescer impl?
- [ ] If coalescer proceeds: design doc covering all open questions above
