perf(fabricx/finality): batch finality via shared poller #1849

Open
EvanYan1024 wants to merge 2 commits into LFDT-Panurus:main from Built-by-Sign:perf/fabricx-finality-batch-poller


@EvanYan1024 EvanYan1024 commented Jul 3, 2026

Contributor

Fixes #1848.

Motivation

NSListenerManager resolves FabricX finality with one remote committer query per transaction: a speculative GetTransactionStatus at registration (which almost always sees a not-yet-committed tx) plus per-tx work items that each block a queue worker on its own network round-trip. In our deployment (Fabric-X backend, sustained submission from 4000 concurrent clients), finality resolution settled at ~92 tx/s while the chain committed ~1400 tx/s, accumulating more than 100k in-flight finality watchers.

Changes

  • AddFinalityListener now adds the tx to a pending set instead of enqueuing a per-tx TxCheck; a single shared poller, started lazily on first registration, sweeps the pending set every poll interval.
  • Each sweep resolves statuses in chunks with one committer query per chunk: it uses the batched GetTransactionStatuses (added in hyperledger-labs/fabric-smart-client#1553, "perf(fabricx/queryservice): add batched GetTransactionStatuses") when the query service provides it, and falls back to per-tx GetTransactionStatus otherwise. No FSC dependency bump is required; the batch path activates once the FSC dependency includes hyperledger-labs/fabric-smart-client#1553. In the per-tx path a failing query only skips that tx (a not-yet-committed tx returns an error rather than a status), so one fresh tx cannot fail the whole chunk.
  • A tx can have several finality waiters: the pending set keeps a waiter list per txID and notifies all of them on resolution, matching the semantics of the other listener managers.
  • Token-request hashes for valid txs are fetched with one batched GetStates query, grouped by namespace.
  • Terminal txs are handed to the existing worker pool as pre-resolved notification events that perform no network I/O; if the queue is full, the tx stays pending and is retried on the next sweep.
  • Pending entries are reclaimed after a TTL so permanently dropped txs cannot grow the pending set; callers' own finality timeouts settle those waiters.
  • Poll interval, batch size, and pending TTL are read from configuration (token.finality.poller.interval / .batchSize / .pendingTTL), following the existing token.finality.notification.* and token.fabricx.lookup.* config patterns; unset values fall back to defaults (1s / 2000 / 10m).
  • docs/configuration.md: documented the new token.finality.poller.* keys and updated the now-stale note about the immediate status query at subscription time.
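
For readers who want a concrete picture of the sweep described above, here is a minimal Go sketch. It is not the code in this PR: `Status`, `Querier`, `BatchQuerier`, `Waiter`, and `pendingSet` are hypothetical, simplified stand-ins, waiters are called inline rather than through the worker pool, and the behavior on a failed batch query (leave the whole chunk pending) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Status is a hypothetical, simplified tx status for this sketch.
type Status int

const (
	Unknown Status = iota
	Valid
	Invalid
)

// Querier is a hypothetical per-tx query interface.
type Querier interface {
	GetTransactionStatus(txID string) (Status, error)
}

// BatchQuerier is the optional batched interface, detected at runtime.
type BatchQuerier interface {
	GetTransactionStatuses(txIDs []string) (map[string]Status, error)
}

// Waiter is called once when its tx reaches a terminal status.
type Waiter func(txID string, s Status)

// pendingSet keeps a waiter list per txID.
type pendingSet struct {
	mu      sync.Mutex
	waiters map[string][]Waiter
}

func newPendingSet() *pendingSet { return &pendingSet{waiters: map[string][]Waiter{}} }

func (p *pendingSet) Add(txID string, w Waiter) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.waiters[txID] = append(p.waiters[txID], w)
}

// resolveChunk uses the batched query when available, else per-tx queries.
func resolveChunk(q Querier, chunk []string) map[string]Status {
	if bq, ok := q.(BatchQuerier); ok {
		res, err := bq.GetTransactionStatuses(chunk)
		if err != nil {
			return nil // assumption: the chunk stays pending until the next sweep
		}
		return res
	}
	res := make(map[string]Status, len(chunk))
	for _, id := range chunk {
		s, err := q.GetTransactionStatus(id)
		if err != nil {
			continue // not yet committed: skip only this tx
		}
		res[id] = s
	}
	return res
}

// Sweep resolves all pending txs in chunks and notifies every waiter.
func (p *pendingSet) Sweep(q Querier, batchSize int) {
	p.mu.Lock()
	ids := make([]string, 0, len(p.waiters))
	for id := range p.waiters {
		ids = append(ids, id)
	}
	p.mu.Unlock()
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		for id, s := range resolveChunk(q, ids[start:end]) {
			if s == Unknown {
				continue // not terminal yet
			}
			p.mu.Lock()
			ws := p.waiters[id]
			delete(p.waiters, id)
			p.mu.Unlock()
			for _, w := range ws {
				w(id, s)
			}
		}
	}
}

// perTxQuerier is a fake that only implements the per-tx path.
type perTxQuerier struct{ committed map[string]Status }

func (f perTxQuerier) GetTransactionStatus(txID string) (Status, error) {
	s, ok := f.committed[txID]
	if !ok {
		return Unknown, errors.New("tx not found")
	}
	return s, nil
}

func main() {
	p := newPendingSet()
	notified := 0
	count := func(string, Status) { notified++ }
	p.Add("tx1", count)
	p.Add("tx1", count) // second waiter on the same tx
	p.Add("tx2", count) // never committed in this run
	p.Sweep(perTxQuerier{committed: map[string]Status{"tx1": Valid}}, 2000)
	fmt.Println(notified, len(p.waiters)) // prints "2 1"
}
```

In this sketch the uncommitted `tx2` stays in the pending set while both waiters on `tx1` are notified, which mirrors the "one fresh tx cannot fail the whole chunk" property of the per-tx path.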

Results

With this change our deployment's finality resolution improved ~5x immediately and stopped being network-bound (remaining limits in our runs were local, e.g. DB connection pool sizing). It has been running in our test environments under sustained load since.

Notes for reviewers

  • The wrapped notification-service ListenerManager is still constructed and held, but no longer used for status delivery. Happy to either drop it or re-wire it so push notifications complement the poller (poller as sweep/fallback) if that's the preferred direction.
  • The poller goroutine starts lazily and runs for the lifetime of the process. If a shutdown hook is preferred, point me at the lifecycle to hang it on.

@EvanYan1024 EvanYan1024 force-pushed the perf/fabricx-finality-batch-poller branch 2 times, most recently from a33db8e to 4ac1b86 Compare July 3, 2026 03:22
Replace the per-transaction remote status check (a speculative
GetTransactionStatus at registration time plus a per-tx fallback, which
serialized every queue worker on its own committer round-trip) with a
single shared poller. It sweeps all pending txs each interval, batches
the committer status query (and the token-request-hash lookup for valid
txs), and hands terminal txs to the existing worker pool for
notification with no network I/O on that path.

The poller uses the query service's batched GetTransactionStatuses when
available (added to the FabricX query service in
hyperledger-labs/fabric-smart-client#1553) and falls back to per-tx
GetTransactionStatus otherwise, so no FSC version bump is required. In
the per-tx path a failing query only skips that tx, since a
not-yet-committed tx returns an error rather than a status.

A tx can have several finality waiters; the pending set keeps a waiter
list per txID and notifies all of them on resolution.

Poll interval, batch size, and pending TTL are read from configuration
(token.finality.poller.*), following the existing notification queue
and lookup config patterns.

Signed-off-by: Evan <evanyan@sign.global>
@EvanYan1024 EvanYan1024 force-pushed the perf/fabricx-finality-batch-poller branch from 4ac1b86 to f4396a2 Compare July 3, 2026 03:28
@AkramBitar AkramBitar requested review from AkramBitar and adecaro and removed request for adecaro July 3, 2026 21:38
@AkramBitar AkramBitar added the enhancement New feature or request label Jul 3, 2026
@AkramBitar AkramBitar added this to the Q3/26 milestone Jul 3, 2026


Development

Successfully merging this pull request may close these issues.

perf: FabricX finality resolution serializes on per-transaction status queries
