docs: add Token SDK metrics reference and identify coverage gaps #1749

Open
SuyashAlphaC wants to merge 2 commits into LFDT-Panurus:main from SuyashAlphaC:docs/metrics-reference

@SuyashAlphaC SuyashAlphaC commented May 27, 2026

Contributor

Closes #1745

Adds docs/metrics.md, a complete reference of the 50 metrics the SDK emits (driver services, ttx lifecycle, finality, envelope sessions, auditor, selection, certification, identity caches, Fabric-X queue), each with type, labels, and description. Linked from the monitoring guide.

Also adds a coverage-gap section ranking the uninstrumented layers — storage, the auditor lock manager, the standard Fabric network path, validation, a transaction-failure counter, and wallet resolution — with concrete suggested metrics for follow-up.

Catalog every metric the SDK emits (50 across driver services, the ttx
transaction lifecycle, finality listener, versioned envelope sessions, the
auditor service, token selection, certification, identity caches, and the
Fabric-X finality queue), each with its type, labels, and description, in a new
docs/metrics.md, linked from the monitoring guide.

Also add a coverage-gap section calling out the layers that are currently
uninstrumented, ranked by impact: the storage/persistence layer (no metrics at
all), the distributed auditor lock manager, the standard Fabric network/approval
path, token-request validation/double-spend, a transaction-level failure
counter, and wallet/identity resolution.

Closes LFDT-Panurus#1745

Signed-off-by: SuyashAlphaC <suyashagrawal862@gmail.com>
@SuyashAlphaC

Contributor Author

Hey @adecaro, could you please review this PR?

@adecaro adecaro self-requested a review May 28, 2026 13:27
@adecaro adecaro self-assigned this May 28, 2026
@adecaro adecaro added the documentation Improvements or additions to documentation label May 28, 2026
@adecaro adecaro added this to the Q2/26 milestone May 28, 2026
Comment thread docs/metrics.md Outdated
Recommended: request-approval and broadcast counters, durations, and error
counters for the Fabric network driver, mirroring the Fabric-X queue metrics.

### 4. Validation / double-spend

Contributor

Do you mean the EndorserService in token/services/network/fabric/endorsement/provider.go?
Double-spending can only be enforced at commit time.

Contributor

RequestApprovalView could also be instrumented. Remember, though, that the execution of views is already instrumented directly in FSC.

@adecaro

adecaro commented May 28, 2026

Contributor

Hi @SuyashAlphaC , I left some comments. Thanks for looking at this.
An add-on for the PR could be a nice Grafana dashboard. If you feel comfortable with this, please, add it. Otherwise, the PR is fine as well without it 🙏

Comment thread docs/metrics.md
lifecycle, finality, auditing, selection). Several layers are currently
uninstrumented. The items below are ordered by impact.

### 1. Storage / persistence layer (highest priority)

Contributor

You mean this in addition to what the DB can provide already?

Contributor Author

Yes, on top of the DB-level metrics, not replacing them, for two reasons:

  • Semantic labels the DB can't synthesize cheaply. postgres_exporter / pg_stat_statements give you row counts and query latency for statements such as INSERT INTO tokens, but no per-store / per-operation domain labels (store=tokendb,operation=write vs store=ttxdb,operation=write) without per-query parsing rules that you'd have to maintain by hand and re-tune on every schema change.
  • sqlite / embedded backends. There's no exporter there, so SDK-level counters are the only option if you want any signal at all.

So it's complementary: DB exporter for IO/contention/query stats, SDK counters for the application-level semantics.
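To make the per-store / per-operation labelling concrete, here is a minimal sketch of a labelled counter. It is a stdlib-only stand-in rather than the SDK's metrics provider: the class `LabeledCounter` and the metric name `storage_operations_total` are hypothetical, and only the `store` and `operation` label values come from the comment above.

```python
from collections import Counter


class LabeledCounter:
    """Tiny stand-in for a Prometheus-style counter with a fixed label set."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = Counter()

    def inc(self, **labels):
        # Reject calls that omit a label or pass an unknown one.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {sorted(labels)}")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] += 1

    def expose(self):
        # Render one text line per label combination, sorted for stable output.
        lines = []
        for key, value in sorted(self._values.items()):
            rendered = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{rendered}}} {value}")
        return lines


# Hypothetical metric name; the labels mirror store=tokendb,operation=write.
storage_ops = LabeledCounter("storage_operations_total", ["store", "operation"])
storage_ops.inc(store="tokendb", operation="write")
storage_ops.inc(store="tokendb", operation="write")
storage_ops.inc(store="ttxdb", operation="write")
exposed = storage_ops.expose()
print("\n".join(exposed))
```

The point of the sketch is that the store and operation are known at the call site, so no SQL parsing is needed to recover them later.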

Review fixes on docs/metrics.md from @adecaro:

- Storage gap (section 1): clarify the SDK-level counters complement, rather
  than duplicate, what postgres_exporter / pg_stat_statements already give.
  The SDK metrics add semantic labels the DB layer cannot infer without
  per-query parsing, and they are the only source of metrics when the backend
  is sqlite or another embedded store with no exporter.
- Endorser path (former sections 3 + 4 merged): drop the imprecise
  "validation / double-spend" framing; describe the actual EndorserService in
  token/services/network/fabric/endorsement/provider.go and the
  RequestApprovalView in fsc/initiator.go, and call out that double-spend is a
  commit-time concern enforced by Fabric / the token chaincode, not by an
  SDK-level metric.
- Add a global note that FSC instruments view execution at the platform layer,
  so the suggestions here stay domain-specific instead of duplicating it.

Also adds a ready-to-import Grafana dashboard covering all 50 catalogued
metrics under docs/monitoring/grafana/, with multi-select template variables
for network/channel/namespace/method, plus a README describing the panel
layout and import steps. Linked from docs/metrics.md.

Signed-off-by: SuyashAlphaC <suyashagrawal862@gmail.com>
@SuyashAlphaC

Contributor Author

Hey @adecaro! Could you please review the changes I made to the docs in response to your comments? 🙏

@SuyashAlphaC

Contributor Author

hey @adecaro , are there any further changes you want from my side on this PR?

@adecaro adecaro modified the milestones: Q2/26, Q3/26 Jun 2, 2026
@adecaro

adecaro commented Jun 2, 2026

Contributor

Hi @SuyashAlphaC, thanks for this effort. Let me ask @AkramBitar if he can verify the Grafana dashboard in our deployment to see how it looks. Thanks @AkramBitar 🙏

Comment thread docs/metrics.md
| `finality_listener_confirmed_total` | Counter | Transactions confirmed on the ledger and committed to local storage |
| `finality_listener_deleted_total` | Counter | Transactions marked deleted due to an invalid ledger status or token-request hash mismatch |
| `finality_listener_hash_mismatch_total` | Counter | Transactions rejected because the committed token-request hash did not match the local one |
| `finality_listener_retry_exhausted_total` | Counter | Transactions abandoned after all finality-processing retries were exhausted |

Contributor

For finality I found the following in our deployment:

  1. None of the metrics listed above appear.
  2. My list has metrics that are not described in that doc:
    fts_services_network_fabricx_finality_queue_finality_queue_pending_events
    fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds
    fts_services_network_fabricx_finality_queue_finality_queue_processing_errors_total
  3. The prefix is fts_services_network_fabricx_, followed by the metric itself. Please verify whether all the metrics listed in the table above carry the same prefix, since I do not see them in a Prometheus query.
```
# HELP fts_services_network_fabricx_finality_queue_finality_queue_pending_events Current number of finality events waiting in the queue buffer
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_pending_events gauge
fts_services_network_fabricx_finality_queue_finality_queue_pending_events 0
# HELP fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds Histogram of successful event processing time in worker goroutines (seconds)
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds histogram
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.001"} 0
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.005"} 0
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.01"} 0
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.025"} 0
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.05"} 2
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.1"} 17
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.25"} 5327
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="0.5"} 6372
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="1"} 6387
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="2.5"} 6406
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="5"} 6408
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_bucket{le="+Inf"} 6409
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_sum 1324.906861666999
fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds_count 6409
# HELP fts_services_network_fabricx_finality_queue_finality_queue_processing_errors_total Total number of errors returned by event.Process in worker goroutines
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_processing_errors_total counter
fts_services_network_fabricx_finality_queue_finality_queue_processing_errors_total 6393
```
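The histogram samples in the scrape above can be reduced to a mean and approximate quantiles. The following is a rough sketch with the bucket values copied from the comment. Linear interpolation inside a bucket is the same general approach Prometheus uses for `histogram_quantile`, but the function name `estimate_quantile` and the constants are illustrative only.

```python
import math

# (upper bound in seconds, cumulative count), copied from the scrape above.
BUCKETS = [
    (0.001, 0), (0.005, 0), (0.01, 0), (0.025, 0), (0.05, 2), (0.1, 17),
    (0.25, 5327), (0.5, 6372), (1.0, 6387), (2.5, 6406), (5.0, 6408),
    (math.inf, 6409),
]
SUM_SECONDS = 1324.906861666999
COUNT = 6409


def estimate_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation inside one bucket."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(bound):
                # The +Inf bucket has no width; fall back to the last finite bound.
                return prev_bound
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, cumulative
    return prev_bound


mean_seconds = SUM_SECONDS / COUNT
p50 = estimate_quantile(0.50, BUCKETS)
p99 = estimate_quantile(0.99, BUCKETS)
print(f"mean={mean_seconds:.3f}s p50={p50:.3f}s p99={p99:.3f}s")
```

For this sample the mean comes out at roughly 0.21 s, and the median estimate falls in the 0.1 s to 0.25 s bucket, which holds most of the observations.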

Comment thread docs/metrics.md

| Metric | Type | Description |
|--------|------|-------------|
| `endorsed_transactions` | Counter | Number of endorsed transactions |

Contributor

All these metrics in the deployment environment start with the prefix fts_services_ttx_, followed by the metric name itself.

See for example:

**fts_services_ttx_**endorsed_transactions

**fts_services_ttx_**endorsement_duration_seconds{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer", namespace="tokenchaincode", network="mytopos"}

Comment thread docs/metrics.md

| Metric | Type | Description |
|--------|------|-------------|
| `issue_service_operations_total` | Counter | Total IssueService method invocations |

Contributor

All the metrics in the deployment environment start with the prefix fts_core_common_metrics_, followed by the metric name itself.

Example 1:
**fts_core_common_metrics_**issue_service_operations_total{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer", method="DeserializeIssueAction", namespace="tokenchaincode", network="mytopos"} 7492
fts_core_common_metrics_issue_service_operations_total{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer", method="Issue", namespace="tokenchaincode", network="mytopos"} 3706
fts_core_common_metrics_issue_service_operations_total{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", method="DeserializeIssueAction", namespace="tokenchaincode", network="mytopos"} 11118
fts_core_common_metrics_issue_service_operations_total{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", method="VerifyIssue", namespace="tokenchaincode", network="mytopos"}

Example 2:
fts_core_common_metrics_auditor_service_operations_total{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", method="AuditorCheck", namespace="tokenchaincode", network="mytopos"}

"options": {"legend": {"showLegend": true, "placement": "bottom"}},
"targets": [
{"refId": "issue", "expr": "sum(rate(issue_service_operations_total{network=~\"$network\",channel=~\"$channel\",namespace=~\"$namespace\",method=~\"$method\"}[$__rate_interval]))", "legendFormat": "issue"},
{"refId": "transfer", "expr": "sum(rate(transfer_service_operations_total{network=~\"$network\",channel=~\"$channel\",namespace=~\"$namespace\",method=~\"$method\"}[$__rate_interval]))", "legendFormat": "transfer"},

Contributor

The metric names need to be fixed. They all have a prefix; see my comments on docs/metrics.md.
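A possible way to apply that fix to the dashboard expressions is sketched below. The prefix table contains only the name/prefix pairs reported in this review thread; anything not observed there, such as `transfer_service_operations_total`, is deliberately left untouched rather than guessed. The helper `qualify_expr` is hypothetical and not part of the PR.

```python
import re

# Bare metric name -> prefix, limited to pairs reported in this thread.
OBSERVED_PREFIXES = {
    "issue_service_operations_total": "fts_core_common_metrics_",
    "auditor_service_operations_total": "fts_core_common_metrics_",
    "cache_level": "fts_core_common_metrics_",
    "recipient_data_cache_level": "fts_core_common_metrics_",
    "endorsed_transactions": "fts_services_ttx_",
    "endorsement_duration_seconds": "fts_services_ttx_",
    "unspent_tokens_invocations": "fts_services_selector_sherdlock_",
    "finality_queue_pending_events": "fts_services_network_fabricx_finality_queue_",
}

_NAME_CHARS = "A-Za-z0-9_:"


def qualify_expr(expr, prefixes=OBSERVED_PREFIXES):
    """Prefix bare metric names in a PromQL expression, in a single pass."""
    names = sorted(prefixes, key=len, reverse=True)
    pattern = re.compile(
        rf"(?<![{_NAME_CHARS}])("
        + "|".join(re.escape(name) for name in names)
        + rf")(?![{_NAME_CHARS}])"
    )
    # The lookarounds skip names that are already part of a longer identifier,
    # so running the function twice does not double the prefix.
    return pattern.sub(lambda m: prefixes[m.group(1)] + m.group(1), expr)


bare = 'sum(rate(issue_service_operations_total{network=~"$network"}[$__rate_interval]))'
fixed = qualify_expr(bare)
print(fixed)
```

Because the substitution only fires on whole identifiers, the same function can be run over every `expr` field in the dashboard JSON without touching label matchers or Grafana variables.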

Comment thread docs/metrics.md
|--------|------|-------------|
| `auditor_audit_duration_seconds` | Histogram | Audit() processing time per transaction, including lock acquisition |
| `auditor_audit_lock_conflicts_total` | Counter | Audit() calls that failed to acquire enrollment-ID locks |
| `auditor_append_duration_seconds` | Histogram | Append() processing time per transaction |

Contributor

These start with the fts_services_auditor prefix; see, for example, fts_services_auditor_auditor_append_duration_seconds.

fts_services_auditor_auditor_duration_count{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw"} 10108
fts_services_auditor_auditor_duration_sum{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw"} 1127.6164152429933
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.005"} 128
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.01"} 228
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.025"} 230
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.05"} 230
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.1"} 6447
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.25"} 9997
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="0.5"} 10069
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="1.0"} 10093
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="2.5"} 10104
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="5.0"} 10105
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="10.0"} 10108
fts_services_auditor_auditor_duration_bucket{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw", le="+Inf"} 10108
fts_services_auditor_auditor_operations{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw"} 10108
fts_services_auditor_auditor_releases_total{instance="dectrust20.vpc.cloud9.ibm.com:10031", job="FSC.dw"}
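The `_bucket` samples above are cumulative, so per-bucket counts have to be derived by differencing. A small worked example using the values from this listing (the variable names are illustrative):

```python
# Upper bounds and cumulative counts copied from the auditor_duration buckets above.
BOUNDS = ["0.005", "0.01", "0.025", "0.05", "0.1", "0.25",
          "0.5", "1.0", "2.5", "5.0", "10.0", "+Inf"]
CUMULATIVE = [128, 228, 230, 230, 6447, 9997,
              10069, 10093, 10104, 10105, 10108, 10108]
SUM_SECONDS = 1127.6164152429933

# Each bucket's own count is its cumulative count minus the previous one.
per_bucket = [cur - prev for prev, cur in zip([0] + CUMULATIVE[:-1], CUMULATIVE)]
total = CUMULATIVE[-1]

# Share of observations that took more than 0.05 s and at most 0.25 s.
share_50_to_250_ms = (per_bucket[4] + per_bucket[5]) / total
mean_seconds = SUM_SECONDS / total

for bound, count in zip(BOUNDS, per_bucket):
    print(f"le={bound:>5}: {count}")
print(f"share in (0.05, 0.25]: {share_50_to_250_ms:.1%}, mean: {mean_seconds:.3f}s")
```

Read this way, the bulk of the audit calls in the sample landed between 0.05 s and 0.25 s, with a small tail beyond 1 s.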

Comment thread docs/metrics.md

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `unspent_tokens_invocations` | Counter | `fetcher_type` | Number of unspent-token fetch invocations |

Contributor

Starts with the fts_services_selector_sherdlock_ prefix (need to double-check in the code).

See
fts_services_selector_sherdlock_unspent_tokens_invocations{fetcher_type="eager", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer"} 3302
fts_services_selector_sherdlock_unspent_tokens_invocations{fetcher_type="eager", instance="dectrust21.vpc.cloud9.ibm.com:10041", job="FSC.banka"}

Comment thread docs/metrics.md

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `certified_tokens` | Counter | `network`, `channel`, `namespace` | Number of tokens certified |

Contributor

I did not find these in a Prometheus query; please check in the code which prefix they have.
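One way to cross-check documented names against what a node actually exports is to match them against the `# TYPE` lines of a saved scrape. This is a minimal sketch under that assumption: the function names are hypothetical, and the sample is trimmed to `# TYPE` lines quoted elsewhere in this thread.

```python
def exported_families(exposition_text):
    """Collect metric family names and types from '# TYPE <name> <type>' lines."""
    families = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            families[parts[2]] = parts[3]
    return families


def match_documented(documented, families):
    """Map each documented bare name to the exported names that end with it."""
    return {
        doc: sorted(name for name in families
                    if name == doc or name.endswith("_" + doc))
        for doc in documented
    }


SCRAPE = """\
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_pending_events gauge
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_processing_duration_seconds histogram
# TYPE fts_services_network_fabricx_finality_queue_finality_queue_processing_errors_total counter
"""

matches = match_documented(
    ["finality_queue_pending_events", "certified_tokens"],
    exported_families(SCRAPE),
)
for doc, found in matches.items():
    print(doc, "->", found or "not exported")
```

An empty match list flags a documented metric that the node does not export. More than one match (for example a short name such as `cache_level`, which is also the tail of `recipient_data_cache_level`) signals that the suffix is ambiguous and needs a manual check.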

Comment thread docs/metrics.md

| Metric | Type | Labels | Source | Description |
|--------|------|--------|--------|-------------|
| `cache_level` | Gauge | `network`, `channel`, `namespace` | `token/services/identity/idemix/cache/metrics.go` | Fill level of the Idemix credential cache |

Contributor

Starts with the fts_core_common_metrics_ prefix.

See

fts_core_common_metrics_cache_level{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer", namespace="tokenchaincode", network="mytopos"} 4
fts_core_common_metrics_recipient_data_cache_level{channel="arma", instance="dectrust20.vpc.cloud9.ibm.com:10021", job="FSC.issuer", namespace="tokenchaincode", network="mytopos"}

Comment thread docs/metrics.md

| Metric | Type | Description |
|--------|------|-------------|
| `finality_queue_pending_events` | Gauge | Finality events currently waiting in the queue buffer |

Contributor

Has the fts_services_network_fabricx_ prefix; see:

fts_services_network_fabricx_finality_queue_finality_queue_pending_events{instance="dectrust21.vpc.cloud9.ibm.com:10041", job="FSC.banka"}

Comment thread docs/metrics.md

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `ttx_envelope_sent_total` | Counter | `version`, `type` | Versioned envelopes sent |

Contributor

I did not find these in a Prometheus query; please check in the code which prefix they have.

@AkramBitar

AkramBitar commented Jun 17, 2026

Contributor

> Hi @SuyashAlphaC, thanks for this effort. Let me ask @AkramBitar if he can verify the Grafana dashboard in our deployment to see how it looks. Thanks @AkramBitar

Hello @SuyashAlphaC

Thanks a lot for your effort on that PR.

Following @adecaro's request, I checked the dashboard in our deployment and it did not show any data (see picture).

I think this is related to my review comments. Please have a look at them.

image

Thanks a lot,
Akram


Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document Token SDK Monitoring Metrics and Identify Missing Metrics

3 participants