Problem
The Transaction History (TH) component in §01 computes:
TH = settled_count / total_count # all-time lifetime ratio
An agent that settled 500 consecutive transactions then failed on the last 20 still scores TH = 0.96. The ALPHA composite score barely moves, yet the agent is demonstrably degraded. The lifetime ratio is accurate as a historical statement but misleading as a current trust signal.
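A quick check of that arithmetic. The sketch assumes the agent's history is a plain list of booleans (True = settled). For comparison it uses a hypothetical window over the last 100 transactions, a size chosen only for illustration:

```python
# Hypothetical history: 500 consecutive settlements followed by 20 failures.
history = [True] * 500 + [False] * 20

# Current §01 behavior: lifetime ratio over every transaction ever recorded.
lifetime_th = sum(history) / len(history)

# Illustrative windowed ratio over the most recent 100 transactions
# (the window size is an assumption made for this example only).
WINDOW = 100
recent = history[-WINDOW:]
windowed_th = sum(recent) / len(recent)

print(f"lifetime TH: {lifetime_th:.4f}")  # lifetime TH: 0.9615  (500/520)
print(f"windowed TH: {windowed_th:.4f}")  # windowed TH: 0.8000  (80/100)
```

Twenty failures move the lifetime TH by under 4 points, but they pull the 100-transaction window down by 20 points.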
Real-World Evidence
We ran a 20-agent behavioral monitoring study (published as PDR v1.0 on Zenodo) tracking agent reliability over 30 days. Key finding:
Window-based scoring (last 14 days) detected a 57% reliability drop that the lifetime aggregate missed entirely. The same agents, using lifetime settlement rate, appeared "stable" for 72 hours after degradation began.
This is structurally identical to TH's current design: lifetime statistics absorb recent failures.
Proposed Fix
Add a windowed settlement rate as an optional TH mode:
- AgentBaseline stores a ring buffer of recent transaction outcomes (configurable window, e.g., last 90 days or last N transactions)
- _compute_th() computes TH from the window when the window holds enough data, and falls back to the lifetime rate for new agents
- A new th_window_days parameter (default: 90; None = lifetime, for backward compatibility)
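The storage side might look like the following minimal sketch. Here, collections.deque with maxlen gives the "last N transactions" mode, and evicting entries older than the lookback gives the "last 90 days" mode. The names Outcome, RecentOutcomes, max_len and prune are hypothetical and are not part of truce-py. The sketch also assumes outcomes arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Deque, Optional


@dataclass(frozen=True)
class Outcome:
    timestamp: datetime  # timezone-aware time the transaction resolved
    settled: bool
    cancelled: bool = False


class RecentOutcomes:
    """Hypothetical bounded buffer that an AgentBaseline could hold."""

    def __init__(self, max_len: int = 10_000, window_days: Optional[int] = 90):
        # maxlen makes the deque silently drop its oldest entry on overflow,
        # which implements the "last N transactions" mode.
        self._buf: Deque[Outcome] = deque(maxlen=max_len)
        self.window_days = window_days

    def append(self, outcome: Outcome) -> None:
        self._buf.append(outcome)
        self.prune(outcome.timestamp)

    def prune(self, now: datetime) -> None:
        # Time-based mode: evict from the left while entries fall outside the window.
        # Relies on outcomes being appended in timestamp order.
        if self.window_days is None:
            return
        cutoff = now - timedelta(days=self.window_days)
        while self._buf and self._buf[0].timestamp < cutoff:
            self._buf.popleft()

    def __len__(self) -> int:
        return len(self._buf)

    def settled_rate(self) -> float:
        # Callers should compare len() against the fallback threshold first.
        if not self._buf:
            raise ValueError("no outcomes in window")
        settled = sum(1 for o in self._buf if o.settled and not o.cancelled)
        return settled / len(self._buf)
```

For example, suppose you append one outcome at day 0 and another at day 100 with window_days=90. The buffer then holds only the day-100 entry. Because pruning here happens on append, a reader scoring an idle agent would call prune(now) before reading.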
Spec change (§01):
TH SHOULD use a windowed settlement rate over a configurable lookback period (default: 90 days). When fewer than 5 transactions are available in the window, implementations MUST fall back to the lifetime rate.
Python implementation sketch:
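```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_WINDOW_TXNS = 5  # §01: fewer than this in the window -> lifetime fallback

def _compute_th(self, b: AgentBaseline, window_days: Optional[int] = 90) -> float:
    # Windowed mode. Assumes each entry in b.recent_window carries a
    # timezone-aware `timestamp`, so the lookback is actually applied here.
    if window_days is not None and b.recent_window:
        cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
        window = [txn for txn in b.recent_window if txn.timestamp >= cutoff]
        if len(window) >= MIN_WINDOW_TXNS:
            settled = sum(1 for txn in window if txn.settled and not txn.cancelled)
            return settled / len(window)
    # Lifetime fallback: new agents, sparse windows, or window_days=None
    if b.total_count == 0:
        return 0.5  # default when there is no history yet
    return b.settled_count / b.total_count
```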
Why This Matters
TH carries 0.25 weight in ALPHA. With lifetime averaging, TH after n settlements followed by k failures is n/(n+k). Once the history is long, each new failure moves TH by only about 1/n, so a degraded agent's ALPHA score erodes far too slowly for real-time trust routing. A 90-day window makes TH track the agent's recent behavior, which is what counterparties need when deciding whether to transact.
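The same example can be expressed in ALPHA terms. The following sketch holds the other ALPHA components fixed, an assumption made only to isolate TH. Under that assumption, the change in ALPHA is 0.25 × ΔTH. A hypothetical last-100-transactions window stands in for the 90-day window:

```python
TH_WEIGHT = 0.25  # TH's weight in ALPHA, as stated above


def alpha_delta_from_th(th_before: float, th_after: float) -> float:
    """Change in ALPHA attributable to TH alone, other components held constant."""
    return TH_WEIGHT * (th_after - th_before)


# Lifetime mode: 500 settlements, then 20 failures -> TH drops from 1.0 to 500/520.
lifetime_shift = alpha_delta_from_th(1.0, 500 / 520)

# Hypothetical last-100 window: TH drops from 1.0 to 80/100.
windowed_shift = alpha_delta_from_th(1.0, 80 / 100)

print(f"lifetime: {lifetime_shift:+.4f}")  # lifetime: -0.0096
print(f"windowed: {windowed_shift:+.4f}")  # windowed: -0.0500
```

Under lifetime averaging, ALPHA moves by less than 0.01. The windowed signal moves it by 0.05, about five times more, for the same 20 failures.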
This is the same design principle behind credit score "recent inquiry" weighting vs. total history — the recent signal is higher-information for predicting near-future behavior.
Implementation Notes
- Backward compat: When th_window_days=None, behavior is identical to the current v0.1 spec. No breaking changes.
- Storage cost: A 90-day ring buffer for an active agent takes O(N) space, where N = max daily transactions × 90. Trivial for most deployments.
- CI: The existing test suite structure in tests/test_scorer.py makes adding windowed TH tests straightforward.
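Windowed TH tests might look something like the following. The real scorer's constructor and fixtures aren't shown here, so this sketch uses a hypothetical pure function, windowed_th(), as a stand-in for _compute_th(). It operates on a list of (settled, cancelled) pairs already restricted to the lookback window. The test names are illustrative and follow pytest style.

```python
from typing import List, Optional, Tuple

MIN_WINDOW_TXNS = 5  # below this, §01 requires falling back to the lifetime rate


def windowed_th(
    window: List[Tuple[bool, bool]],  # (settled, cancelled) per in-window transaction
    settled_count: int,
    total_count: int,
    window_days: Optional[int] = 90,
) -> float:
    """Hypothetical stand-in for _compute_th(); `window` is pre-filtered by date."""
    if window_days is not None and len(window) >= MIN_WINDOW_TXNS:
        settled = sum(1 for s, c in window if s and not c)
        return settled / len(window)
    if total_count == 0:
        return 0.5
    return settled_count / total_count


def test_window_used_when_enough_data():
    window = [(True, False)] * 3 + [(False, False)] * 2
    assert windowed_th(window, settled_count=500, total_count=502) == 0.6


def test_falls_back_below_five_transactions():
    window = [(False, False)] * 4
    assert windowed_th(window, settled_count=9, total_count=10) == 0.9


def test_none_window_matches_lifetime_behavior():
    window = [(False, False)] * 10
    assert windowed_th(window, settled_count=3, total_count=4, window_days=None) == 0.75


def test_cancelled_counts_against_settlement():
    window = [(True, True)] + [(True, False)] * 4
    assert windowed_th(window, settled_count=0, total_count=0) == 0.8


def test_no_history_defaults_to_half():
    assert windowed_th([], settled_count=0, total_count=0) == 0.5
```

The None-window test pins down the backward-compat claim: a fully failed window must not affect the score when windowing is off.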
Happy to submit an RFC (spec/rfcs/RFC-0001-windowed-th.md) and a PR to truce-py if this direction resonates.
This connects to the broader question in Issue #1 about Layer 3 Community Signals — recency weighting in community reputation data (not just settlement history) has the same structural problem. Windowed TH could be the template pattern.
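One way to read "template pattern" is as a generic helper parameterized by how to timestamp an event and how to judge it positive. The following sketch makes that concrete. The name windowed_rate and its parameters are hypothetical, as is the (timestamp, positive) shape used for community signals in the example:

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional, TypeVar

E = TypeVar("E")


def windowed_rate(
    events: Iterable[E],
    timestamp_of: Callable[[E], datetime],
    is_positive: Callable[[E], bool],
    now: datetime,
    window_days: Optional[int],
    min_count: int,
    lifetime_rate: float,
) -> float:
    """Share of positive events in the lookback window, or lifetime_rate if too sparse."""
    if window_days is None:
        return lifetime_rate
    cutoff = now - timedelta(days=window_days)
    recent = [e for e in events if timestamp_of(e) >= cutoff]
    if len(recent) < min_count:
        return lifetime_rate
    return sum(1 for e in recent if is_positive(e)) / len(recent)
```

In made-up data where every community vote older than 30 days is positive and every recent one is negative, a 30-day window returns 0.0 while a 0.9 lifetime rate would hide the shift. Settlement history becomes one instance: timestamp_of reads the resolution time, and is_positive checks settled and not cancelled.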