feat: add actor and kv metrics #3952

MasterPtato · 2026-01-17T00:29:50Z

No description provided.

railway-app · 2026-01-17T00:29:58Z

🚅 Deployed to the rivet-pr-3952 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
frontend-cloud	😴 Sleeping (View Logs)	Web	Jan 23, 2026 at 1:19 am
frontend-inspector	😴 Sleeping (View Logs)	Web	Jan 23, 2026 at 1:18 am
website	❌ Build Failed (View Logs)	Web	Jan 23, 2026 at 1:10 am

MasterPtato · 2026-01-17T00:30:07Z

DO NOT MERGE: add metrics exporter for ee #4014
chore: clean up pools, remove unused crates #4013
feat: add actor and kv metrics #3952 👈 (View in Graphite)
main

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-01-20T18:50:43Z

PR Review: Add Actor and KV Metrics

This PR adds comprehensive metrics tracking for Rivet Actors and KV operations. Overall, the implementation is solid with good patterns, but there are several issues that should be addressed.

Critical Issues

1. Duplicate Metric Increment in `setup.rs`

Location: engine/packages/pegboard/src/workflows/actor/setup.rs:114-127

The TotalActors metric is incremented twice in the same transaction:

// Update metrics
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

// Update metrics  
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

Impact: This will cause the total actor count to be double what it should be, breaking usage tracking and billing.

Fix: Remove one of the duplicate calls.

Code Quality Issues

2. Inconsistent Byte Order in Metrics

Location: engine/packages/pegboard/src/keys/ns/metric.rs:84-90

The pegboard metrics use little-endian (LE) byte order:

fn deserialize(&self, raw: &[u8]) -> Result<Self::Value> {
    Ok(i64::from_le_bytes(raw.try_into()?))
}

While gasoline metrics (engine/packages/gasoline/src/db/kv/keys/metric.rs) also use LE, most other keys in the codebase use big-endian (BE) for timestamps and numeric values (see keys/actor.rs).

Concern: While LE works for atomic operations, the inconsistency could lead to confusion. The comment "IMPORTANT: Uses LE bytes, not BE" suggests this is intentional, but it's worth documenting why LE is preferred for metrics.

Recommendation: Add a comment explaining the rationale (likely related to FDB atomic operations).

3. Potential Integer Overflow with `try_into().unwrap_or_default()`

Location: Multiple files including:

engine/packages/pegboard/src/actor_kv/mod.rs:130
engine/packages/pegboard/src/actor_kv/mod.rs:283

total_size_chunked.try_into().unwrap_or_default()

Issue: If total_size_chunked (a u64) exceeds i64::MAX, the conversion fails and silently defaults to 0, causing metrics to be underreported.

Recommendation: Either:

Use saturating conversion: total_size_chunked.min(i64::MAX as u64) as i64
Or handle the error explicitly with proper logging

4. Missing Metric Decrement on Actor Destruction

Location: engine/packages/pegboard/src/workflows/actor/destroy.rs

The TotalActors metric is incremented on creation (setup.rs:114-127) but I don't see a corresponding decrement in the destroy workflow.

Impact: The total actors metric will keep growing indefinitely, never reflecting actual active actors.

Recommendation: Add a metric decrement in the destroy workflow:

keys::ns::metric::inc(
    &tx,
    namespace_id,
    keys::ns::metric::Metric::TotalActors(actor_name.to_string()),
    -1,
);

Performance Considerations

5. KV Size Estimation on Every Operation

Location: engine/packages/pegboard/src/actor_kv/mod.rs:265

let total_size = estimate_kv_size(&tx, recipient.actor_id).await? as usize;

Issue: The put operation calls estimate_kv_size on every write to validate against MAX_STORAGE_SIZE. FDB's range size estimation can be expensive for large key ranges.

Impact: This could become a bottleneck for high-frequency KV writes.

Recommendation: Consider:

Tracking approximate size in memory/state
Only checking periodically or when approaching limits
Or accepting the cost but documenting it

6. Metrics Recording Adds Extra Transaction Load

All KV operations now include atomic metric updates in the same transaction. While atomic ops are fast, they still add:

Database load
Transaction size
Potential conflict ranges

Recommendation: This is probably acceptable, but monitor transaction conflicts and consider batching if needed.

Security & Best Practices

7. Missing Validation on Metric Names

Location: engine/packages/pegboard/src/keys/ns/metric.rs

Actor names are used directly in metric keys without validation or sanitization. While actor names are validated elsewhere, defensive programming would validate here too.

Recommendation: Consider adding length limits or character validation to prevent potential key explosion.

8. Atomic Operations Without Validation

Location: Multiple keys::ns::metric::inc calls

The inc function uses atomic add without any bounds checking:

pub fn inc(tx: &universaldb::Transaction, namespace_id: Id, metric: Metric, by: i64) {
    tx.atomic_op(
        &MetricKey::new(namespace_id, metric),
        &by.to_le_bytes(),
        MutationType::Add,
    );
}

Issue: Metrics could theoretically overflow i64::MAX with enough operations.

Recommendation: This is probably acceptable for now, but consider monitoring or capping if metrics could realistically overflow.

Documentation & Testing

9. Empty Test Files

Location:

engine/packages/pegboard/tests/kv_list_edge_cases.rs (0 additions/deletions)
engine/packages/pegboard/tests/kv_operations.rs (0 additions/deletions)

These test files are listed in the PR but appear to be empty placeholders.

Recommendation: Either:

Add actual test coverage for KV metrics
Remove the placeholder files

10. Missing Documentation on Metrics

The new metrics types lack documentation on:

Units (bytes, seconds, count)
When they're updated
Expected ranges
Billing implications

Recommendation: Add doc comments to each metric variant explaining what it measures and how it's used.

Minor Issues

11. Inconsistent Naming Convention

Some metrics use past tense (KvStorageUsed) while others use present tense (KvRead, KvWrite).

Recommendation: Standardize on one convention.

12. Grafana Dashboard Updates

The PR includes Grafana dashboard changes across multiple environments. Ensure these were generated from the template as per CLAUDE.md requirements.

Positive Aspects

✅ Good separation of concerns - Metrics logic is properly encapsulated in dedicated modules
✅ Atomic operations - Using FDB atomic ops for metrics is the right approach
✅ Consistent patterns - Metrics recording follows similar patterns across KV and gateway operations
✅ Structured logging - Good use of tracing with structured fields
✅ Proper error handling - Most error cases are handled appropriately

Recommendations Summary

Must Fix:

Remove duplicate TotalActors increment in setup.rs
Add TotalActors decrement in destroy workflow
Fix or document the integer overflow potential in metric recording

Should Fix:
4. Add test coverage or remove empty test files
5. Document metric units and semantics
6. Consider performance implications of estimate_kv_size on every write

Nice to Have:
7. Standardize metric naming conventions
8. Document why LE byte order is used for metrics
9. Add validation/sanitization for metric names

Let me know if you'd like me to elaborate on any of these points or if you have questions about the recommendations!

pkg-pr-new · 2026-01-20T18:51:32Z

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3952

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3952

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3952

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3952

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3952

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3952

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3952

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3952

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3952

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3952

commit: fc05f9d

graphite-app · 2026-01-21T21:37:09Z

engine/packages/pegboard/src/workflows/actor/setup.rs

+			// Update metrics
+			keys::ns::metric::inc(
+				&tx,
+				input.namespace_id,
+				keys::ns::metric::Metric::TotalActors(input.name.clone()),
+				1,
+			);
+
+			// Update metrics
+			keys::ns::metric::inc(
+				&tx,
+				input.namespace_id,
+				keys::ns::metric::Metric::TotalActors(input.name.clone()),
+				1,
+			);


Duplicate metric increment will double-count actor creation. The TotalActors metric is incremented twice with identical parameters.

// Remove the duplicate block (lines 121-127) keys::ns::metric::inc( &tx, input.namespace_id, keys::ns::metric::Metric::TotalActors(input.name.clone()), 1, );

This causes the actor count to be incremented by 2 instead of 1 for each new actor, leading to incorrect metrics reporting.

Suggested change

// Update metrics

keys::ns::metric::inc(

&tx,

input.namespace_id,

keys::ns::metric::Metric::TotalActors(input.name.clone()),

1,

);

// Update metrics

keys::ns::metric::inc(

&tx,

input.namespace_id,

keys::ns::metric::Metric::TotalActors(input.name.clone()),

1,

);

// Update metrics

keys::ns::metric::inc(

&tx,

input.namespace_id,

keys::ns::metric::Metric::TotalActors(input.name.clone()),

1,

);

Spotted by Graphite Agent

Is this helpful? React 👍 or 👎 to let us know.

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 17, 2026 00:30 Destroyed

MasterPtato changed the title ~~fix: day 1 patch, add dashboards~~ feat: add actor and kv metrics Jan 17, 2026

MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 67c4a86 to 8b7ad2b Compare January 20, 2026 18:48

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 18:48 Destroyed

MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 8b7ad2b to b47e44c Compare January 20, 2026 20:53

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 20:53 Destroyed

MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from b47e44c to fc05f9d Compare January 21, 2026 21:33

railway-app bot requested a deployment to rivet-frontend / rivet-pr-3952 January 21, 2026 21:33 In progress

MasterPtato marked this pull request as ready for review January 21, 2026 21:33

MasterPtato requested a review from NathanFlurry January 21, 2026 21:33

graphite-app bot reviewed Jan 21, 2026

View reviewed changes

This was referenced Jan 23, 2026

chore: clean up pools, remove unused crates #4013

Draft

DO NOT MERGE: add metrics exporter for ee #4014

Draft

feat: add actor and kv metrics

8246e45

MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from fc05f9d to 8246e45 Compare January 23, 2026 01:09

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-3952 January 23, 2026 01:09 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add actor and kv metrics #3952

feat: add actor and kv metrics #3952

MasterPtato commented Jan 17, 2026 •

edited

Loading

Uh oh!

railway-app bot commented Jan 17, 2026 •

edited

Loading

Uh oh!

MasterPtato commented Jan 17, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 20, 2026 •

edited

Loading

Uh oh!

pkg-pr-new bot commented Jan 20, 2026 •

edited

Loading

Uh oh!

graphite-app bot Jan 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: add actor and kv metrics #3952

Are you sure you want to change the base?

feat: add actor and kv metrics #3952

Conversation

MasterPtato commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

railway-app bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MasterPtato commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Add Actor and KV Metrics

Critical Issues

1. Duplicate Metric Increment in setup.rs

Code Quality Issues

2. Inconsistent Byte Order in Metrics

3. Potential Integer Overflow with try_into().unwrap_or_default()

4. Missing Metric Decrement on Actor Destruction

Performance Considerations

5. KV Size Estimation on Every Operation

6. Metrics Recording Adds Extra Transaction Load

Security & Best Practices

7. Missing Validation on Metric Names

8. Atomic Operations Without Validation

Documentation & Testing

9. Empty Test Files

10. Missing Documentation on Metrics

Minor Issues

11. Inconsistent Naming Convention

12. Grafana Dashboard Updates

Positive Aspects

Recommendations Summary

Uh oh!

pkg-pr-new bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

graphite-app bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MasterPtato commented Jan 17, 2026 •

edited

Loading

railway-app bot commented Jan 17, 2026 •

edited

Loading

MasterPtato commented Jan 17, 2026 •

edited

Loading

claude bot commented Jan 20, 2026 •

edited

Loading

1. Duplicate Metric Increment in `setup.rs`

3. Potential Integer Overflow with `try_into().unwrap_or_default()`

pkg-pr-new bot commented Jan 20, 2026 •

edited

Loading