Skip to content

Conversation

@MasterPtato
Copy link
Contributor

@MasterPtato MasterPtato commented Jan 17, 2026

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 17, 2026

🚅 Deployed to the rivet-pr-3952 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud 😴 Sleeping (View Logs) Web Jan 23, 2026 at 1:19 am
frontend-inspector 😴 Sleeping (View Logs) Web Jan 23, 2026 at 1:18 am
website ❌ Build Failed (View Logs) Web Jan 23, 2026 at 1:10 am

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 17, 2026 00:30 Destroyed
Copy link
Contributor Author

MasterPtato commented Jan 17, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@MasterPtato MasterPtato changed the title fix: day 1 patch, add dashboards feat: add actor and kv metrics Jan 17, 2026
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 67c4a86 to 8b7ad2b Compare January 20, 2026 18:48
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 18:48 Destroyed
@claude
Copy link

claude bot commented Jan 20, 2026

PR Review: Add Actor and KV Metrics

This PR adds comprehensive metrics tracking for Rivet Actors and KV operations. Overall, the implementation is solid with good patterns, but there are several issues that should be addressed.


Critical Issues

1. Duplicate Metric Increment in setup.rs

Location: engine/packages/pegboard/src/workflows/actor/setup.rs:114-127

The TotalActors metric is incremented twice in the same transaction:

// Update metrics
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

// Update metrics  
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

Impact: This will cause the total actor count to be double what it should be, breaking usage tracking and billing.

Fix: Remove one of the duplicate calls.


Code Quality Issues

2. Inconsistent Byte Order in Metrics

Location: engine/packages/pegboard/src/keys/ns/metric.rs:84-90

The pegboard metrics use little-endian (LE) byte order:

fn deserialize(&self, raw: &[u8]) -> Result<Self::Value> {
    Ok(i64::from_le_bytes(raw.try_into()?))
}

While gasoline metrics (engine/packages/gasoline/src/db/kv/keys/metric.rs) also use LE, most other keys in the codebase use big-endian (BE) for timestamps and numeric values (see keys/actor.rs).

Concern: While LE works for atomic operations, the inconsistency could lead to confusion. The comment "IMPORTANT: Uses LE bytes, not BE" suggests this is intentional, but it's worth documenting why LE is preferred for metrics.

Recommendation: Add a comment explaining the rationale (likely related to FDB atomic operations).

3. Potential Integer Overflow with try_into().unwrap_or_default()

Location: Multiple files including:

  • engine/packages/pegboard/src/actor_kv/mod.rs:130
  • engine/packages/pegboard/src/actor_kv/mod.rs:283
total_size_chunked.try_into().unwrap_or_default()

Issue: If total_size_chunked (a u64) exceeds i64::MAX, the conversion fails and silently defaults to 0, causing metrics to be underreported.

Recommendation: Either:

  1. Use saturating conversion: total_size_chunked.min(i64::MAX as u64) as i64
  2. Or handle the error explicitly with proper logging

4. Missing Metric Decrement on Actor Destruction

Location: engine/packages/pegboard/src/workflows/actor/destroy.rs

The TotalActors metric is incremented on creation (setup.rs:114-127) but I don't see a corresponding decrement in the destroy workflow.

Impact: The total actors metric will keep growing indefinitely, never reflecting actual active actors.

Recommendation: Add a metric decrement in the destroy workflow:

keys::ns::metric::inc(
    &tx,
    namespace_id,
    keys::ns::metric::Metric::TotalActors(actor_name.to_string()),
    -1,
);

Performance Considerations

5. KV Size Estimation on Every Operation

Location: engine/packages/pegboard/src/actor_kv/mod.rs:265

let total_size = estimate_kv_size(&tx, recipient.actor_id).await? as usize;

Issue: The put operation calls estimate_kv_size on every write to validate against MAX_STORAGE_SIZE. FDB's range size estimation can be expensive for large key ranges.

Impact: This could become a bottleneck for high-frequency KV writes.

Recommendation: Consider:

  1. Tracking approximate size in memory/state
  2. Only checking periodically or when approaching limits
  3. Or accepting the cost but documenting it

6. Metrics Recording Adds Extra Transaction Load

All KV operations now include atomic metric updates in the same transaction. While atomic ops are fast, they still add:

  • Database load
  • Transaction size
  • Potential conflict ranges

Recommendation: This is probably acceptable, but monitor transaction conflicts and consider batching if needed.


Security & Best Practices

7. Missing Validation on Metric Names

Location: engine/packages/pegboard/src/keys/ns/metric.rs

Actor names are used directly in metric keys without validation or sanitization. While actor names are validated elsewhere, defensive programming would validate here too.

Recommendation: Consider adding length limits or character validation to prevent potential key explosion.

8. Atomic Operations Without Validation

Location: Multiple keys::ns::metric::inc calls

The inc function uses atomic add without any bounds checking:

pub fn inc(tx: &universaldb::Transaction, namespace_id: Id, metric: Metric, by: i64) {
    tx.atomic_op(
        &MetricKey::new(namespace_id, metric),
        &by.to_le_bytes(),
        MutationType::Add,
    );
}

Issue: Metrics could theoretically overflow i64::MAX with enough operations.

Recommendation: This is probably acceptable for now, but consider monitoring or capping if metrics could realistically overflow.


Documentation & Testing

9. Empty Test Files

Location:

  • engine/packages/pegboard/tests/kv_list_edge_cases.rs (0 additions/deletions)
  • engine/packages/pegboard/tests/kv_operations.rs (0 additions/deletions)

These test files are listed in the PR but appear to be empty placeholders.

Recommendation: Either:

  1. Add actual test coverage for KV metrics
  2. Remove the placeholder files

10. Missing Documentation on Metrics

The new metrics types lack documentation on:

  • Units (bytes, seconds, count)
  • When they're updated
  • Expected ranges
  • Billing implications

Recommendation: Add doc comments to each metric variant explaining what it measures and how it's used.


Minor Issues

11. Inconsistent Naming Convention

Some metrics use past tense (KvStorageUsed) while others use present tense (KvRead, KvWrite).

Recommendation: Standardize on one convention.

12. Grafana Dashboard Updates

The PR includes Grafana dashboard changes across multiple environments. Ensure these were generated from the template as per CLAUDE.md requirements.


Positive Aspects

Good separation of concerns - Metrics logic is properly encapsulated in dedicated modules
Atomic operations - Using FDB atomic ops for metrics is the right approach
Consistent patterns - Metrics recording follows similar patterns across KV and gateway operations
Structured logging - Good use of tracing with structured fields
Proper error handling - Most error cases are handled appropriately


Recommendations Summary

Must Fix:

  1. Remove duplicate TotalActors increment in setup.rs
  2. Add TotalActors decrement in destroy workflow
  3. Fix or document the integer overflow potential in metric recording

Should Fix:
4. Add test coverage or remove empty test files
5. Document metric units and semantics
6. Consider performance implications of estimate_kv_size on every write

Nice to Have:
7. Standardize metric naming conventions
8. Document why LE byte order is used for metrics
9. Add validation/sanitization for metric names


Let me know if you'd like me to elaborate on any of these points or if you have questions about the recommendations!

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 20, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3952

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3952

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3952

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3952

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3952

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3952

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3952

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3952

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3952

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3952

commit: fc05f9d

@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 8b7ad2b to b47e44c Compare January 20, 2026 20:53
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 20:53 Destroyed
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from b47e44c to fc05f9d Compare January 21, 2026 21:33
@MasterPtato MasterPtato marked this pull request as ready for review January 21, 2026 21:33
Comment on lines +113 to +127
// Update metrics
keys::ns::metric::inc(
&tx,
input.namespace_id,
keys::ns::metric::Metric::TotalActors(input.name.clone()),
1,
);

// Update metrics
keys::ns::metric::inc(
&tx,
input.namespace_id,
keys::ns::metric::Metric::TotalActors(input.name.clone()),
1,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate metric increment will double-count actor creation. The TotalActors metric is incremented twice with identical parameters.

// Remove the duplicate block (lines 121-127)
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

This causes the actor count to be incremented by 2 instead of 1 for each new actor, leading to incorrect metrics reporting.

Suggested change
// Update metrics
keys::ns::metric::inc(
&tx,
input.namespace_id,
keys::ns::metric::Metric::TotalActors(input.name.clone()),
1,
);
// Update metrics
keys::ns::metric::inc(
&tx,
input.namespace_id,
keys::ns::metric::Metric::TotalActors(input.name.clone()),
1,
);
// Update metrics
keys::ns::metric::inc(
&tx,
input.namespace_id,
keys::ns::metric::Metric::TotalActors(input.name.clone()),
1,
);

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants