Skip to content

Conversation

@chrisgacsal
Copy link
Collaborator

@chrisgacsal chrisgacsal commented Jan 6, 2026

Overview

Ensure that transaction for the whole batch is not rolled back in case reconciling any of the events fail.

Summary by CodeRabbit

  • Refactor
    • Notification reconciliation now logs individual event failures and continues processing remaining events instead of aggregating and returning a combined error.

✏️ Tip: You can customize this high-level summary in your review settings.

@chrisgacsal chrisgacsal requested a review from tothandras January 6, 2026 00:05
@chrisgacsal chrisgacsal self-assigned this Jan 6, 2026
@chrisgacsal chrisgacsal requested a review from a team as a code owner January 6, 2026 00:05
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

Per-event reconciliation errors in the event handler are no longer aggregated or returned; each failed event is logged and processing continues. The outer Reconcile function always returns nil after processing events.

Changes

Cohort / File(s) Summary
Event handler reconcile change
openmeter/notification/eventhandler/reconcile.go
Removed per-event error aggregation and returning; errors from individual reconcileEvent calls are logged and not collected. The outer reconciliation loop continues on errors and returns nil. No exported signatures changed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • tothandras
  • turip

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: error handling in reconcile loop' directly and specifically captures the main change—adjusting error handling in the reconciliation process to prevent batch rollbacks on individual event failures.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/notifications

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chrisgacsal chrisgacsal added the release-note/bug-fix Release note: Bug Fixes label Jan 6, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openmeter/notification/eventhandler/reconcile.go (1)

76-135: The transaction design looks solid for batch resilience.

You're right that errors from individual event reconciliations are logged but don't propagate, keeping the batch transaction alive. I've confirmed that reconcileWebhookEvent does perform DB writes (UpdateEventDeliveryStatus), and they happen inside the transaction boundary—so all successful status updates in the batch commit together. This is exactly what the PR aims for.

One thing to note: since reconcileEvent swallows errors and returns void, the actual DB operations that might fail (status updates) are isolated within reconcileWebhookEvent, and their errors are accumulated but don't stop processing other statuses. That's a clean isolation pattern.

On performance: your sequential event processing here is a valid bottleneck concern for large batches. The TODO already flags parallelization—just keep in mind that parallel goroutines plus a shared transaction would need careful error aggregation (which you've already set up).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73b16ce and 624419f.

📒 Files selected for processing (1)
  • openmeter/notification/eventhandler/reconcile.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

**/*.go: In general when reviewing the Golang code make readability and maintainability a priority, even potentially suggest restructuring the code to improve them.

Performance should be a priority in critical code paths. Anything related to event ingestion, message processing, database operations (regardless of database) should be vetted for potential performance bottlenecks.

Files:

  • openmeter/notification/eventhandler/reconcile.go
🧬 Code graph analysis (1)
openmeter/notification/eventhandler/reconcile.go (5)
openmeter/notification/webhook/noop/noop.go (1)
  • Handler (13-15)
openmeter/notification/eventhandler/handler.go (1)
  • Handler (54-71)
openmeter/notification/eventhandler/noop/handler.go (1)
  • Handler (11-11)
openmeter/notification/event.go (1)
  • Event (51-67)
pkg/framework/tracex/tracex.go (1)
  • StartWithNoValue (75-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Lint
  • GitHub Check: Code Generators
  • GitHub Check: Migration Checks
  • GitHub Check: Build
  • GitHub Check: Test
  • GitHub Check: Repository Scan
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
openmeter/notification/eventhandler/reconcile.go (1)

21-64: Retry and idempotency are both solid here. Failed events stay in Pending, Sending, or Resending states and get picked up by the next reconciliation sweep—no silent drops. The DB updates inside reconcileWebhookEvent are all to existing delivery status records (no inserts), so they're safe to retry. The void return on reconcileEvent achieves the PR goal (isolation) without losing events; transient update failures just defer to the next cycle.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
openmeter/notification/eventhandler/reconcile.go (1)

113-119: Consider adding observability for partial failures.

The change correctly prevents transaction rollback on individual event failures, which aligns with the PR goal. However, since errors are only logged and the function always returns nil, there's no programmatic way to track reconciliation failures.

A few suggestions to improve this:

  1. Add telemetry: Track failure count in span attributes so observability tools can alert on reconciliation issues:

    failureCount := 0
    for _, event := range out.Items {
        if err = h.reconcileEvent(ctx, &event); err != nil {
            failureCount++
            h.logger.ErrorContext(ctx, "failed to reconcile notification event", ...)
        }
    }
    if failureCount > 0 {
        span.SetAttributes(attribute.Int("event_handler.reconcile.failures", failureCount))
    }
  2. Verify transaction isolation: Ensure that if reconcileEvent partially modifies database state before failing, those partial changes don't create inconsistent state when the outer transaction commits. If needed, consider wrapping each reconcileEvent call in its own nested transaction/savepoint.

  3. Metrics: Consider emitting metrics for success/failure rates to enable monitoring and alerting on reconciliation health.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 624419f and 5fedc00.

📒 Files selected for processing (1)
  • openmeter/notification/eventhandler/reconcile.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

**/*.go: In general when reviewing the Golang code make readability and maintainability a priority, even potentially suggest restructuring the code to improve them.

Performance should be a priority in critical code paths. Anything related to event ingestion, message processing, database operations (regardless of database) should be vetted for potential performance bottlenecks.

Files:

  • openmeter/notification/eventhandler/reconcile.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Code Generators
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
openmeter/notification/eventhandler/reconcile.go (1)

129-129: Correctly prevents transaction rollback on partial failures.

Returning nil here ensures the transaction commits even when individual event reconciliations fail, which matches the PR objective. This preserves successful reconciliations within the batch while isolating failures to individual events.

Just make sure the observability gaps (mentioned in the previous comment) are addressed so you don't lose visibility into reconciliation health.

@chrisgacsal chrisgacsal requested a review from turip January 6, 2026 00:18
@chrisgacsal chrisgacsal enabled auto-merge (squash) January 6, 2026 00:25
@chrisgacsal chrisgacsal disabled auto-merge January 6, 2026 00:25
@chrisgacsal chrisgacsal merged commit 9bf2d72 into main Jan 6, 2026
28 checks passed
@chrisgacsal chrisgacsal deleted the fix/notifications branch January 6, 2026 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/bug-fix Release note: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants