-
Notifications
You must be signed in to change notification settings - Fork 145
fix: error handling in reconcile loop #3753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughPer-event reconciliation errors in the event handler are no longer aggregated or returned; each failed event is logged and processing continues. The outer Reconcile function always returns nil after processing events. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
624419f to
50841c9
Compare
50841c9 to
5fedc00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openmeter/notification/eventhandler/reconcile.go (1)
76-135: The transaction design looks solid for batch resilience.You're right that errors from individual event reconciliations are logged but don't propagate, keeping the batch transaction alive. I've confirmed that
reconcileWebhookEventdoes perform DB writes (UpdateEventDeliveryStatus), and they happen inside the transaction boundary—so all successful status updates in the batch commit together. This is exactly what the PR aims for.One thing to note: since
reconcileEventswallows errors and returns void, the actual DB operations that might fail (status updates) are isolated withinreconcileWebhookEvent, and their errors are accumulated but don't stop processing other statuses. That's a clean isolation pattern.On performance: your sequential event processing here is a valid bottleneck concern for large batches. The TODO already flags parallelization—just keep in mind that parallel goroutines plus a shared transaction would need careful error aggregation (which you've already set up).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
openmeter/notification/eventhandler/reconcile.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit configuration file
**/*.go: In general when reviewing the Golang code make readability and maintainability a priority, even potentially suggest restructuring the code to improve them.Performance should be a priority in critical code paths. Anything related to event ingestion, message processing, database operations (regardless of database) should be vetted for potential performance bottlenecks.
Files:
openmeter/notification/eventhandler/reconcile.go
🧬 Code graph analysis (1)
openmeter/notification/eventhandler/reconcile.go (5)
openmeter/notification/webhook/noop/noop.go (1)
Handler(13-15)openmeter/notification/eventhandler/handler.go (1)
Handler(54-71)openmeter/notification/eventhandler/noop/handler.go (1)
Handler(11-11)openmeter/notification/event.go (1)
Event(51-67)pkg/framework/tracex/tracex.go (1)
StartWithNoValue(75-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Lint
- GitHub Check: Code Generators
- GitHub Check: Migration Checks
- GitHub Check: Build
- GitHub Check: Test
- GitHub Check: Repository Scan
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
openmeter/notification/eventhandler/reconcile.go (1)
21-64: Retry and idempotency are both solid here. Failed events stay inPending,Sending, orResendingstates and get picked up by the next reconciliation sweep—no silent drops. The DB updates insidereconcileWebhookEventare all to existing delivery status records (no inserts), so they're safe to retry. The void return onreconcileEventachieves the PR goal (isolation) without losing events; transient update failures just defer to the next cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
openmeter/notification/eventhandler/reconcile.go (1)
113-119: Consider adding observability for partial failures.The change correctly prevents transaction rollback on individual event failures, which aligns with the PR goal. However, since errors are only logged and the function always returns
nil, there's no programmatic way to track reconciliation failures.A few suggestions to improve this:
Add telemetry: Track failure count in span attributes so observability tools can alert on reconciliation issues:
failureCount := 0 for _, event := range out.Items { if err = h.reconcileEvent(ctx, &event); err != nil { failureCount++ h.logger.ErrorContext(ctx, "failed to reconcile notification event", ...) } } if failureCount > 0 { span.SetAttributes(attribute.Int("event_handler.reconcile.failures", failureCount)) }Verify transaction isolation: Ensure that if
reconcileEventpartially modifies database state before failing, those partial changes don't create inconsistent state when the outer transaction commits. If needed, consider wrapping eachreconcileEventcall in its own nested transaction/savepoint.Metrics: Consider emitting metrics for success/failure rates to enable monitoring and alerting on reconciliation health.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
openmeter/notification/eventhandler/reconcile.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit configuration file
**/*.go: In general when reviewing the Golang code make readability and maintainability a priority, even potentially suggest restructuring the code to improve them.Performance should be a priority in critical code paths. Anything related to event ingestion, message processing, database operations (regardless of database) should be vetted for potential performance bottlenecks.
Files:
openmeter/notification/eventhandler/reconcile.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Code Generators
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
openmeter/notification/eventhandler/reconcile.go (1)
129-129: Correctly prevents transaction rollback on partial failures.Returning
nilhere ensures the transaction commits even when individual event reconciliations fail, which matches the PR objective. This preserves successful reconciliations within the batch while isolating failures to individual events.Just make sure the observability gaps (mentioned in the previous comment) are addressed so you don't lose visibility into reconciliation health.
Overview
Ensure that transaction for the whole batch is not rolled back in case reconciling any of the events fail.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.