Skip to content

Conversation

@chrisgacsal
Copy link
Collaborator

@chrisgacsal chrisgacsal commented Nov 17, 2025

Overview

Fix reconcile state of delivery status when re-sending event by not overriding err which is used to control reconcile flow for delivery statuses in PENDING or SENDING states. This is a temporary measure until there is a caching logic implemented for reconciler to prevent re-fetching resources from webhook provider for each access request.

Summary by CodeRabbit

  • Bug Fixes

    • Corrected delivery status state prioritization for webhook notifications to ensure proper sequencing
  • Refactor

    • Enhanced webhook delivery error handling with improved diagnostic logging and error context management

@chrisgacsal chrisgacsal self-assigned this Nov 17, 2025
@chrisgacsal chrisgacsal requested a review from a team as a code owner November 17, 2025 12:20
@chrisgacsal chrisgacsal added the release-note/misc Miscellaneous changes label Nov 17, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 17, 2025

📝 Walkthrough

Walkthrough

This change adjusts delivery status priority ordering in the sorting function and refactors webhook event reconciliation to improve error handling, variable scoping, and logging. The webhook changes introduce structured logging keys, separate error variable scoping in retry paths, and return validation errors for payload serialization failures.

Changes

Cohort / File(s) Summary
Delivery Status Priority
openmeter/notification/eventhandler/deliverystatus.go
Updated sortDeliveryStatusStateByPriority map: Resending (0→1), Sending (1→2), Failed (2→3), Success (2→3), with Pending remaining at priority 0
Webhook Event Handler Refactoring
openmeter/notification/eventhandler/webhook.go
Refactored reconcileWebhookEvent error handling with explicit variable declarations and scoping; expanded error logging to use structured keys; added local error shadowing in Resending state; changed payload serialization errors to return webhook.ValidationError instead of plain error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • webhook.go error handling paths: Multiple interrelated changes to variable scoping, error types, and logging keys across the reconciliation flow—each path needs verification for correctness
  • Logging key consistency: Ensure new structured keys (notification.delivery_status.state, notification.event.id) are correctly applied across all error branches
  • ValidationError wrapping: Confirm payload serialization errors correctly wrap underlying errors and maintain error chain integrity

Possibly related PRs

Suggested labels

release-note/bug-fix, kind/refactor

Suggested reviewers

  • gergely-kurucz-konghq

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: correcting reconciliation of delivery status when resending a message by avoiding an error variable override.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/reconcile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chrisgacsal chrisgacsal enabled auto-merge (squash) November 17, 2025 12:24
@chrisgacsal chrisgacsal merged commit e3b8ea2 into main Nov 17, 2025
29 of 31 checks passed
@chrisgacsal chrisgacsal deleted the fix/reconcile branch November 17, 2025 12:25
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
openmeter/notification/eventhandler/deliverystatus.go (1)

60-66: Updated priority map keeps RESENDING between PENDING and SENDING

The new priorities (PENDING=0, RESENDING=1, SENDING=2, FAILED/SUCCESS=3) line up nicely with how reconcileWebhookEvent processes active statuses — fresh PENDING first, then RESENDING retries, then long‑running SENDING. That fits the intent of the reconciler and the RESENDING fix.

Tiny nit: FAILED and SUCCESS still share the same numeric priority while the comparator only looks at < and otherwise returns 1. If this helper ever gets used on slices that include final states, that technically violates strict ordering and could make the sort result a bit non‑deterministic. If you care about that, you could either give them distinct priorities or add a tie‑breaker when priority[a.State] == priority[b.State].

openmeter/notification/eventhandler/webhook.go (3)

57-82: Hoisting err/webhooksOut/msg clarifies shared provider state

Moving err, webhooksOut, and msg to this scope makes it much clearer that you fetch webhooks and the provider message once per event and then reuse that state across all delivery statuses. That matches how the reconcile loop is written and plays nicely with the later checks on err and msg for PENDING/SENDING states.

One small readability thought: the same err variable is also reused later for UpdateEventDeliveryStatus (Line 422), so a repo error in one iteration will overwrite the provider error that subsequent iterations see. That behavior existed before this refactor, but it’s a bit surprising when you’re mentally tracking which error the webhook.Is*Error checks are referring to. If you want to make this more obvious/robust, splitting it into e.g. providerErr (for get/send) and updateErr (for the repo writes) would keep the control flow easier to reason about and avoid any chance of DB errors influencing webhook error handling.


152-167: Routing validation errors into the unrecoverable branch looks good

Adding webhook.IsValidationError(err) alongside IsUnrecoverableError in the PENDING branch is a nice touch — payload/validation issues aren’t realistically recoverable by retrying the provider, so treating them as final failures is a sane choice. The extra structured fields in the error log (notification.delivery_status.state, notification.event.id, etc.) should also make debugging much easier.

Minor wording nit: the log message and span text still talk about “fetching webhook message from provider” even though, with the new ValidationError path, some of those errors can now originate from sendWebhookMessage (payload serialization) rather than just the fetch. Not a big deal, but if logs are used heavily for triage, you might want to tweak the wording to something a bit more generic like “handling webhook message at provider”.


243-257: Keeping resend errors local fixes RESENDING reconciliation without breaking PENDING/SENDING

The local err := h.resendWebhookMessage(...) here is exactly what you want: errors from the resend operation no longer stomp on the outer err that drives the PENDING/SENDING logic. That lines up with the PR description and avoids the situation where a resend failure accidentally gets interpreted as a provider fetch/send error for other statuses in the same reconcile run.

Tiny comment nit: “Ignore any error from the resend operation” is a bit ambiguous since you still append it to errs (which is good). If you care about precision, something like “Ignore resend errors for state transitions, but still surface them via errs” would match what the code actually does.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96243e5 and 988b65a.

📒 Files selected for processing (2)
  • openmeter/notification/eventhandler/deliverystatus.go (1 hunks)
  • openmeter/notification/eventhandler/webhook.go (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit configuration file

**/*.go: In general when reviewing the Golang code make readability and maintainability a priority, even potentially suggest restructuring the code to improve them.

Performance should be a priority in critical code paths. Anything related to event ingestion, message processing, database operations (regardless of database) should be vetted for potential performance bottlenecks.

Files:

  • openmeter/notification/eventhandler/webhook.go
  • openmeter/notification/eventhandler/deliverystatus.go
🧠 Learnings (2)
📚 Learning: 2025-11-07T12:29:09.087Z
Learnt from: chrisgacsal
Repo: openmeterio/openmeter PR: 3581
File: openmeter/notification/eventhandler/webhook.go:391-396
Timestamp: 2025-11-07T12:29:09.087Z
Learning: In the OpenMeter notification system (specifically in openmeter/notification/eventhandler/webhook.go), webhook channels are equivalent to notification rules. When sending webhook messages, the rule ID (event.Rule.ID) should be used as the channel identifier in the Channels field of SendMessageInput.

Applied to files:

  • openmeter/notification/eventhandler/webhook.go
📚 Learning: 2025-11-07T11:28:40.430Z
Learnt from: chrisgacsal
Repo: openmeterio/openmeter PR: 3581
File: openmeter/notification/webhook/svix/webhook.go:253-257
Timestamp: 2025-11-07T11:28:40.430Z
Learning: In the openmeter notification system (specifically in openmeter/notification/webhook/svix/webhook.go), event type filtering for webhooks is not allowed or supported. Only channel-based filtering is used.

Applied to files:

  • openmeter/notification/eventhandler/webhook.go
🧬 Code graph analysis (2)
openmeter/notification/eventhandler/webhook.go (2)
openmeter/notification/webhook/handler.go (3)
  • Webhook (14-28)
  • ListWebhooksInput (32-38)
  • Message (187-207)
openmeter/notification/webhook/errors.go (3)
  • IsUnrecoverableError (150-152)
  • IsValidationError (45-47)
  • NewValidationError (37-43)
openmeter/notification/eventhandler/deliverystatus.go (1)
openmeter/notification/deliverystatus.go (4)
  • EventDeliveryStatusStateResending (18-18)
  • EventDeliveryStatusStateSending (16-16)
  • EventDeliveryStatusStateFailed (15-15)
  • EventDeliveryStatusStateSuccess (14-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Artifacts / Benthos Collector Container image
  • GitHub Check: Artifacts / Container image
  • GitHub Check: Migration Checks
  • GitHub Check: Lint
  • GitHub Check: Code Generators
  • GitHub Check: Test
  • GitHub Check: Build
  • GitHub Check: Repository Scan
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
openmeter/notification/eventhandler/webhook.go (1)

536-541: Wrapping payload serialization failures in ValidationError fits the error model

Returning webhook.NewValidationError(fmt.Errorf("failed to serialize webhook message payload: %w", err)) on eventAsPayload failure makes the error taxonomy much cleaner: reconcile can now explicitly treat bad/unsupported payloads as validation problems, and the PENDING branch’s IsValidationError check correctly routes them into the unrecoverable path instead of retrying.

You still preserve useful context via the wrapped message, and this plays nicely with the rest of the webhook error helpers. Also nice to see Channels: []string{event.Rule.ID} is still used here, which is consistent with the “rule ID as channel identifier” behavior of the webhook subsystem. Based on learnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/misc Miscellaneous changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants