Skip to content

Conversation

@capcom6
Copy link
Member

@capcom6 capcom6 commented May 11, 2025

Summary by CodeRabbit

  • New Features

    • Improved error reporting for message sending, allowing users to see which specific messages failed.
    • Retry attempts for failed messages are now tracked, with a limit on the number of retries.
    • Messages exceeding retry limits are blacklisted temporarily to prevent repeated failures.
    • Enhanced monitoring with new metrics for retry attempts, blacklist operations, and messages reaching the maximum retry limit.
  • Bug Fixes

    • More accurate logging and metrics for sent, retried, and blacklisted messages.
  • Documentation

    • Updated example request to reflect a change in the "priority" field value.

@coderabbitai
Copy link

coderabbitai bot commented May 11, 2025

Walkthrough

The changes update the push notification system to track and report errors on a per-message basis, introducing a new error map return type for the Send method in multiple client implementations. The service layer now wraps events with retry logic and metrics, including blacklist handling, and the cache stores this new wrapper type. Event struct fields were encapsulated with getter methods. A minor documentation example was also updated.

Changes

File(s) Change Summary
internal/sms-gateway/modules/push/fcm/client.go
internal/sms-gateway/modules/push/upstream/client.go
Updated Send method signature to return a map of errors per address and a combined error. Refactored error handling accordingly. Added helper for error mapping in upstream client.
internal/sms-gateway/modules/push/service.go Added blacklist cache and Prometheus counters for retries and blacklist operations. Changed cache to store eventWrapper with event, token, and retry count. Updated enqueue and send logic for retries, blacklisting, and error handling.
internal/sms-gateway/modules/push/types.go Updated client interface Send method signature. Introduced eventWrapper struct and alias NewEvent for event creation.
internal/sms-gateway/modules/push/domain/events.go Made Event struct fields unexported and added getter methods. Updated constructor and Map() method accordingly.
internal/sms-gateway/modules/push/consts.go Added constants maxRetries = 3 and blacklistTimeout = 15 * time.Minute. Added types and constants for retry outcomes and blacklist operations.
internal/sms-gateway/handlers/upstream.go Replaced manual push.Event struct initialization with call to push.NewEvent. Changed Enqueue call to pass event by value.
pkg/swagger/docs/requests.http Changed the "priority" field value in a JSON request example from 128 to 127.

Sequence Diagram(s)

sequenceDiagram
    participant Service
    participant Cache
    participant Client
    participant Prometheus

    Service->>Cache: Store eventWrapper (event, token, retries=0)
    loop Periodic sendAll
        Service->>Cache: Retrieve all eventWrappers
        Service->>Service: Filter wrappers (retries < maxRetries)
        Service->>Client: Send(map[token]event)
        alt On failure
            Client-->>Service: (map[token]error, error)
            Service->>Cache: Increment retries for failed tokens
            Service->>Prometheus: Increment retriesCounter ("retried")
            alt retries >= maxRetries
                Service->>Cache: Blacklist token with TTL
                Service->>Prometheus: Increment blacklistCounter ("added")
            else
                Service->>Cache: Re-cache for retry
            end
        else On success
            Client-->>Service: (nil, nil)
        end
    end
Loading
sequenceDiagram
    participant Client
    participant ExternalAPI

    Client->>ExternalAPI: Send batched messages
    alt On error
        ExternalAPI-->>Client: Error response
        Client->>Client: Map each address to error
        Client-->>Caller: (map[address]error, error)
    else On success
        ExternalAPI-->>Client: Success response
        Client-->>Caller: (nil, nil)
    end
Loading

Tip

⚡️ Faster reviews with caching
  • CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.

Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9012e67 and d69838a.

📒 Files selected for processing (8)
  • internal/sms-gateway/handlers/upstream.go (1 hunks)
  • internal/sms-gateway/modules/push/consts.go (1 hunks)
  • internal/sms-gateway/modules/push/domain/events.go (1 hunks)
  • internal/sms-gateway/modules/push/fcm/client.go (2 hunks)
  • internal/sms-gateway/modules/push/service.go (5 hunks)
  • internal/sms-gateway/modules/push/types.go (1 hunks)
  • internal/sms-gateway/modules/push/upstream/client.go (3 hunks)
  • pkg/swagger/docs/requests.http (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/swagger/docs/requests.http
  • internal/sms-gateway/handlers/upstream.go
  • internal/sms-gateway/modules/push/types.go
  • internal/sms-gateway/modules/push/domain/events.go
  • internal/sms-gateway/modules/push/fcm/client.go
  • internal/sms-gateway/modules/push/service.go
  • internal/sms-gateway/modules/push/consts.go
  • internal/sms-gateway/modules/push/upstream/client.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint
  • GitHub Check: Analyze (go)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/sms-gateway/modules/push/service.go (3)

64-70: Metric label values are open-ended; consider enumerating constants
The counter is declared with a single outcome label whose runtime values are string literals ("max_attempts", "retried").
Using typed constants (or iota enum) keeps label values consistent and avoids accidental typos when the codebase grows.


100-109: Unnecessary struct copy – pass the event by pointer
eventWrapper stores event by value, so the whole domain.Event is copied on every enqueue/retry.
If domain.Event contains large payloads (which FCM messages often do), this will add GC pressure.

-type eventWrapper struct {
-    token   string
-    event   domain.Event
-    retries int
-}
+type eventWrapper struct {
+    token   string
+    event   *domain.Event
+    retries int
+}

Remember to dereference accordingly in the rest of the file.


122-131: Magic constant for max retries
3 is scattered in the filtering logic. Defining a const maxRetries = 3 (or exposing it via Config) improves readability and allows future tuning without code changes.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e457a16 and faf5974.

📒 Files selected for processing (5)
  • internal/sms-gateway/modules/push/fcm/client.go (3 hunks)
  • internal/sms-gateway/modules/push/service.go (4 hunks)
  • internal/sms-gateway/modules/push/types.go (1 hunks)
  • internal/sms-gateway/modules/push/upstream/client.go (4 hunks)
  • pkg/swagger/docs/requests.http (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/sms-gateway/modules/push/types.go (1)
internal/sms-gateway/modules/push/domain/events.go (1)
  • Event (9-12)
internal/sms-gateway/modules/push/upstream/client.go (2)
internal/sms-gateway/modules/push/types.go (1)
  • Event (12-12)
internal/sms-gateway/modules/push/domain/events.go (1)
  • Event (9-12)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Test
  • GitHub Check: E2E
  • GitHub Check: lint
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
pkg/swagger/docs/requests.http (1)

24-24: Priority value updated to align with new message handling.

The priority value for push messages has been decreased from 128 to 127, which aligns with the changes in the push notification system that now supports per-message error tracking and retries.

internal/sms-gateway/modules/push/types.go (2)

21-21: Updated interface signature to support per-message error reporting.

The Send method signature has been enhanced to return a map of errors keyed by message address (map[string]error) in addition to a combined error. This change enables tracking and reporting errors on a per-message basis, which is essential for implementing proper retry logic.


25-29: Well-structured wrapper type for event retry handling.

The new eventWrapper struct is a good addition that encapsulates all necessary information for tracking retry attempts. The struct contains the token (identifier), the event data, and a retry counter, providing a clean way to manage retries in the service layer.

internal/sms-gateway/modules/push/fcm/client.go (4)

7-8: Required imports added for error handling.

The maps and slices packages are appropriately added to support the new error handling approach that uses maps instead of slices for error collection.


58-60: Method signature updated to support granular error reporting.

The Send method now returns both a map of per-token errors and a combined error, which aligns with the updated interface in types.go. The error map is properly initialized with the expected capacity.


70-70: Error collection now maps errors to specific tokens.

Errors are now stored in a map with the token address as the key, enabling callers to identify which specific messages failed and potentially retry only those messages.


74-74: Combined error handling properly implemented.

The implementation correctly returns both the detailed error map and a combined error using errors.Join. The use of slices.Collect(maps.Values(errs)) is an elegant way to convert the map values to a slice for joining.

internal/sms-gateway/modules/push/upstream/client.go (5)

14-14: External map helper package imported.

The code uses a custom map helper package from github.com/capcom6/go-helpers/maps. Ensure this dependency is properly managed in your project.

Can you verify that this dependency is properly tracked in your project's dependency management system (like go.mod)?


45-45: Method signature updated to support granular error reporting.

The Send method now returns both a map of per-token errors and a combined error, which aligns with the updated interface in types.go and with the FCM client implementation.


59-60: Consistent error handling with helper method.

All error cases now use the sendError helper method to generate a consistent error response format, which improves code readability and maintainability.

Also applies to: 64-65, 72-73, 81-82


84-85: Clear success case handling.

On success, the method correctly returns nil, nil to indicate no errors occurred, which is consistent with the new return type.


87-91: Elegant helper method for error mapping.

The sendError method efficiently maps the same error to all message addresses using the helper function maps.MapValues. This is a clean way to generate the error map while maintaining the original error for the combined error return.

internal/sms-gateway/modules/push/service.go (1)

41-43: Confirm cache.Cache is truly concurrent-safe
Enqueue can be called by many goroutines while the ticker loop concurrently calls sendAll → Drain.
If cache.Cache is not explicitly documented as goroutine-safe for all exposed operations ( Set, SetOrFail, Drain ), you risk hard-to-reproduce data races.
Please double-check the implementation or guard access with a mutex.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/sms-gateway/modules/push/service.go (1)

131-140: Fixed retry logic when handling errors.

The error handling now correctly checks both the map of per-token errors and the general error, addressing previous issues where retry logic was skipped when the general error was nil.

🧹 Nitpick comments (3)
internal/sms-gateway/modules/push/fcm/client.go (1)

71-71: Consider aggregating errors for non-nil general error.

When multiple messages fail, the function returns a map of errors but still returns nil for the general error. Consider returning a meaningful aggregate error when the error map is not empty to make error checking more consistent.

-	return errs, nil
+	if len(errs) > 0 {
+		return errs, fmt.Errorf("failed to send %d messages", len(errs))
+	}
+	return errs, nil
internal/sms-gateway/modules/push/service.go (2)

137-140: Consider handling individual errors even with general failure.

The function returns early if there's a general error, potentially ignoring per-token failures that could be retried. Consider handling both the general error and processing the error map in such cases.

 	if err != nil {
 		s.logger.Error("Can't send messages", zap.Error(err))
-		return
+		// Continue to process individual errors even in case of general failure
 	}

127-127: Add logging for retry statistics.

Consider adding logging for the number of messages that failed and will be retried at the end of processing, to give a clearer picture of the system's behavior.

 	s.logger.Info("Sending messages", zap.Int("count", len(messages)))
+
+	// After line 159, add:
+	s.logger.Info("Retry statistics", 
+		zap.Int("failed", len(errs)), 
+		zap.Int("retried", len(errs) - countMaxAttemptsExceeded))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faf5974 and bab9aaf.

📒 Files selected for processing (7)
  • internal/sms-gateway/handlers/upstream.go (1 hunks)
  • internal/sms-gateway/modules/push/consts.go (1 hunks)
  • internal/sms-gateway/modules/push/domain/events.go (1 hunks)
  • internal/sms-gateway/modules/push/fcm/client.go (2 hunks)
  • internal/sms-gateway/modules/push/service.go (5 hunks)
  • internal/sms-gateway/modules/push/types.go (1 hunks)
  • internal/sms-gateway/modules/push/upstream/client.go (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • internal/sms-gateway/modules/push/consts.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/sms-gateway/modules/push/types.go
  • internal/sms-gateway/modules/push/upstream/client.go
🧰 Additional context used
🧬 Code Graph Analysis (4)
internal/sms-gateway/handlers/upstream.go (2)
internal/sms-gateway/modules/push/domain/events.go (2)
  • NewEvent (31-36)
  • Event (9-12)
internal/sms-gateway/modules/push/types.go (2)
  • NewEvent (14-14)
  • Event (12-12)
internal/sms-gateway/modules/push/fcm/client.go (2)
internal/sms-gateway/modules/push/domain/events.go (1)
  • Event (9-12)
internal/sms-gateway/modules/push/types.go (1)
  • Event (12-12)
internal/sms-gateway/modules/push/domain/events.go (1)
internal/sms-gateway/modules/push/types.go (2)
  • Event (12-12)
  • NewEvent (14-14)
internal/sms-gateway/modules/push/service.go (2)
internal/sms-gateway/modules/push/domain/events.go (1)
  • Event (9-12)
internal/sms-gateway/modules/push/types.go (3)
  • Event (12-12)
  • RetryOutcomeMaxAttempts (30-30)
  • RetryOutcomeRetried (31-31)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Analyze (go)
🔇 Additional comments (12)
internal/sms-gateway/handlers/upstream.go (2)

71-74: Good implementation of the updated Event constructor.

The code now uses the push.NewEvent constructor instead of direct struct initialization, which properly aligns with the encapsulation changes in the Event struct where fields were made unexported.


76-76: Method signature update correctly implemented.

The Enqueue method now accepts an event value directly rather than a pointer, matching the updated service interface.

internal/sms-gateway/modules/push/fcm/client.go (2)

55-56: Method signature improvement for granular error handling.

The Send method signature has been updated to return (map[string]error, error) instead of just error, providing per-message error reporting which will enable more effective retry mechanisms.


67-68: Per-token error tracking implemented correctly.

Errors are now properly stored in a map keyed by token address, which allows for more precise error handling and retry logic at the service level.

internal/sms-gateway/modules/push/domain/events.go (3)

9-12: Good encapsulation of Event struct fields.

The Event and Data fields have been properly changed from exported to unexported, improving encapsulation and preventing unintended modifications.


14-20: Appropriate getter methods added.

Clean implementation of getter methods for the now-unexported fields, maintaining the same external API while improving internal structure.


22-28: Map method correctly updated to use unexported fields.

The Map() method has been properly updated to access the unexported fields directly.

internal/sms-gateway/modules/push/service.go (5)

42-45: Good enhancement with event wrapper and metrics.

The cache now stores the eventWrapper type instead of domain.Event, and a new Prometheus counter for tracking retries has been added. These changes properly support the retry functionality.


65-70: Well-implemented retry metrics.

The new Prometheus counter vector for tracking retry outcomes with appropriate labels is a good addition for monitoring the system's retry behavior.


101-107: Properly implemented event wrapping.

The event is now correctly wrapped with the token and a retry counter initialized to zero, which will enable tracking retry attempts per message.


123-126: Good use of maps helper for transformation.

The maps.MapValues function is used effectively to transform the event wrappers into domain events for sending.


142-159: Well-implemented retry mechanism.

The retry logic now properly:

  1. Increments retry counts for failed messages
  2. Checks against maximum retry limits
  3. Logs appropriate messages for different outcomes
  4. Updates metrics to track retry attempts by outcome
  5. Re-queues messages that haven't reached the retry limit

This is a significant improvement for handling transient failures.

@capcom6 capcom6 force-pushed the push/add-push-send-retries branch from bab9aaf to cac472a Compare May 11, 2025 05:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cac472a and 593a26d.

📒 Files selected for processing (3)
  • internal/sms-gateway/modules/push/consts.go (1 hunks)
  • internal/sms-gateway/modules/push/service.go (5 hunks)
  • internal/sms-gateway/modules/push/types.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/sms-gateway/modules/push/types.go
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/sms-gateway/modules/push/service.go

169-169: Error return value of s.blacklist.Set is not checked

(errcheck)

🪛 GitHub Check: lint
internal/sms-gateway/modules/push/service.go

[failure] 169-169:
Error return value of s.blacklist.Set is not checked (errcheck)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build / Docker image (linux/amd64)
  • GitHub Check: Build / Docker image (linux/arm64)
  • GitHub Check: Analyze (go)
🔇 Additional comments (11)
internal/sms-gateway/modules/push/service.go (8)

42-44: Good addition of blacklist cache alongside wrapper cache

The introduction of a separate blacklist cache with structured empty values is a clean approach to token management. This allows for efficient checking of problematic tokens before attempting to send messages.


67-80: Well-structured metrics for tracking retry and blacklist operations

The new Prometheus counters provide valuable insights into retry attempts and blacklist operations, which will help with monitoring and troubleshooting the push notification system.


85-88: Good TTL configuration for blacklist cache

Setting a TTL for blacklisted tokens is an excellent approach as it allows tokens to be "rehabilitated" after a certain period, preventing permanent blacklisting of potentially valid tokens.


115-119: Efficient early blacklist check in Enqueue

Checking the blacklist early in the Enqueue method prevents unnecessary processing and caching of events for known problematic tokens. The logging and metrics increment provide good visibility.


121-126: Good encapsulation of event with retry metadata

Wrapping the event with retry metadata provides a clean way to track retry attempts for each token without modifying the original event structure.


143-145: Efficient mapping from wrappers to events

Using maps.MapValues is a clean and efficient way to extract the events from the wrappers for sending.


151-160: Improved error handling structure

The updated error handling properly separates general send errors from token-specific errors, with appropriate logging for each case.


162-183: Well-structured per-token error handling and retry logic

The per-token error handling with progressive retries and eventual blacklisting provides a robust mechanism for dealing with problematic tokens while giving them multiple chances to succeed.

🧰 Tools
🪛 golangci-lint (1.64.8)

169-169: Error return value of s.blacklist.Set is not checked

(errcheck)

🪛 GitHub Check: lint

[failure] 169-169:
Error return value of s.blacklist.Set is not checked (errcheck)

internal/sms-gateway/modules/push/consts.go (3)

5-8: Good use of named constants for configuration values

Using named constants for maxRetries and blacklistTimeout improves code readability and maintainability by centralizing these configuration values.


10-15: Well-defined typed constants for retry outcomes

Using a string-based enum type for retry outcomes improves type safety and makes the code more self-documenting. This is better than using raw strings directly in the code.


17-22: Well-defined typed constants for blacklist operations

Similar to the retry outcomes, using a string-based enum type for blacklist operations provides type safety and better code readability.

@capcom6 capcom6 force-pushed the push/add-push-send-retries branch from 2dbbbca to 9012e67 Compare May 12, 2025 07:45
@capcom6 capcom6 force-pushed the push/add-push-send-retries branch from 9012e67 to d69838a Compare May 12, 2025 22:57
@capcom6 capcom6 merged commit 4e6b4e7 into master May 13, 2025
10 checks passed
@capcom6 capcom6 deleted the push/add-push-send-retries branch May 13, 2025 09:48
@coderabbitai coderabbitai bot mentioned this pull request Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants