-
Notifications
You must be signed in to change notification settings - Fork 27
[push] add send retries #137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes update the push notification system to track and report errors on a per-message basis, introducing a new error map return type for the Changes
Sequence Diagram(s)sequenceDiagram
participant Service
participant Cache
participant Client
participant Prometheus
Service->>Cache: Store eventWrapper (event, token, retries=0)
loop Periodic sendAll
Service->>Cache: Retrieve all eventWrappers
Service->>Service: Filter wrappers (retries < maxRetries)
Service->>Client: Send(map[token]event)
alt On failure
Client-->>Service: (map[token]error, error)
Service->>Cache: Increment retries for failed tokens
Service->>Prometheus: Increment retriesCounter ("retried")
alt retries >= maxRetries
Service->>Cache: Blacklist token with TTL
Service->>Prometheus: Increment blacklistCounter ("added")
else
Service->>Cache: Re-cache for retry
end
else On success
Client-->>Service: (nil, nil)
end
end
sequenceDiagram
participant Client
participant ExternalAPI
Client->>ExternalAPI: Send batched messages
alt On error
ExternalAPI-->>Client: Error response
Client->>Client: Map each address to error
Client-->>Caller: (map[address]error, error)
else On success
ExternalAPI-->>Client: Success response
Client-->>Caller: (nil, nil)
end
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (8)
🚧 Files skipped from review as they are similar to previous changes (8)
⏰ Context from checks skipped due to timeout of 90000ms (2)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/sms-gateway/modules/push/service.go (3)
64-70: Metric label values are open-ended; consider enumerating constants
The counter is declared with a singleoutcomelabel whose runtime values are string literals ("max_attempts","retried").
Using typed constants (or iota enum) keeps label values consistent and avoids accidental typos when the codebase grows.
100-109: Unnecessary struct copy – pass the event by pointer
eventWrapperstoreseventby value, so the wholedomain.Eventis copied on every enqueue/retry.
Ifdomain.Eventcontains large payloads (which FCM messages often do), this will add GC pressure.-type eventWrapper struct { - token string - event domain.Event - retries int -} +type eventWrapper struct { + token string + event *domain.Event + retries int +}Remember to dereference accordingly in the rest of the file.
122-131: Magic constant for max retries
3is scattered in the filtering logic. Defining aconst maxRetries = 3(or exposing it viaConfig) improves readability and allows future tuning without code changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
internal/sms-gateway/modules/push/fcm/client.go(3 hunks)internal/sms-gateway/modules/push/service.go(4 hunks)internal/sms-gateway/modules/push/types.go(1 hunks)internal/sms-gateway/modules/push/upstream/client.go(4 hunks)pkg/swagger/docs/requests.http(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/sms-gateway/modules/push/types.go (1)
internal/sms-gateway/modules/push/domain/events.go (1)
Event(9-12)
internal/sms-gateway/modules/push/upstream/client.go (2)
internal/sms-gateway/modules/push/types.go (1)
Event(12-12)internal/sms-gateway/modules/push/domain/events.go (1)
Event(9-12)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Test
- GitHub Check: E2E
- GitHub Check: lint
- GitHub Check: Analyze (go)
🔇 Additional comments (13)
pkg/swagger/docs/requests.http (1)
24-24: Priority value updated to align with new message handling.The priority value for push messages has been decreased from 128 to 127, which aligns with the changes in the push notification system that now supports per-message error tracking and retries.
internal/sms-gateway/modules/push/types.go (2)
21-21: Updated interface signature to support per-message error reporting.The
Sendmethod signature has been enhanced to return a map of errors keyed by message address (map[string]error) in addition to a combined error. This change enables tracking and reporting errors on a per-message basis, which is essential for implementing proper retry logic.
25-29: Well-structured wrapper type for event retry handling.The new
eventWrapperstruct is a good addition that encapsulates all necessary information for tracking retry attempts. The struct contains the token (identifier), the event data, and a retry counter, providing a clean way to manage retries in the service layer.internal/sms-gateway/modules/push/fcm/client.go (4)
7-8: Required imports added for error handling.The
mapsandslicespackages are appropriately added to support the new error handling approach that uses maps instead of slices for error collection.
58-60: Method signature updated to support granular error reporting.The
Sendmethod now returns both a map of per-token errors and a combined error, which aligns with the updated interface intypes.go. The error map is properly initialized with the expected capacity.
70-70: Error collection now maps errors to specific tokens.Errors are now stored in a map with the token address as the key, enabling callers to identify which specific messages failed and potentially retry only those messages.
74-74: Combined error handling properly implemented.The implementation correctly returns both the detailed error map and a combined error using
errors.Join. The use ofslices.Collect(maps.Values(errs))is an elegant way to convert the map values to a slice for joining.internal/sms-gateway/modules/push/upstream/client.go (5)
14-14: External map helper package imported.The code uses a custom map helper package from
github.com/capcom6/go-helpers/maps. Ensure this dependency is properly managed in your project.Can you verify that this dependency is properly tracked in your project's dependency management system (like
go.mod)?
45-45: Method signature updated to support granular error reporting.The
Sendmethod now returns both a map of per-token errors and a combined error, which aligns with the updated interface intypes.goand with the FCM client implementation.
59-60: Consistent error handling with helper method.All error cases now use the
sendErrorhelper method to generate a consistent error response format, which improves code readability and maintainability.Also applies to: 64-65, 72-73, 81-82
84-85: Clear success case handling.On success, the method correctly returns
nil, nilto indicate no errors occurred, which is consistent with the new return type.
87-91: Elegant helper method for error mapping.The
sendErrormethod efficiently maps the same error to all message addresses using the helper functionmaps.MapValues. This is a clean way to generate the error map while maintaining the original error for the combined error return.internal/sms-gateway/modules/push/service.go (1)
41-43: Confirmcache.Cacheis truly concurrent-safe
Enqueuecan be called by many goroutines while the ticker loop concurrently callssendAll → Drain.
Ifcache.Cacheis not explicitly documented as goroutine-safe for all exposed operations (Set,SetOrFail,Drain), you risk hard-to-reproduce data races.
Please double-check the implementation or guard access with a mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
internal/sms-gateway/modules/push/service.go (1)
131-140: Fixed retry logic when handling errors.The error handling now correctly checks both the map of per-token errors and the general error, addressing previous issues where retry logic was skipped when the general error was nil.
🧹 Nitpick comments (3)
internal/sms-gateway/modules/push/fcm/client.go (1)
71-71: Consider aggregating errors for non-nil general error.When multiple messages fail, the function returns a map of errors but still returns nil for the general error. Consider returning a meaningful aggregate error when the error map is not empty to make error checking more consistent.
- return errs, nil + if len(errs) > 0 { + return errs, fmt.Errorf("failed to send %d messages", len(errs)) + } + return errs, nilinternal/sms-gateway/modules/push/service.go (2)
137-140: Consider handling individual errors even with general failure.The function returns early if there's a general error, potentially ignoring per-token failures that could be retried. Consider handling both the general error and processing the error map in such cases.
if err != nil { s.logger.Error("Can't send messages", zap.Error(err)) - return + // Continue to process individual errors even in case of general failure }
127-127: Add logging for retry statistics.Consider adding logging for the number of messages that failed and will be retried at the end of processing, to give a clearer picture of the system's behavior.
s.logger.Info("Sending messages", zap.Int("count", len(messages))) + + // After line 159, add: + s.logger.Info("Retry statistics", + zap.Int("failed", len(errs)), + zap.Int("retried", len(errs) - countMaxAttemptsExceeded))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
internal/sms-gateway/handlers/upstream.go(1 hunks)internal/sms-gateway/modules/push/consts.go(1 hunks)internal/sms-gateway/modules/push/domain/events.go(1 hunks)internal/sms-gateway/modules/push/fcm/client.go(2 hunks)internal/sms-gateway/modules/push/service.go(5 hunks)internal/sms-gateway/modules/push/types.go(1 hunks)internal/sms-gateway/modules/push/upstream/client.go(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/sms-gateway/modules/push/consts.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/sms-gateway/modules/push/types.go
- internal/sms-gateway/modules/push/upstream/client.go
🧰 Additional context used
🧬 Code Graph Analysis (4)
internal/sms-gateway/handlers/upstream.go (2)
internal/sms-gateway/modules/push/domain/events.go (2)
NewEvent(31-36)Event(9-12)internal/sms-gateway/modules/push/types.go (2)
NewEvent(14-14)Event(12-12)
internal/sms-gateway/modules/push/fcm/client.go (2)
internal/sms-gateway/modules/push/domain/events.go (1)
Event(9-12)internal/sms-gateway/modules/push/types.go (1)
Event(12-12)
internal/sms-gateway/modules/push/domain/events.go (1)
internal/sms-gateway/modules/push/types.go (2)
Event(12-12)NewEvent(14-14)
internal/sms-gateway/modules/push/service.go (2)
internal/sms-gateway/modules/push/domain/events.go (1)
Event(9-12)internal/sms-gateway/modules/push/types.go (3)
Event(12-12)RetryOutcomeMaxAttempts(30-30)RetryOutcomeRetried(31-31)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Build / Docker image (linux/amd64)
- GitHub Check: Build / Docker image (linux/arm64)
- GitHub Check: Analyze (go)
🔇 Additional comments (12)
internal/sms-gateway/handlers/upstream.go (2)
71-74: Good implementation of the updated Event constructor.The code now uses the
push.NewEventconstructor instead of direct struct initialization, which properly aligns with the encapsulation changes in theEventstruct where fields were made unexported.
76-76: Method signature update correctly implemented.The
Enqueuemethod now accepts an event value directly rather than a pointer, matching the updated service interface.internal/sms-gateway/modules/push/fcm/client.go (2)
55-56: Method signature improvement for granular error handling.The
Sendmethod signature has been updated to return(map[string]error, error)instead of justerror, providing per-message error reporting which will enable more effective retry mechanisms.
67-68: Per-token error tracking implemented correctly.Errors are now properly stored in a map keyed by token address, which allows for more precise error handling and retry logic at the service level.
internal/sms-gateway/modules/push/domain/events.go (3)
9-12: Good encapsulation of Event struct fields.The
EventandDatafields have been properly changed from exported to unexported, improving encapsulation and preventing unintended modifications.
14-20: Appropriate getter methods added.Clean implementation of getter methods for the now-unexported fields, maintaining the same external API while improving internal structure.
22-28: Map method correctly updated to use unexported fields.The
Map()method has been properly updated to access the unexported fields directly.internal/sms-gateway/modules/push/service.go (5)
42-45: Good enhancement with event wrapper and metrics.The cache now stores the
eventWrappertype instead ofdomain.Event, and a new Prometheus counter for tracking retries has been added. These changes properly support the retry functionality.
65-70: Well-implemented retry metrics.The new Prometheus counter vector for tracking retry outcomes with appropriate labels is a good addition for monitoring the system's retry behavior.
101-107: Properly implemented event wrapping.The event is now correctly wrapped with the token and a retry counter initialized to zero, which will enable tracking retry attempts per message.
123-126: Good use of maps helper for transformation.The
maps.MapValuesfunction is used effectively to transform the event wrappers into domain events for sending.
142-159: Well-implemented retry mechanism.The retry logic now properly:
- Increments retry counts for failed messages
- Checks against maximum retry limits
- Logs appropriate messages for different outcomes
- Updates metrics to track retry attempts by outcome
- Re-queues messages that haven't reached the retry limit
This is a significant improvement for handling transient failures.
bab9aaf to
cac472a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
internal/sms-gateway/modules/push/consts.go(1 hunks)internal/sms-gateway/modules/push/service.go(5 hunks)internal/sms-gateway/modules/push/types.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/sms-gateway/modules/push/types.go
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/sms-gateway/modules/push/service.go
169-169: Error return value of s.blacklist.Set is not checked
(errcheck)
🪛 GitHub Check: lint
internal/sms-gateway/modules/push/service.go
[failure] 169-169:
Error return value of s.blacklist.Set is not checked (errcheck)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Build / Docker image (linux/amd64)
- GitHub Check: Build / Docker image (linux/arm64)
- GitHub Check: Analyze (go)
🔇 Additional comments (11)
internal/sms-gateway/modules/push/service.go (8)
42-44: Good addition of blacklist cache alongside wrapper cacheThe introduction of a separate blacklist cache with structured empty values is a clean approach to token management. This allows for efficient checking of problematic tokens before attempting to send messages.
67-80: Well-structured metrics for tracking retry and blacklist operationsThe new Prometheus counters provide valuable insights into retry attempts and blacklist operations, which will help with monitoring and troubleshooting the push notification system.
85-88: Good TTL configuration for blacklist cacheSetting a TTL for blacklisted tokens is an excellent approach as it allows tokens to be "rehabilitated" after a certain period, preventing permanent blacklisting of potentially valid tokens.
115-119: Efficient early blacklist check in EnqueueChecking the blacklist early in the Enqueue method prevents unnecessary processing and caching of events for known problematic tokens. The logging and metrics increment provide good visibility.
121-126: Good encapsulation of event with retry metadataWrapping the event with retry metadata provides a clean way to track retry attempts for each token without modifying the original event structure.
143-145: Efficient mapping from wrappers to eventsUsing
maps.MapValuesis a clean and efficient way to extract the events from the wrappers for sending.
151-160: Improved error handling structureThe updated error handling properly separates general send errors from token-specific errors, with appropriate logging for each case.
162-183: Well-structured per-token error handling and retry logicThe per-token error handling with progressive retries and eventual blacklisting provides a robust mechanism for dealing with problematic tokens while giving them multiple chances to succeed.
🧰 Tools
🪛 golangci-lint (1.64.8)
169-169: Error return value of
s.blacklist.Setis not checked(errcheck)
🪛 GitHub Check: lint
[failure] 169-169:
Error return value ofs.blacklist.Setis not checked (errcheck)internal/sms-gateway/modules/push/consts.go (3)
5-8: Good use of named constants for configuration valuesUsing named constants for
maxRetriesandblacklistTimeoutimproves code readability and maintainability by centralizing these configuration values.
10-15: Well-defined typed constants for retry outcomesUsing a string-based enum type for retry outcomes improves type safety and makes the code more self-documenting. This is better than using raw strings directly in the code.
17-22: Well-defined typed constants for blacklist operationsSimilar to the retry outcomes, using a string-based enum type for blacklist operations provides type safety and better code readability.
2dbbbca to
9012e67
Compare
9012e67 to
d69838a
Compare
Summary by CodeRabbit
New Features
Bug Fixes
Documentation