-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(alertmanager): add service for alertmanager #7136
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 7fd4e18 in 3 minutes and 48 seconds
More details
- Looked at
1012
lines of code in14
files - Skipped
0
files when reviewing. - Skipped posting
6
drafted comments based on config settings.
1. pkg/errors/code.go:9
-
Draft comment:
These error scenarios are already defined usingtyp
struct. Consider consolidating to avoid maintaining parallel error definitions. -
Error types constants block (type.go)
-
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid point about parallel error definitions. However, there may be a deliberate design choice to have both 'type' and 'code' representations for errors. Without understanding the broader error handling system and how these two concepts differ in usage, suggesting consolidation could be premature. The comment also doesn't provide clear guidance on how to consolidate.
I might be missing important architectural context about why these parallel structures exist. Perhaps there's a good reason to have both type and code representations.
While the duplication is real, suggesting consolidation without understanding the architectural need for separation could lead to incorrect advice.
The comment should be deleted because it requires broader architectural context to determine if this is actually a problem that needs fixing.
2. pkg/alertmanager/internalalertmanager/provider.go:38
- Draft comment:
Consider buffering the syncC channel (e.g. make(chan struct{}, 1)) to prevent potential deadlock if no receiver is ready when ticker fires. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
The code shows a goroutine writing to syncC when ticker fires, but we don't see the consumer side. The unbuffered channel could potentially block if no receiver is ready, but the goroutine pattern used here is common and safe - the channel is returned before the goroutine writes to it, so a consumer should be ready. The Stop() method also properly cleans up by stopping the ticker and closing the channel.
I might be underestimating the risk - if the consumer isn't listening continuously, messages could be missed. Also, what if multiple ticks occur before consumption?
The current design appears intentional - using an unbuffered channel ensures synchronization between producer and consumer. If messages need to be buffered, that should be a higher-level architectural decision, not a code review comment.
While the suggestion isn't wrong, the current unbuffered channel implementation appears to be a valid design choice with proper cleanup. The comment is more of a design suggestion than a clear issue.
3. pkg/types/alertmanagertypes/channel.go:55
- Draft comment:
The reflection loop iterates over the receiver struct fields on every call. Consider caching reflection metadata to improve performance. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. pkg/types/alertmanagertypes/config.go:161
- Draft comment:
MD5 is being used for config hashing. Ensure this is acceptable given its use case (detecting changes) since MD5 is weak for security purposes. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. pkg/alertmanager/service.go:79
- Draft comment:
Empty server.Config is passed to server.New. Clarify if this is intentional or if the service-level config should be used to initialize the server. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The code appears to follow a two-step initialization pattern: 1) Create server with empty config 2) Set the actual config via SetConfig(). This seems intentional since the config is retrieved separately and error handling is done for both steps. The service-level config is used differently - for creating default configs when needed.
I could be wrong about the two-step initialization being intentional - maybe it's an oversight. The service.config field's purpose isn't completely clear from this file alone.
The consistent error handling and separation of server creation from config setting strongly suggests this is an intentional pattern. If it wasn't intended, the code would likely try to set the config in one step.
The comment should be deleted as the empty config appears to be intentional design, with config being set in a separate step right after server creation.
6. pkg/types/alertmanagertypes/channel.go:56
- Draft comment:
Reflection is used in NewChannelFromReceiver to inspect receiver fields. Consider potential performance implications if receivers have many fields or if this function is called frequently. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
Workflow ID: wflow_U8qoyKc3fag4A4C6
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
7fd4e18
to
00ddc81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 00ddc81 in 1 minute and 18 seconds
More details
- Looked at
30
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. pkg/sqlstore/sqlitesqlstore/provider.go:40
- Draft comment:
Removed inline log query hook addition. Confirm that sqlstore.NewBunDB with hooks supplies equivalent logging functionality. This removal may affect query-level logging if not properly handled. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. pkg/sqlstore/sqlitesqlstore/provider.go:3
- Draft comment:
Removed unused import for slog. Ensure that any required logging functionality is provided through the settings.Logger instance passed in via configuration. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. pkg/sqlstore/sqlitesqlstore/provider.go:44
- Draft comment:
Removed explicit creation of bun.DB and the addition of a logging query hook (via sqlstorehook.NewLogging). Please ensure that sqlstore.NewBunDB properly registers all necessary hooks (including any logging/tracing) so that query diagnostics aren’t lost. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. pkg/sqlstore/sqlitesqlstore/provider.go:3
- Draft comment:
The imports for 'log/slog' and 'sqlstorehook' have been removed. Confirm that these were solely used for the logging hook functionality, and that the functionality is now fully handled via the hooks passed to sqlstore.NewBunDB. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_JyFOCkCbDFJDfur6
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 831d81e in 2 minutes and 4 seconds
More details
- Looked at
415
lines of code in5
files - Skipped
0
files when reviewing. - Skipped posting
8
drafted comments based on config settings.
1. pkg/alertmanager/alertmanagerstore/sqlalertmanagerstore/state.go:49
- Draft comment:
Consider logging or handling the error from tx.Rollback (even if using nolint) for completeness. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
2. pkg/alertmanager/server/server.go:140
- Draft comment:
Similar state persistence logic is duplicated in both silences and nflog maintenance routines. Consider refactoring common code to improve maintainability. - Reason this comment was not posted:
Comment was on unchanged code.
3. pkg/types/alertmanagertypes/state.go:52
- Draft comment:
The switch in Set() uses base64.StdEncoding (and Get() too) while the previous version used RawStdEncoding. Ensure this encoding change is intentional and consistent across components. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While encoding changes could be important, this comment is speculative without showing the previous code. It's asking for verification rather than pointing out a clear issue. It starts with "Ensure this..." which violates our rules. We can't verify if there's an actual problem without seeing the old code.
The encoding change could potentially cause compatibility issues with existing stored data. This might be a legitimate concern worth investigating.
However, we don't have evidence in the diff of the previous encoding type, and the comment is phrased as a verification request rather than pointing out a concrete issue. The author likely considered encoding compatibility when making these changes.
Delete the comment because it's speculative, asks for verification, and requires knowledge of code outside the diff to evaluate.
4. pkg/alertmanager/alertmanagerstore/sqlalertmanagerstore/state.go:51
- Draft comment:
Using an upsert with a WHERE clause and updating both ‘silences’ and ‘nflog’ may lead to race conditions when concurrent maintenance routines update only one field. Consider splitting updates or merging changes to avoid clobbering concurrent updates. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
While the comment raises an interesting point about concurrent updates, the new code actually uses a transaction and updates both fields atomically in a single upsert. This is generally safer than updating fields separately. The old code was more prone to race conditions by handling states separately. The new approach seems like an improvement.
I might be overlooking specific business requirements where silences and nflog need to be updated independently. The comment author may have domain knowledge about maintenance routines I'm not aware of.
Even if there are maintenance routines, the atomic update within a transaction is a sound approach that prevents partial updates. If independent updates are needed, that should be a separate feature request with clear requirements.
The comment should be deleted as it suggests a potential problem that isn't actually present - the new code uses transactions and atomic updates which is good practice for preventing race conditions.
5. pkg/alertmanager/server/server.go:131
- Draft comment:
The maintenance routines for silences and nflog both retrieve and update the same state concurrently. Consider centralizing or serializing state updates to prevent potential lost updates. - Reason this comment was not posted:
Marked as duplicate.
6. pkg/alertmanager/server/server.go:160
- Draft comment:
Similar to the silences maintenance, the nflog maintenance routine performs a read-modify-write on the same state object. This duplication increases the risk of race conditions. Consider refactoring to update fields independently. - Reason this comment was not posted:
Marked as duplicate.
7. pkg/types/alertmanagertypes/state.go:54
- Draft comment:
In the StoreableState.Set switch, no default case is provided. Consider adding a default branch to handle unknown state names, returning an error rather than silently doing nothing. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The StateName type is controlled within this package and only has two possible values. The switch statement handles both cases. Adding a default case would handle a situation that should never occur in practice since StateName values can only be created within this package. The current approach of silently doing nothing for an invalid state is actually consistent with Go's zero-value behavior.
I could be wrong about the encapsulation - maybe there's a way for external code to create StateName values that I'm not seeing. Also, defensive programming might still be valuable even for "impossible" cases.
Looking at the code again, StateName's only field is unexported ('name string'), so external packages cannot create new instances. The two package variables are the only possible values.
The comment should be deleted. The switch statement is correct as-is since it handles all possible StateName values, and the type system prevents creation of invalid states.
8. pkg/types/alertmanagertypes/state.go:52
- Draft comment:
Note: The implementation now uses base64.StdEncoding instead of RawStdEncoding. Ensure this encoding change is intentional and does not break backward compatibility with previously stored states. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_VdRdCPRcNhm3uH4K
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 1c69201 in 1 minute and 33 seconds
More details
- Looked at
168
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
9
drafted comments based on config settings.
1. pkg/alertmanager/alertmanager.go:16
- Draft comment:
Interface now extends factory.Service. Ensure all required methods from factory.Service are implemented by service. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. pkg/alertmanager/internalalertmanager/provider.go:41
- Draft comment:
Start method runs a blocking ticker loop. Ensure this is intended; consider running it in a separate goroutine if non-blocking startup is desired. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The blocking nature of Start() appears intentional based on the implementation. This is a common pattern for background services - Start() blocks and runs until Stop() is called via context cancellation. The previous implementation also blocked via the SyncFunc. The comment suggests running in a goroutine, but that would just push the responsibility of managing the goroutine to the caller.
I could be wrong about the intended usage pattern of this provider. Maybe callers expect Start() to return immediately.
The implementation follows a standard pattern for background services. If non-blocking behavior was needed, the previous implementation would have had issues too. The clean shutdown handling via context suggests this blocking behavior is intentional.
This comment should be deleted as it questions an intentional and common implementation pattern without strong evidence that it's problematic.
3. pkg/alertmanager/internalalertmanager/provider.go:27
- Draft comment:
Provider settings are scoped correctly. Ensure the naming convention used here ('go.signoz.io/signoz/pkg/alertmanager/internalalertmanager') remains consistent across the project. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
4. pkg/alertmanager/service.go:32
- Draft comment:
Removal of SyncFunc usage in Service construction simplifies startup. Verify that the new pattern of triggering SyncServers externally meets your synchronization requirements. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
5. pkg/alertmanager/alertmanager.go:16
- Draft comment:
Embedded factory.Service was added to the Alertmanager interface. Ensure that the implementing provider correctly satisfies all methods required by factory.Service. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
6. pkg/alertmanager/internalalertmanager/provider.go:41
- Draft comment:
The Start method contains an infinite loop with a ticker. Make sure this is run in a separate goroutine so it does not block main execution. - Reason this comment was not posted:
Marked as duplicate.
7. pkg/alertmanager/internalalertmanager/provider.go:49
- Draft comment:
SyncServers errors in the Start loop are currently only logged. Consider whether additional error handling (or metrics) is needed for repeated failures. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid concern about error handling robustness. However, it's speculative ("Consider whether...") and doesn't definitively state that additional error handling is required. The current error logging approach might be intentional and sufficient for the use case. Without more context about SyncServers' failure modes and system requirements, we can't be certain additional handling is needed.
The comment identifies a potential improvement in error handling, which could be important for system reliability. Not having metrics for sync failures could make it harder to monitor system health.
While the concern is valid, the comment is too speculative and doesn't provide clear direction. It asks the author to "consider" something rather than making a specific recommendation.
The comment should be removed because it's speculative and doesn't provide clear, actionable guidance. It violates the rule against asking authors to verify or consider things.
8. pkg/alertmanager/service.go:52
- Draft comment:
In SyncServers, the mutex is manually unlocked after processing all orgIDs. Consider using 'defer' immediately after Lock to ensure the mutex is released even if an early return or panic occurs. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. pkg/alertmanager/service.go:60
- Draft comment:
Creation of new servers and calling SetConfig occurs while holding the mutex. If these operations are time-consuming, consider performing them outside the critical section to reduce lock contention. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_mIO4cCOFHWe94AHm
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Summary
adds an alertmanager service
Related Issues / PR's
Screenshots
NA
Affected Areas and Manually Tested Areas
Important
Add Alertmanager service with configuration and state management, and API endpoints for alert handling.
Alertmanager
interface inalertmanager.go
with methodsGetAlerts
,PutAlerts
, andTestReceiver
.Service
struct inservice.go
to manage alertmanager servers per organization.SyncServers
to synchronize alertmanager servers for all organizations.ConfigStore
interface andconfig
struct insqlalertmanagerstore/config.go
for managing alertmanager configurations.Get
,Set
, andListOrgs
methods for configuration management.StateStore
interface andstate
struct insqlalertmanagerstore/state.go
for managing alertmanager states.Get
andSet
methods for state management.API
struct inapi.go
with methodsGetAlerts
,TestReceiver
,GetChannels
,GetChannelByID
,UpdateChannelByID
, andDeleteChannelByID
.errors/code.go
for various error scenarios.server.go
to use new state management methods for silences and notification logs.This description was created by
for 1c69201. It will automatically update as commits are pushed.