-
Notifications
You must be signed in to change notification settings - Fork 820
Add Alertmanager Integration Tests and Static File Backend #1686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Alertmanager Integration Tests and Static File Backend #1686
Conversation
Where does the new protobuf come in? |
I chose to use a protobuf for the polling interface boundary to provide a easy to use storage format that wasn't yaml based and easy to version. This would make any additional backend based on cloud provider object stores much easier to add, while not reducing any of the current functionality. |
eaa9211
to
b32dbb3
Compare
d079fa6
to
e574088
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First time I'm seeing the multi-tenant alertmanager and config DB, so I have have miss something. I left some comments. All in all, looks doing it job. I left several comments to polish it up a bit (especially from the user perspective), plus I think we're not correctly handling the case an user config is removed (see comment in multitenant.go
).
pkg/alertmanager/storage.go
Outdated
// AlertStoreConfig configures the alertmanager backend | ||
type AlertStoreConfig struct { | ||
Type string `yaml:"type"` | ||
ConfigDB client.Config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts:
- I would suggest to make the YAML field name explict. Given the
type
could befile
orconfigdb
, we may call these fields with the same name (to keep it specular), soyaml:"configdb"
andyaml:"file"
. However,file
is a very bad name because very generic. - I've seen you've also used "local" sometimes in the code: maybe it's a better name for
file
? - Is really a value naming the DB type
configdb
? Isn't just adb
the backend storage?
What's your take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think local is a better fit. It's what we use to designate the boltdb backend for chunks
pkg/alertmanager/storage.go
Outdated
cfg.File.RegisterFlags(f) | ||
cfg.ConfigDB.RegisterFlagsWithPrefix("alertmanager.", f) | ||
f.StringVar(&cfg.Type, "alertmanager.storage.type", "configdb", "Method to use for backend rule storage (configdb, file)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI flags should have all the same prefix, then:
.type
.<type>.<option>
(ie..file.path
)
Right now there's a bit of inconsistency:
-alertmanager.storage.type string
-alertmanager.storage.local.path string
-alertmanager.configs.auto-webhook-root string
-alertmanager.configs.client-timeout duration
-alertmanager.configs.fallback string
-alertmanager.configs.poll-interval duration
-alertmanager.configs.url value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything related to backend config storage should have a alertmanager.storage.
. However, I wanted to avoid making a breaking change in this PR and updating the prefix for alertmanager.configs
to alertmanager.storage.configs
would force users to change their flags. I can add a TODO to make that change in the future and go throough a change/deprecation process in a future PR?
pkg/alertmanager/storage.go
Outdated
case "file": | ||
return local.NewFileAlertStore(cfg.File) | ||
default: | ||
return nil, fmt.Errorf("Unrecognized rule storage mode %v, choose one of: configdb, file", cfg.Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode
>type
given we call like this in the configrule
> replacing it withalertmanager
is more clear to the final user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, a lot of this code is based on the Ruler.
@@ -0,0 +1,71 @@ | |||
package local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to name the package and file based on the content. For example, local/alert_client.go
makes me think about "local" storage (while it's named file
) and "client" is actually code "store" in the code.
|
||
// RegisterFlags registers flags related to the alertmanager file store | ||
func (cfg *FileAlertStoreConfig) RegisterFlags(f *flag.FlagSet) { | ||
f.StringVar(&cfg.Path, "alertmanager.storage.local.path", "/etc/cortex/alertmanager_configs/", "Path at which alertmanager configurations are stored.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is /etc/cortex/alertmanager_configs/
a good default? We don't have any default filepath in that location. Looks more inherited by our cluster setup, than picked to be a good default to me.
pkg/alertmanager/alerts/compat.go
Outdated
package alerts | ||
|
||
// ToProto con | ||
func ToProto(cfg string, templates map[string]string, user string) (*AlertConfigDesc, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who uses it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed in this PR, I'll remove this file
pkg/alertmanager/alerts/compat.go
Outdated
} | ||
|
||
// ParseTemplates returns a alertmanager config object | ||
func ParseTemplates(cfg *AlertConfigDesc) map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who uses it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed in this PR, I'll remove this file
4f81964
to
65b264b
Compare
@pracucci this should be good for a second look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @jtlisi! I think it's in a pretty good shape. I've left few nits, which should be quick to address. I haven't manually tested it, please don't forget to test all features with the latest version of the code 🙏
# The configstore_config configures the config database storing rules and | ||
# alerts, and is used by the Cortex alertmanager. | ||
# The CLI flags prefix for this block config is: alertmanager | ||
[config_store: <configstore_config>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an entry to CHANGELOG.md
explaning all changes (this is a config breaking change).
pkg/cortex/modules.go
Outdated
@@ -446,7 +446,7 @@ func (t *Cortex) stopConfigs() error { | |||
} | |||
|
|||
func (t *Cortex) initAlertmanager(cfg *Config) (err error) { | |||
t.alertmanager, err = alertmanager.NewMultitenantAlertmanager(&cfg.Alertmanager, cfg.ConfigStore) | |||
t.alertmanager, err = alertmanager.NewMultitenantAlertmanager(&cfg.Alertmanager) | |||
if err != nil { | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We specify the err
variable in the function signature and utilize a naked return pattern throughout the modules.go file. I personally like to explicitly return the error so I'll update it.
pkg/alertmanager/multitenant.go
Outdated
if err != nil { | ||
level.Warn(util.Logger).Log("msg", "MultitenantAlertmanager: configs server poll failed", "err", err) | ||
level.Warn(util.Logger).Log("msg", "MultitenantAlertmanager: configs alert store poll failed", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This MultitenantAlertmanager:
prefix repeated across all logs looks a bit annoying. If you want to clean it up, you may pick the logger in input in NewMultitenantAlertmanager()
, wrap it with log.With(util.Logger, "component", "multiteant-alertmanager")
and then always use that internally.
Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use an embedded logger with a component field.
pkg/alertmanager/multitenant.go
Outdated
} | ||
|
||
func (am *MultitenantAlertmanager) addNewConfigs(cfgs map[string]configs.View) { | ||
func (am *MultitenantAlertmanager) syncConfigs(cfgs map[string]alerts.AlertConfigDesc) { | ||
// TODO: instrument how many configs we have, both valid & invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this TODO about? Doable in this PR too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I broke the totalConfigs
metric up to have a status
label
pkg/alertmanager/multitenant.go
Outdated
am.alertmanagersMtx.Unlock() | ||
} else if am.cfgs[userID].AlertmanagerConfig != config.AlertmanagerConfig || hasTemplateChanges { | ||
} else if am.cfgs[cfg.User].RawConfig != cfg.RawConfig || hasTemplateChanges { | ||
level.Debug(util.Logger).Log("msg", "MultitenantAlertmanager: updating new alertmanager tenant", "user", cfg.User) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be an Info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
pkg/alertmanager/alerts/store.go
Outdated
@@ -0,0 +1,64 @@ | |||
package alerts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being the configdb store, may make sense to move it to pkg/alertmanager/alerts/configdb/store.go
and just move the AlertStore
interface back into pkg/alertmanager/alerts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I will also move the pkg/alertmananger/local
package to be pkg/alertmanager/alerts/local
pkg/alertmanager/alerts/store.go
Outdated
} | ||
|
||
// ConfigAlertStore is a concrete implementation of RuleStore that sources rules from the config service | ||
type ConfigAlertStore struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfigDBAlertStore
would be more appropriate? Just asking, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
pkg/alertmanager/alerts/store.go
Outdated
type ConfigAlertStore struct { | ||
configClient client.Client | ||
since configs.ID | ||
alertConfigs map[string]AlertConfigDesc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to retain it? Looking at ListAlertConfigs()
implementation looks not useful to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configs, err := c.configClient.GetAlerts(ctx, c.since)
Will only return configs that have IDs
greater than c.since
. This avoids the configdb having to do a full pull of it's table.
pkg/alertmanager/local/store.go
Outdated
return err | ||
} | ||
|
||
user := strings.TrimSuffix(info.Name(), ext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned, I would add a comment to explain that the filename must match the user ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't rectify this previously
pkg/alertmanager/local/store.go
Outdated
|
||
ext := filepath.Ext(info.Name()) | ||
|
||
if !info.IsDir() && (ext == ".yml" || ext == ".yaml") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned, I would reverse this into a guard. Looks easier to read the code, if we return nil
in case it's not a file or not a YAML extension.
I would also suggest to log with a Warn
any skipped entry, because we wouldn't expect any spurious file/directory within f.cfg.Path
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't rectify this previously
712b5e2
to
545835c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bash-based integration tests are no more a thing. I see two options:
- Strip away integration tests from this PR and eventually open a new PR for them
- Move these tests to the new Go integration tests
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
545835c
to
cbd8c18
Compare
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Closing in favor of #2125 |
Overview
Important Detail
pkg/alertmanager/multitenant_test.go
is run with!race
tag because of an underlying mild race condition in the Alertmanager.https://github.com/prometheus/alertmanager/issues/2182
Related
#1647
#1244
#1513
#619