Skip to content

Add Alertmanager Integration Tests and Static File Backend #1686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jtlisi
Copy link
Contributor

@jtlisi jtlisi commented Sep 20, 2019

Overview

  • Abstract the ConfigDB client behind the alert store interface
  • Add Integration tests for the Alertmanager that ensures the proper functionality of the v1 api
  • Add a static file config backend for the Alertmanager

Important Detail

  • pkg/alertmanager/multitenant_test.go is run with !race tag because of an underlying mild race condition in the Alertmanager. https://github.com/prometheus/alertmanager/issues/2182

Related

#1647
#1244
#1513
#619

@bboreham
Copy link
Contributor

Where does the new protobuf come in?

@jtlisi
Copy link
Contributor Author

jtlisi commented Sep 23, 2019

I chose to use a protobuf for the polling interface boundary to provide a easy to use storage format that wasn't yaml based and easy to version. This would make any additional backend based on cloud provider object stores much easier to add, while not reducing any of the current functionality.

@jtlisi jtlisi force-pushed the 20190916_file_backed_alertmanager branch 2 times, most recently from eaa9211 to b32dbb3 Compare October 11, 2019 02:36
@jtlisi jtlisi force-pushed the 20190916_file_backed_alertmanager branch 2 times, most recently from d079fa6 to e574088 Compare January 15, 2020 16:45
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time I'm seeing the multi-tenant alertmanager and config DB, so I have have miss something. I left some comments. All in all, looks doing it job. I left several comments to polish it up a bit (especially from the user perspective), plus I think we're not correctly handling the case an user config is removed (see comment in multitenant.go).

// AlertStoreConfig configures the alertmanager backend
type AlertStoreConfig struct {
Type string `yaml:"type"`
ConfigDB client.Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts:

  • I would suggest to make the YAML field name explict. Given the type could be file or configdb, we may call these fields with the same name (to keep it specular), so yaml:"configdb" and yaml:"file". However, file is a very bad name because very generic.
  • I've seen you've also used "local" sometimes in the code: maybe it's a better name for file?
  • Is really a value naming the DB type configdb? Isn't just a db the backend storage?

What's your take?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think local is a better fit. It's what we use to designate the boltdb backend for chunks

Comment on lines 21 to 23
cfg.File.RegisterFlags(f)
cfg.ConfigDB.RegisterFlagsWithPrefix("alertmanager.", f)
f.StringVar(&cfg.Type, "alertmanager.storage.type", "configdb", "Method to use for backend rule storage (configdb, file)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI flags should have all the same prefix, then:

  • .type
  • .<type>.<option> (ie. .file.path)

Right now there's a bit of inconsistency:

  -alertmanager.storage.type string
  -alertmanager.storage.local.path string
  -alertmanager.configs.auto-webhook-root string
  -alertmanager.configs.client-timeout duration
  -alertmanager.configs.fallback string
  -alertmanager.configs.poll-interval duration
  -alertmanager.configs.url value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything related to backend config storage should have a alertmanager.storage.. However, I wanted to avoid making a breaking change in this PR and updating the prefix for alertmanager.configs to alertmanager.storage.configs would force users to change their flags. I can add a TODO to make that change in the future and go throough a change/deprecation process in a future PR?

case "file":
return local.NewFileAlertStore(cfg.File)
default:
return nil, fmt.Errorf("Unrecognized rule storage mode %v, choose one of: configdb, file", cfg.Type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • mode > type given we call like this in the config
  • rule > replacing it with alertmanager is more clear to the final user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, a lot of this code is based on the Ruler.

@@ -0,0 +1,71 @@
package local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to name the package and file based on the content. For example, local/alert_client.go makes me think about "local" storage (while it's named file) and "client" is actually code "store" in the code.


// RegisterFlags registers flags related to the alertmanager file store
func (cfg *FileAlertStoreConfig) RegisterFlags(f *flag.FlagSet) {
f.StringVar(&cfg.Path, "alertmanager.storage.local.path", "/etc/cortex/alertmanager_configs/", "Path at which alertmanager configurations are stored.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /etc/cortex/alertmanager_configs/ a good default? We don't have any default filepath in that location. Looks more inherited by our cluster setup, than picked to be a good default to me.

package alerts

// ToProto con
func ToProto(cfg string, templates map[string]string, user string) (*AlertConfigDesc, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who uses it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed in this PR, I'll remove this file

}

// ParseTemplates returns a alertmanager config object
func ParseTemplates(cfg *AlertConfigDesc) map[string]string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who uses it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed in this PR, I'll remove this file

@jtlisi jtlisi force-pushed the 20190916_file_backed_alertmanager branch from 4f81964 to 65b264b Compare January 28, 2020 22:42
@jtlisi
Copy link
Contributor Author

jtlisi commented Jan 29, 2020

@pracucci this should be good for a second look

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @jtlisi! I think it's in a pretty good shape. I've left few nits, which should be quick to address. I haven't manually tested it, please don't forget to test all features with the latest version of the code 🙏

# The configstore_config configures the config database storing rules and
# alerts, and is used by the Cortex alertmanager.
# The CLI flags prefix for this block config is: alertmanager
[config_store: <configstore_config>]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to CHANGELOG.md explaning all changes (this is a config breaking change).

@@ -446,7 +446,7 @@ func (t *Cortex) stopConfigs() error {
}

func (t *Cortex) initAlertmanager(cfg *Config) (err error) {
t.alertmanager, err = alertmanager.NewMultitenantAlertmanager(&cfg.Alertmanager, cfg.ConfigStore)
t.alertmanager, err = alertmanager.NewMultitenantAlertmanager(&cfg.Alertmanager)
if err != nil {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We specify the err variable in the function signature and utilize a naked return pattern throughout the modules.go file. I personally like to explicitly return the error so I'll update it.

if err != nil {
level.Warn(util.Logger).Log("msg", "MultitenantAlertmanager: configs server poll failed", "err", err)
level.Warn(util.Logger).Log("msg", "MultitenantAlertmanager: configs alert store poll failed", "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MultitenantAlertmanager: prefix repeated across all logs looks a bit annoying. If you want to clean it up, you may pick the logger in input in NewMultitenantAlertmanager(), wrap it with log.With(util.Logger, "component", "multiteant-alertmanager") and then always use that internally.

Up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use an embedded logger with a component field.

}

func (am *MultitenantAlertmanager) addNewConfigs(cfgs map[string]configs.View) {
func (am *MultitenantAlertmanager) syncConfigs(cfgs map[string]alerts.AlertConfigDesc) {
// TODO: instrument how many configs we have, both valid & invalid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this TODO about? Doable in this PR too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I broke the totalConfigs metric up to have a status label

am.alertmanagersMtx.Unlock()
} else if am.cfgs[userID].AlertmanagerConfig != config.AlertmanagerConfig || hasTemplateChanges {
} else if am.cfgs[cfg.User].RawConfig != cfg.RawConfig || hasTemplateChanges {
level.Debug(util.Logger).Log("msg", "MultitenantAlertmanager: updating new alertmanager tenant", "user", cfg.User)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be an Info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -0,0 +1,64 @@
package alerts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being the configdb store, may make sense to move it to pkg/alertmanager/alerts/configdb/store.go and just move the AlertStore interface back into pkg/alertmanager/alerts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I will also move the pkg/alertmananger/local package to be pkg/alertmanager/alerts/local

}

// ConfigAlertStore is a concrete implementation of RuleStore that sources rules from the config service
type ConfigAlertStore struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigDBAlertStore would be more appropriate? Just asking, up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

type ConfigAlertStore struct {
configClient client.Client
since configs.ID
alertConfigs map[string]AlertConfigDesc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to retain it? Looking at ListAlertConfigs() implementation looks not useful to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configs, err := c.configClient.GetAlerts(ctx, c.since)

Will only return configs that have IDs greater than c.since. This avoids the configdb having to do a full pull of it's table.

return err
}

user := strings.TrimSuffix(info.Name(), ext)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously mentioned, I would add a comment to explain that the filename must match the user ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't rectify this previously


ext := filepath.Ext(info.Name())

if !info.IsDir() && (ext == ".yml" || ext == ".yaml") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously mentioned, I would reverse this into a guard. Looks easier to read the code, if we return nil in case it's not a file or not a YAML extension.

I would also suggest to log with a Warn any skipped entry, because we wouldn't expect any spurious file/directory within f.cfg.Path, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't rectify this previously

@jtlisi jtlisi force-pushed the 20190916_file_backed_alertmanager branch from 712b5e2 to 545835c Compare February 11, 2020 18:42
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bash-based integration tests are no more a thing. I see two options:

  1. Strip away integration tests from this PR and eventually open a new PR for them
  2. Move these tests to the new Go integration tests

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
@jtlisi jtlisi force-pushed the 20190916_file_backed_alertmanager branch from 545835c to cbd8c18 Compare February 12, 2020 18:20
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>
@jtlisi
Copy link
Contributor Author

jtlisi commented Feb 12, 2020

Closing in favor of #2125

@jtlisi jtlisi closed this Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants