Add Alertmanager Integration Tests and Static File Backend #2125

Merged
merged 3 commits into from
Feb 17, 2020
3 changes: 3 additions & 0 deletions CHANGELOG.md
Expand Up @@ -2,17 +2,20 @@

## master / unreleased
Contributor

I just realized we didn't mention the introduction of the local storage in the changelog. I think it's worth mentioning.


* [CHANGE] Config file changed to remove top level `config_store` field in favor of a nested `configdb` field. #2125
* [CHANGE] Removed unnecessary `frontend.cache-split-interval` in favor of `querier.split-queries-by-interval` both to reduce configuration complexity and guarantee alignment of these two configs. Starting from now, `-querier.cache-results` may only be enabled in conjunction with `-querier.split-queries-by-interval` (previously the cache interval default was `24h` so if you want to preserve the same behaviour you should set `-querier.split-queries-by-interval=24h`). #2040
* [CHANGE] Removed remaining support for using denormalised tokens in the ring. If you're still running ingesters with denormalised tokens (Cortex 0.4 or earlier, with `-ingester.normalise-tokens=false`), such ingesters will now be completely invisible to distributors and need to be either switched to Cortex 0.6.0 or later, or be configured to use normalised tokens. #2034
* [CHANGE] Moved `--store.min-chunk-age` to the Querier config as `--querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893
* `--store.min-chunk-age` has been removed
* `--querier.query-store-after` has been added in its place.
* [CHANGE] Experimental Memberlist KV store can now be used in single-binary Cortex. Attempts to use it previously would fail with panic. This change also breaks existing binary protocol used to exchange gossip messages, so this version will not be able to understand gossiped Ring when used in combination with the previous version of Cortex. Easiest way to upgrade is to shutdown old Cortex installation, and restart it with new version. Incremental rollout works too, but with reduced functionality until all components run the same version. #2016
* [CHANGE] Renamed the cache configuration setting `defaul_validity` to `default_validity`. #2140
* [FEATURE] Added a read-only local alertmanager config store using files named corresponding to their tenant id. #2125
* [FEATURE] Added user sub rings to distribute users to a subset of ingesters. #1947
* `--experimental.distributor.user-subring-size`
* [FEATURE] Added flag `-experimental.ruler.enable-api` to enable the ruler api which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints under the configured `-http.prefix`. #1999
* [FEATURE] Added sharding support to compactor when using the experimental TSDB blocks storage. #2113
* [ENHANCEMENT] Add `status` label to `cortex_alertmanager_configs` metric to gauge the number of valid and invalid configs. #2125
* [ENHANCEMENT] Cassandra Authentication: added the `custom_authenticators` config option that allows users to authenticate with cassandra clusters using password authenticators that are not approved by default in [gocql](https://github.com/gocql/gocql/blob/81b8263d9fe526782a588ef94d3fa5c6148e5d67/conn.go#L27) #2093
* [ENHANCEMENT] Experimental TSDB: Export TSDB Syncer metrics from Compactor component, they are prefixed with `cortex_compactor_`. #2023
* [ENHANCEMENT] Experimental TSDB: Added dedicated flag `-experimental.tsdb.bucket-store.tenant-sync-concurrency` to configure the maximum number of concurrent tenants for which blocks are synched. #2026
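
The `[FEATURE]` entry for #2125 above describes a read-only store that maps file names to tenant IDs. A minimal sketch of that idea follows, using only the standard library. The helper name `loadTenantConfigs` and the choice to strip the file extension to obtain the tenant ID are illustrative assumptions, not the code added by this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadTenantConfigs is a hypothetical helper: it reads every regular file in
// dir and keys its contents by the file name without its extension, treating
// that name as the tenant ID (so "user-1.yaml" belongs to tenant "user-1").
func loadTenantConfigs(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	configs := make(map[string]string, len(entries))
	for _, e := range entries {
		if e.IsDir() {
			continue // only flat files are considered in this sketch
		}
		raw, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		tenant := strings.TrimSuffix(e.Name(), filepath.Ext(e.Name()))
		configs[tenant] = string(raw)
	}
	return configs, nil
}

// demo writes one config into a temporary directory and loads it back.
func demo() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "alertmanager_configs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "user-1.yaml")
	if err := os.WriteFile(path, []byte("route:\n  receiver: example_receiver\n"), 0o644); err != nil {
		return nil, err
	}
	return loadTenantConfigs(dir)
}

func main() {
	configs, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for tenant, cfg := range configs {
		fmt.Printf("tenant %s:\n%s", tenant, cfg)
	}
}
```

Because the sketch never writes to the directory it loads from, it matches the "read-only" wording of the changelog entry: updating a tenant's config means replacing the file on disk.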
21 changes: 16 additions & 5 deletions docs/configuration/config-file-reference.md
Expand Up @@ -86,11 +86,6 @@ Supported contents and default values of the config file:
# and used by the 'configs' service to expose APIs to manage them.
[configdb: <configdb_config>]

# The configstore_config configures the config database storing rules and
# alerts, and is used by the Cortex alertmanager.
# The CLI flags prefix for this block config is: alertmanager
[config_store: <configstore_config>]

# The alertmanager_config configures the Cortex alertmanager.
[alertmanager: <alertmanager_config>]

Expand Up @@ -821,6 +816,22 @@ externalurl:
```yaml
# Root of URL to generate if config is http://internal.monitor
# CLI flag: -alertmanager.configs.auto-webhook-root
[autowebhookroot: <string> | default = ""]

store:
  # Type of backend to use to store alertmanager configs. Supported values are:
  # "configdb", "local".
  # CLI flag: -alertmanager.storage.type
  [type: <string> | default = "configdb"]

  # The configstore_config configures the config database storing rules and
  # alerts, and is used by the Cortex alertmanager.
  # The CLI flags prefix for this block config is: alertmanager
  [configdb: <configstore_config>]

  local:
    # Path at which alertmanager configurations are stored.
    # CLI flag: -alertmanager.storage.local.path
    [path: <string> | default = ""]
```

## `table_manager_config`
47 changes: 47 additions & 0 deletions integration/alertmanager_test.go
@@ -0,0 +1,47 @@
package main

import (
	"context"
	"io/ioutil"
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/require"

	"github.com/cortexproject/cortex/integration/e2e"
	"github.com/cortexproject/cortex/integration/e2ecortex"
)

func TestAlertmanager(t *testing.T) {
	s, err := e2e.NewScenario(networkName)
	require.NoError(t, err)
	defer s.Close()

	alertmanagerDir := filepath.Join(s.SharedDir(), "alertmanager_configs")
	require.NoError(t, os.Mkdir(alertmanagerDir, os.ModePerm))

	require.NoError(t, ioutil.WriteFile(
		filepath.Join(alertmanagerDir, "user-1.yaml"),
		[]byte(cortexAlertmanagerUserConfigYaml),
		os.ModePerm),
	)

	alertmanager := e2ecortex.NewAlertmanager("alertmanager", AlertmanagerConfigs, "")
	require.NoError(t, s.StartAndWaitReady(alertmanager))
	require.NoError(t, alertmanager.WaitSumMetric("cortex_alertmanager_configs", 1))

	c, err := e2ecortex.NewClient("", "", alertmanager.Endpoint(80), "user-1")
	require.NoError(t, err)

	cfg, err := c.GetAlertmanagerConfig(context.Background())
	require.NoError(t, err)

	// Ensure the returned status config matches alertmanager_test_fixtures/user-1.yaml
	require.NotNil(t, cfg)
	require.Equal(t, "example_receiver", cfg.Route.Receiver)
	require.Len(t, cfg.Route.GroupByStr, 1)
	require.Equal(t, "example_groupby", cfg.Route.GroupByStr[0])
	require.Len(t, cfg.Receivers, 1)
	require.Equal(t, "example_receiver", cfg.Receivers[0].Name)
}
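
The test above waits until `cortex_alertmanager_configs` sums to 1. Since this PR adds a `status` label to that metric, the total has to be taken across all label values. The following is a rough sketch of such a sum over Prometheus text exposition output. The helper `sumMetric` and the sample payload are hypothetical and are not the `WaitSumMetric` implementation from the `e2e` package; the parser also assumes label values contain no spaces.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleExposition is a made-up scrape result used only for illustration.
const sampleExposition = `# HELP cortex_alertmanager_configs How many configs the alertmanager knows about.
# TYPE cortex_alertmanager_configs gauge
cortex_alertmanager_configs{status="valid"} 1
cortex_alertmanager_configs{status="invalid"} 0
some_other_metric 5
`

// sumMetric adds up the values of every series whose metric name equals name,
// regardless of its labels. Simplification: label values must not contain
// whitespace, because lines are split on whitespace.
func sumMetric(exposition, name string) (float64, error) {
	var sum float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		series := fields[0]
		if i := strings.IndexByte(series, '{'); i >= 0 {
			series = series[:i] // drop the label set
		}
		if series != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %v", line, err)
		}
		sum += v
	}
	return sum, sc.Err()
}

func main() {
	total, err := sumMetric(sampleExposition, "cortex_alertmanager_configs")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cortex_alertmanager_configs total:", total)
}
```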
4 changes: 2 additions & 2 deletions integration/backward_compatibility_test.go
Expand Up @@ -48,7 +48,7 @@ func TestBackwardCompatibilityWithChunksStorage(t *testing.T) {
now := time.Now()
series, expectedVector := generateSeries("series_1", now)

c, err := e2ecortex.NewClient(distributor.Endpoint(80), "", "user-1")
c, err := e2ecortex.NewClient(distributor.Endpoint(80), "", "", "user-1")
require.NoError(t, err)

res, err := c.Push(series)
Expand All @@ -74,7 +74,7 @@ func TestBackwardCompatibilityWithChunksStorage(t *testing.T) {
require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

// Query the series
c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
require.NoError(t, err)

result, err := c.Query("series_1", now)
13 changes: 13 additions & 0 deletions integration/configs.go
Expand Up @@ -22,9 +22,22 @@ const (
prefix: cortex_chunks_
period: 168h0m0s
`

cortexAlertmanagerUserConfigYaml = `route:
  receiver: "example_receiver"
  group_by: ["example_groupby"]
receivers:
  - name: "example_receiver"
`
)

var (
AlertmanagerConfigs = map[string]string{
"-alertmanager.storage.local.path": filepath.Join(e2e.ContainerSharedDir, "alertmanager_configs"),
Contributor

It could make sense to have an alertmanagerConfigDir constant that we could use both here and in the test.

Contributor Author

The directory used in the test is the host-side path of the shared directory, before it is mapped into the filesystem of the Docker container. I can't use the same directory for both without running into permission issues on my macOS setup.

"-alertmanager.storage.type": "local",
"-alertmanager.web.external-url": "http://localhost/api/prom",
}

BlocksStorage = map[string]string{
"-store.engine": "tsdb",
"-experimental.tsdb.backend": "s3",
51 changes: 50 additions & 1 deletion integration/e2ecortex/client.go
Expand Up @@ -3,20 +3,24 @@ package e2ecortex
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"time"

"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
alertConfig "github.com/prometheus/alertmanager/config"
promapi "github.com/prometheus/client_golang/api"
promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/prompb"
yaml "gopkg.in/yaml.v2"
)

// Client is a client used to interact with Cortex in integration tests
type Client struct {
alertmanagerClient promapi.Client
distributorAddress string
timeout time.Duration
httpClient *http.Client
Expand All @@ -25,7 +29,7 @@ type Client struct {
}

// NewClient makes a new Cortex client
func NewClient(distributorAddress string, querierAddress string, orgID string) (*Client, error) {
func NewClient(distributorAddress string, querierAddress string, alertmanagerAddress string, orgID string) (*Client, error) {
// Create querier API client
querierAPIClient, err := promapi.NewClient(promapi.Config{
Address: "http://" + querierAddress + "/api/prom",
Expand All @@ -43,6 +47,17 @@ func NewClient(distributorAddress string, querierAddress string, orgID string) (
orgID: orgID,
}

if alertmanagerAddress != "" {
alertmanagerAPIClient, err := promapi.NewClient(promapi.Config{
Address: "http://" + alertmanagerAddress + "/api/prom",
RoundTripper: &addOrgIDRoundTripper{orgID: orgID, next: http.DefaultTransport},
})
if err != nil {
return nil, err
}
c.alertmanagerClient = alertmanagerAPIClient
}

return c, nil
}

Expand Down Expand Up @@ -95,3 +110,37 @@ func (r *addOrgIDRoundTripper) RoundTrip(req *http.Request) (*http.Response, err

return r.next.RoundTrip(req)
}
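
The new Alertmanager client reuses `addOrgIDRoundTripper` to attach the tenant to every request. As a rough, self-contained illustration of that pattern (not the file's actual implementation, which is only partly visible in this diff), the sketch below wraps a transport and sets a tenant header. The type name `orgIDRoundTripper` and the helper `fetchTenant` are hypothetical, and the header name `X-Scope-OrgID` is an assumption based on Cortex's usual tenant header.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// orgIDRoundTripper is a hypothetical stand-in that adds a tenant header to
// each outgoing request before delegating to the next transport.
type orgIDRoundTripper struct {
	orgID string
	next  http.RoundTripper
}

func (r *orgIDRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper should not modify the caller's request, so clone it first.
	clone := req.Clone(req.Context())
	clone.Header.Set("X-Scope-OrgID", r.orgID) // assumed header name
	return r.next.RoundTrip(clone)
}

// fetchTenant starts a throwaway server that echoes the tenant header back,
// then calls it through the wrapping transport.
func fetchTenant(orgID string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		fmt.Fprint(w, req.Header.Get("X-Scope-OrgID"))
	}))
	defer srv.Close()

	client := &http.Client{
		Transport: &orgIDRoundTripper{orgID: orgID, next: http.DefaultTransport},
	}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	tenant, err := fetchTenant("user-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server saw tenant:", tenant)
}
```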

// ServerStatus represents an Alertmanager status response
// TODO: Upgrade to Alertmanager v0.20.0+ and utilize vendored structs
type ServerStatus struct {
	Data struct {
		ConfigYaml string `json:"configYAML"`
	} `json:"data"`
}

// GetAlertmanagerConfig gets the status of an alertmanager instance
func (c *Client) GetAlertmanagerConfig(ctx context.Context) (*alertConfig.Config, error) {
	u := c.alertmanagerClient.URL("/api/v1/status", nil)

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, fmt.Errorf("error creating request: %v", err)
	}

	_, body, _, err := c.alertmanagerClient.Do(ctx, req) // Ignoring warnings.
	if err != nil {
		return nil, err
	}

	var ss *ServerStatus
	err = json.Unmarshal(body, &ss)
	if err != nil {
		return nil, err
	}

	cfg := &alertConfig.Config{}
	err = yaml.Unmarshal([]byte(ss.Data.ConfigYaml), cfg)

	return cfg, err
}
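
`GetAlertmanagerConfig` works in two steps: it decodes the JSON status envelope, then parses the embedded `configYAML` string. The first step can be sketched with the standard library alone against a fake server. The names `statusResponse`, `fetchConfigYAML` and `demo`, along with the canned payload, are illustrative assumptions. The YAML parsing step is left out so the example has no third-party dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// statusResponse mirrors only the field of the status payload used here.
type statusResponse struct {
	Data struct {
		ConfigYAML string `json:"configYAML"`
	} `json:"data"`
}

// fetchConfigYAML requests <baseURL>/api/v1/status and returns the raw YAML
// string embedded in the JSON envelope.
func fetchConfigYAML(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/api/v1/status")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status code %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	var status statusResponse
	if err := json.Unmarshal(body, &status); err != nil {
		return "", err
	}
	return status.Data.ConfigYAML, nil
}

// demo serves a canned status payload and fetches it back.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"success","data":{"configYAML":"route:\n  receiver: example_receiver\n"}}`)
	}))
	defer srv.Close()

	return fetchConfigYAML(srv.URL)
}

func main() {
	cfg, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(cfg)
}
```

Decoding into a struct value rather than a pointer is a deliberate choice in this sketch: a `null` body then yields an empty struct instead of a nil pointer.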
18 changes: 18 additions & 0 deletions integration/e2ecortex/services.go
Expand Up @@ -120,3 +120,21 @@ func NewSingleBinary(name string, flags map[string]string, image string, httpPor
otherPorts...,
)
}

func NewAlertmanager(name string, flags map[string]string, image string) *e2e.HTTPService {
	if image == "" {
		image = GetDefaultImage()
	}

	return e2e.NewHTTPService(
		name,
		image,
		e2e.NewCommandWithoutEntrypoint("cortex", e2e.BuildArgs(e2e.MergeFlags(map[string]string{
			"-target":    "alertmanager",
			"-log.level": "warn",
		}, flags))...),
		// The alertmanager doesn't expose a readiness probe, so we just check if the / returns 404
		e2e.NewReadinessProbe(80, "/", 404),
		80,
	)
}
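
`NewAlertmanager` layers caller-supplied flags over two defaults before turning them into command-line arguments. The sketch below shows one way such merging could behave. `mergeFlags`, `buildArgs` and `alertmanagerArgs` are hypothetical stand-ins written for this example, not the `e2e.MergeFlags` and `e2e.BuildArgs` implementations. The `-name=value` argument form and the "later maps win" rule are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeFlags copies the given maps into a new one; on duplicate keys the
// value from the later map wins (an assumption for this sketch).
func mergeFlags(maps ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, m := range maps {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

// buildArgs renders flags as "-name=value" strings, sorted by flag name so
// the result is deterministic despite Go's random map iteration order.
func buildArgs(flags map[string]string) []string {
	keys := make([]string, 0, len(flags))
	for k := range flags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := make([]string, 0, len(keys))
	for _, k := range keys {
		args = append(args, k+"="+flags[k])
	}
	return args
}

// alertmanagerArgs applies overrides on top of the two defaults shown in
// NewAlertmanager above.
func alertmanagerArgs(overrides map[string]string) []string {
	defaults := map[string]string{
		"-target":    "alertmanager",
		"-log.level": "warn",
	}
	return buildArgs(mergeFlags(defaults, overrides))
}

func main() {
	args := alertmanagerArgs(map[string]string{
		"-alertmanager.storage.type": "local",
		"-log.level":                 "debug",
	})
	for _, a := range args {
		fmt.Println(a)
	}
}
```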
2 changes: 1 addition & 1 deletion integration/getting_started_single_process_config_test.go
Expand Up @@ -29,7 +29,7 @@ func TestGettingStartedSingleProcessConfig(t *testing.T) {
cortex := e2ecortex.NewSingleBinary("cortex-1", flags, "", 9009)
require.NoError(t, s.StartAndWaitReady(cortex))

c, err := e2ecortex.NewClient(cortex.Endpoint(9009), cortex.Endpoint(9009), "user-1")
c, err := e2ecortex.NewClient(cortex.Endpoint(9009), cortex.Endpoint(9009), "", "user-1")
require.NoError(t, err)

// Push some series to Cortex.
2 changes: 1 addition & 1 deletion integration/ingester_flush_test.go
Expand Up @@ -46,7 +46,7 @@ func TestIngesterFlushWithChunksStorage(t *testing.T) {
require.NoError(t, distributor.WaitSumMetric("cortex_ring_tokens_total", 512))
require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
require.NoError(t, err)

// Push some series to Cortex.
2 changes: 1 addition & 1 deletion integration/ingester_hand_over_test.go
Expand Up @@ -56,7 +56,7 @@ func runIngesterHandOverTest(t *testing.T, flags map[string]string, setup func(t
require.NoError(t, distributor.WaitSumMetric("cortex_ring_tokens_total", 512))
require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
require.NoError(t, err)

// Push some series to Cortex.
11 changes: 9 additions & 2 deletions pkg/alertmanager/alertmanager.go
Expand Up @@ -191,8 +191,15 @@ func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config) error {

am.api.Update(conf, func(_ model.LabelSet) {})

am.inhibitor.Stop()
am.dispatcher.Stop()
// Ensure inhibitor is set before being called
if am.inhibitor != nil {
am.inhibitor.Stop()
}

// Ensure dispatcher is set before being called
if am.dispatcher != nil {
am.dispatcher.Stop()
}

am.inhibitor = inhibit.NewInhibitor(am.alerts, conf.InhibitRules, am.marker, log.With(am.logger, "component", "inhibitor"))
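
The nil checks added in this hunk matter whenever `ApplyConfig` runs before an inhibitor or dispatcher has been created, for example on the first configuration applied (an assumption about the motivation, which the diff does not state). A minimal sketch of the pattern, using hypothetical `dispatcher` and `manager` types rather than the Alertmanager ones:

```go
package main

import "fmt"

// dispatcher is a hypothetical component whose Stop method dereferences its
// receiver, so calling it on a nil pointer would panic.
type dispatcher struct{ running bool }

func (d *dispatcher) Stop() { d.running = false }

// manager owns at most one dispatcher at a time.
type manager struct {
	dispatcher *dispatcher
}

// applyConfig stops the previous dispatcher, if any, installs a new one, and
// returns the previous dispatcher (nil on the first call).
func (m *manager) applyConfig() *dispatcher {
	old := m.dispatcher
	// Guard: on the first call there is nothing to stop yet.
	if old != nil {
		old.Stop()
	}
	m.dispatcher = &dispatcher{running: true}
	return old
}

func main() {
	m := &manager{}

	first := m.applyConfig()
	fmt.Println("previous dispatcher on first apply is nil:", first == nil)

	second := m.applyConfig()
	fmt.Println("previous dispatcher stopped on second apply:", !second.running)
	fmt.Println("current dispatcher running:", m.dispatcher.running)
}
```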
