Skip to content

feat: add TLS support for communication from the Ruler-->AM #3752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 2, 2021
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@
- `-<prefix>.s3.sse.kms-key-id`
- `-<prefix>.s3.sse.kms-encryption-context`
* [FEATURE] Querier: Enable `@ <timestamp>` modifier in PromQL using the new `-querier.at-modifier-enabled` flag. #3744
* [ENHANCEMENT] Ruler: Add TLS and explicit basis authentication configuration options for the HTTP client the ruler uses to communicate with the alertmanager. #3752
* `-ruler.alertmanager-client.basic-auth-username`: Configure the basic authentication username used by the client. Takes precedent over a URL configured username.
* `ruler.alertmanager-client.basic-auth-password`: Configure the basic authentication password used by the client. Takes precedent over a URL configured password.
* `ruler.alertmanager-client.tls-ca-path`: File path to the CA file.
* `ruler.alertmanager-client.tls-cert-path`: File path to the TLS certificate.
* `-ruler.alertmanager-client.tls-insecure-skip-verify`: Boolean to disable verifying the certificate.
* `-ruler.alertmanager-client.tls-key-path`: File path to the TLS key certificate.
* `-ruler.alertmanager-client.tls-server-name`: Expected name on the TLS certificate.
* [ENHANCEMENT] Ingester: exposed metric `cortex_ingester_oldest_unshipped_block_timestamp_seconds`, tracking the unix timestamp of the oldest TSDB block not shipped to the storage yet. #3705
* [ENHANCEMENT] Prometheus upgraded. #3739
* Avoid unnecessary `runtime.GC()` during compactions.
Expand Down
34 changes: 34 additions & 0 deletions docs/configuration/config-file-reference.md
Original file line number Diff line number Diff line change
Expand Up @@ -1430,6 +1430,40 @@ storage:
# CLI flag: -ruler.notification-timeout
[notification_timeout: <duration> | default = 10s]

alertmanager_client:
# Path to the client certificate file, which will be used for authenticating
# with the server. Also requires the key path to be configured.
# CLI flag: -ruler.alertmanager-client.tls-cert-path
[tls_cert_path: <string> | default = ""]

# Path to the key file for the client certificate. Also requires the client
# certificate to be configured.
# CLI flag: -ruler.alertmanager-client.tls-key-path
[tls_key_path: <string> | default = ""]

# Path to the CA certificates file to validate server certificate against. If
# not set, the host's root CA certificates are used.
# CLI flag: -ruler.alertmanager-client.tls-ca-path
[tls_ca_path: <string> | default = ""]

# Override the expected name on the server certificate.
# CLI flag: -ruler.alertmanager-client.tls-server-name
[tls_server_name: <string> | default = ""]

# Skip validating server certificate.
# CLI flag: -ruler.alertmanager-client.tls-insecure-skip-verify
[tls_insecure_skip_verify: <boolean> | default = false]

# HTTP Basic authentication username. It overrides the username set in the URL
# (if any).
# CLI flag: -ruler.alertmanager-client.basic-auth-username
[basic_auth_username: <string> | default = ""]

# HTTP Basic authentication password. It overrides the password set in the URL
# (if any).
# CLI flag: -ruler.alertmanager-client.basic-auth-password
[basic_auth_password: <string> | default = ""]

# Max time to tolerate outage for restoring "for" state of alert.
# CLI flag: -ruler.for-outage-tolerance
[for_outage_tolerance: <duration> | default = 1h]
Expand Down
20 changes: 20 additions & 0 deletions integration/e2ecortex/services.go
Original file line number Diff line number Diff line change
Expand Up @@ -350,6 +350,26 @@ func NewAlertmanager(name string, flags map[string]string, image string) *Cortex
)
}

func NewAlertmanagerWithTLS(name string, flags map[string]string, image string) *CortexService {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding the only difference between this and NewAlertmanager is the readiness probe. What's the problem making the readiness probe working with HTTPS too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I didn't do this was because it required a change to the function parameters in the e2e package that were rather far reaching. Would it be acceptable to do this in a separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I haven't checked the complexity and it's not a big deal.

if image == "" {
image = GetDefaultImage()
}

return NewCortexService(
name,
image,
e2e.NewCommandWithoutEntrypoint("cortex", e2e.BuildArgs(e2e.MergeFlags(map[string]string{
"-target": "alertmanager",
"-log.level": "warn",
"-experimental.alertmanager.enable-api": "true",
}, flags))...),
e2e.NewTCPReadinessProbe(httpPort),
httpPort,
grpcPort,
GossipPort,
)
}

func NewRuler(name string, flags map[string]string, image string) *CortexService {
if image == "" {
image = GetDefaultImage()
Expand Down
83 changes: 83 additions & 0 deletions integration/ruler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,12 @@ package integration

import (
"context"
"crypto/x509"
"crypto/x509/pkix"
"fmt"
"math"
"net/http"
"os"
"path/filepath"
"strings"
"testing"
Expand All @@ -20,6 +23,7 @@ import (
"github.com/stretchr/testify/require"
"gopkg.in/yaml.v3"

"github.com/cortexproject/cortex/integration/ca"
"github.com/cortexproject/cortex/integration/e2e"
e2edb "github.com/cortexproject/cortex/integration/e2e/db"
"github.com/cortexproject/cortex/integration/e2ecortex"
Expand Down Expand Up @@ -337,6 +341,85 @@ func TestRulerAlertmanager(t *testing.T) {
require.NoError(t, ruler.WaitSumMetricsWithOptions(e2e.Equals(2), []string{"cortex_prometheus_notifications_alertmanagers_discovered"}, e2e.WaitMissingMetrics))
}

func TestRulerAlertmanagerTLS(t *testing.T) {
var namespaceOne = "test_/encoded_+namespace/?"
ruleGroup := createTestRuleGroup(t)

s, err := e2e.NewScenario(networkName)
require.NoError(t, err)
defer s.Close()

// Start dependencies.
dynamo := e2edb.NewDynamoDB()
minio := e2edb.NewMinio(9000, RulerFlags()["-ruler.storage.s3.buckets"])
require.NoError(t, s.StartAndWaitReady(minio, dynamo))

// set the ca
cert := ca.New("Ruler/Alertmanager Test")

// Ensure the entire path of directories exist.
require.NoError(t, os.MkdirAll(filepath.Join(s.SharedDir(), "certs"), os.ModePerm))

require.NoError(t, cert.WriteCACertificate(filepath.Join(s.SharedDir(), caCertFile)))

// server certificate
require.NoError(t, cert.WriteCertificate(
&x509.Certificate{
Subject: pkix.Name{CommonName: "client"},
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
},
filepath.Join(s.SharedDir(), clientCertFile),
filepath.Join(s.SharedDir(), clientKeyFile),
))
require.NoError(t, cert.WriteCertificate(
&x509.Certificate{
Subject: pkix.Name{CommonName: "server"},
DNSNames: []string{"ruler.alertmanager-client"},
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
},
filepath.Join(s.SharedDir(), serverCertFile),
filepath.Join(s.SharedDir(), serverKeyFile),
))

// Have at least one alertmanager configuration.
require.NoError(t, writeFileToSharedDir(s, "alertmanager_configs/user-1.yaml", []byte(cortexAlertmanagerUserConfigYaml)))

// Start Alertmanagers.
amFlags := mergeFlags(
AlertmanagerFlags(),
AlertmanagerLocalFlags(),
getServerHTTPTLSFlags(),
)
am1 := e2ecortex.NewAlertmanagerWithTLS("alertmanager1", amFlags, "")
require.NoError(t, s.StartAndWaitReady(am1))

// Connect the ruler to the Alertmanager
configOverrides := mergeFlags(
map[string]string{
"-ruler.alertmanager-url": "https://" + am1.HTTPEndpoint(),
},
getTLSFlagsWithPrefix("ruler.alertmanager-client", "alertmanager", true),
)

// Start Ruler.
require.NoError(t, writeFileToSharedDir(s, cortexSchemaConfigFile, []byte(cortexSchemaConfigYaml)))
ruler := e2ecortex.NewRuler("ruler", mergeFlags(ChunksStorageFlags(), RulerFlags(), configOverrides), "")
require.NoError(t, s.StartAndWaitReady(ruler))

// Create a client with the ruler address configured
c, err := e2ecortex.NewClient("", "", "", ruler.HTTPEndpoint(), "user-1")
require.NoError(t, err)

// Set the rule group into the ruler
require.NoError(t, c.SetRuleGroup(ruleGroup, namespaceOne))

// Wait until the user manager is created
require.NoError(t, ruler.WaitSumMetrics(e2e.Equals(1), "cortex_ruler_managers_total"))

// Wait until we've discovered the alertmanagers.
require.NoError(t, ruler.WaitSumMetricsWithOptions(e2e.Equals(1), []string{"cortex_prometheus_notifications_alertmanagers_discovered"}, e2e.WaitMissingMetrics))
}

func createTestRuleGroup(t *testing.T) rulefmt.RuleGroup {
t.Helper()

Expand Down
24 changes: 21 additions & 3 deletions integration/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,12 +61,30 @@ func getServerTLSFlags() map[string]string {
}
}

func getClientTLSFlagsWithPrefix(prefix string) map[string]string {
func getServerHTTPTLSFlags() map[string]string {
return map[string]string{
"-server.http-tls-cert-path": filepath.Join(e2e.ContainerSharedDir, serverCertFile),
"-server.http-tls-key-path": filepath.Join(e2e.ContainerSharedDir, serverKeyFile),
"-server.http-tls-client-auth": "RequireAndVerifyClientCert",
"-server.http-tls-ca-path": filepath.Join(e2e.ContainerSharedDir, caCertFile),
}
}

func getClientTLSFlagsWithPrefix(prefix string) map[string]string {
return getTLSFlagsWithPrefix(prefix, "ingester.client", false)
}

func getTLSFlagsWithPrefix(prefix string, servername string, http bool) map[string]string {
flags := map[string]string{
"-" + prefix + ".tls-cert-path": filepath.Join(e2e.ContainerSharedDir, clientCertFile),
"-" + prefix + ".tls-key-path": filepath.Join(e2e.ContainerSharedDir, clientKeyFile),
"-" + prefix + ".tls-ca-path": filepath.Join(e2e.ContainerSharedDir, caCertFile),
"-" + prefix + ".tls-server-name": "ingester.client",
"-" + prefix + ".tls-enabled": "true",
"-" + prefix + ".tls-server-name": servername,
}

if !http {
flags["-"+prefix+".tls-enabled"] = "true"
}

return flags
}
38 changes: 34 additions & 4 deletions pkg/ruler/notifier.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ package ruler

import (
"context"
"flag"
"fmt"
"net/url"
"regexp"
Expand All @@ -16,8 +17,21 @@ import (
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/dns"
"github.com/prometheus/prometheus/notifier"

"github.com/cortexproject/cortex/pkg/util"
"github.com/cortexproject/cortex/pkg/util/tls"
)

type NotifierConfig struct {
TLS tls.ClientConfig `yaml:",inline"`
BasicAuth util.BasicAuth `yaml:",inline"`
}

func (cfg *NotifierConfig) RegisterFlags(f *flag.FlagSet) {
cfg.TLS.RegisterFlagsWithPrefix("ruler.alertmanager-client", f)
cfg.BasicAuth.RegisterFlagsWithPrefix("ruler.alertmanager-client.", f)
}

// rulerNotifier bundles a notifier.Manager together with an associated
// Alertmanager service discovery manager and handles the lifecycle
// of both actors.
Expand Down Expand Up @@ -150,19 +164,35 @@ func amConfigFromURL(rulerConfig *Config, url *url.URL, apiVersion config.Alertm
PathPrefix: url.Path,
Timeout: model.Duration(rulerConfig.NotificationTimeout),
ServiceDiscoveryConfigs: sdConfig,
HTTPClientConfig: config_util.HTTPClientConfig{
TLSConfig: config_util.TLSConfig{
CAFile: rulerConfig.Notifier.TLS.CAPath,
CertFile: rulerConfig.Notifier.TLS.CertPath,
KeyFile: rulerConfig.Notifier.TLS.KeyPath,
InsecureSkipVerify: rulerConfig.Notifier.TLS.InsecureSkipVerify,
ServerName: rulerConfig.Notifier.TLS.ServerName,
},
},
}

// Check the URL for basic authentication information first
if url.User != nil {
amConfig.HTTPClientConfig = config_util.HTTPClientConfig{
BasicAuth: &config_util.BasicAuth{
Username: url.User.Username(),
},
amConfig.HTTPClientConfig.BasicAuth = &config_util.BasicAuth{
Username: url.User.Username(),
}

if password, isSet := url.User.Password(); isSet {
amConfig.HTTPClientConfig.BasicAuth.Password = config_util.Secret(password)
}
}

// Override URL basic authentication configs with hard coded config values if present
if rulerConfig.Notifier.BasicAuth.IsEnabled() {
amConfig.HTTPClientConfig.BasicAuth = &config_util.BasicAuth{
Username: rulerConfig.Notifier.BasicAuth.Username,
Password: config_util.Secret(rulerConfig.Notifier.BasicAuth.Password),
}
}

return amConfig
}
37 changes: 36 additions & 1 deletion pkg/ruler/notifier_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ import (
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/dns"
"github.com/stretchr/testify/require"

"github.com/cortexproject/cortex/pkg/util"
)

func TestBuildNotifierConfig(t *testing.T) {
Expand Down Expand Up @@ -159,7 +161,7 @@ func TestBuildNotifierConfig(t *testing.T) {
},
},
{
name: "with Basic Authentication",
name: "with Basic Authentication URL",
cfg: &Config{
AlertmanagerURL: "http://marco:hunter2@alertmanager-0.default.svc.cluster.local/alertmanager",
},
Expand All @@ -185,6 +187,39 @@ func TestBuildNotifierConfig(t *testing.T) {
},
},
},
{
name: "with Basic Authentication URL and Explicit",
cfg: &Config{
AlertmanagerURL: "http://marco:hunter2@alertmanager-0.default.svc.cluster.local/alertmanager",
Notifier: NotifierConfig{
BasicAuth: util.BasicAuth{
Username: "jacob",
Password: "test",
},
},
},
ncfg: &config.Config{
AlertingConfig: config.AlertingConfig{
AlertmanagerConfigs: []*config.AlertmanagerConfig{
{
HTTPClientConfig: config_util.HTTPClientConfig{
BasicAuth: &config_util.BasicAuth{Username: "jacob", Password: "test"},
},
APIVersion: "v1",
Scheme: "http",
PathPrefix: "/alertmanager",
ServiceDiscoveryConfigs: discovery.Configs{
discovery.StaticConfig{
{
Targets: []model.LabelSet{{"__address__": "alertmanager-0.default.svc.cluster.local"}},
},
},
},
},
},
},
},
},
}

for _, tt := range tests {
Expand Down
3 changes: 3 additions & 0 deletions pkg/ruler/ruler.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,8 @@ type Config struct {
NotificationQueueCapacity int `yaml:"notification_queue_capacity"`
// HTTP timeout duration when sending notifications to the Alertmanager.
NotificationTimeout time.Duration `yaml:"notification_timeout"`
// Client configs for interacting with the Alertmanager
Notifier NotifierConfig `yaml:"alertmanager_client"`

// Max time to tolerate outage for restoring "for" state of alert.
OutageTolerance time.Duration `yaml:"for_outage_tolerance"`
Expand Down Expand Up @@ -130,6 +132,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
cfg.ClientTLSConfig.RegisterFlagsWithPrefix("ruler.client", f)
cfg.StoreConfig.RegisterFlags(f)
cfg.Ring.RegisterFlags(f)
cfg.Notifier.RegisterFlags(f)

// Deprecated Flags that will be maintained to avoid user disruption
flagext.DeprecatedFlag(f, "ruler.client-timeout", "This flag has been renamed to ruler.configs.client-timeout")
Expand Down
17 changes: 17 additions & 0 deletions pkg/util/http.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"bytes"
"context"
"encoding/json"
"flag"
"fmt"
"html/template"
"io"
Expand All @@ -19,6 +20,22 @@ import (

const messageSizeLargerErrFmt = "received message larger than max (%d vs %d)"

// BasicAuth configures basic authentication for HTTP clients.
type BasicAuth struct {
Username string `yaml:"basic_auth_username"`
Password string `yaml:"basic_auth_password"`
}

func (b *BasicAuth) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
f.StringVar(&b.Username, prefix+"basic-auth-username", "", "HTTP Basic authentication username. It overrides the username set in the URL (if any).")
f.StringVar(&b.Password, prefix+"basic-auth-password", "", "HTTP Basic authentication password. It overrides the password set in the URL (if any).")
}

// IsEnabled returns false if basic authentication isn't enabled.
func (b BasicAuth) IsEnabled() bool {
return b.Username != "" || b.Password != ""
}

// WriteJSONResponse writes some JSON as a HTTP response.
func WriteJSONResponse(w http.ResponseWriter, v interface{}) {
w.Header().Set("Content-Type", "application/json")
Expand Down