Description
Describe the bug
We run Cortex in microservices mode. On Cortex v1.16.0 we used the v1 Alertmanager API by setting the flag -ruler.alertmanager-use-v2=false. After upgrading to Cortex v1.17.1, the ruler logs show that the v2 Alertmanager API is now used. When I create alert rules, the alerts fire, but no email notifications are delivered, and the ruler logs errors like:
caller=notifier.go:544 level=error user=Test alertmanager=https://cortex-alertmanager.org/alertmanager/api/v2/alerts count=1 msg="Error sending alert" err="bad response status 422 Unprocessable Entity"
To Reproduce
Steps to reproduce the behavior:
- Start Cortex (SHA or version): start Cortex v1.17.1 in microservices mode
- Perform Operations (Read/Write/Others): create an alert rule and observe the ruler logs
Expected behavior
We should receive the email notifications, and no errors should appear in the ruler logs.
Environment:
- Infrastructure: [e.g., Kubernetes, bare-metal, laptop]: bare-metal
- Deployment tool: [e.g., helm, jsonnet]: Ansible, which deploys each Cortex microservice as a systemd service
Additional Context
Command-line flags for the Cortex ruler (systemd ExecStart):
ExecStart=/usr/sbin/cortex-1.17.1 \
-auth.enabled=true \
-log.level=info \
-config.file=/etc/cortex-ruler/cortex-ruler.yaml \
-runtime-config.file=/etc/cortex-shared/cortex-runtime.yaml \
-server.http-listen-port=8061 \
-server.grpc-listen-port=9061 \
-server.grpc-max-recv-msg-size-bytes=104857600 \
-server.grpc-max-send-msg-size-bytes=104857600 \
-server.grpc-max-concurrent-streams=1000 \
\
-distributor.sharding-strategy=shuffle-sharding \
-distributor.ingestion-tenant-shard-size=12 \
-distributor.replication-factor=2 \
-distributor.shard-by-all-labels=true \
-distributor.zone-awareness-enabled=true \
\
-store.engine=blocks \
-blocks-storage.backend=s3 \
-blocks-storage.s3.endpoint=s3.org:10444 \
-blocks-storage.s3.bucket-name=staging-metrics \
-blocks-storage.s3.insecure=false \
\
-blocks-storage.bucket-store.sync-dir=/local/cortex-ruler/tsdb-sync \
-blocks-storage.bucket-store.metadata-cache.backend=memcached \
-blocks-storage.bucket-store.metadata-cache.memcached.addresses=100.76.51.1:11211,100.76.51.2:11211,100.76.51.3:11211 \
\
-querier.active-query-tracker-dir=/local/cortex-ruler/active-query-tracker \
-querier.ingester-streaming=true \
-querier.query-store-after=23h \
-querier.query-ingesters-within=24h \
-querier.shuffle-sharding-ingesters-lookback-period=25h \
\
-store-gateway.sharding-enabled=true \
-store-gateway.sharding-strategy=shuffle-sharding \
-store-gateway.tenant-shard-size=6 \
-store-gateway.sharding-ring.store=etcd \
-store-gateway.sharding-ring.etcd.endpoints=10.120.121.1:2379 \
-store-gateway.sharding-ring.etcd.endpoints=10.120.121.2:2379 \
-store-gateway.sharding-ring.etcd.endpoints=10.120.121.3:2379 \
-store-gateway.sharding-ring.etcd.endpoints=10.120.121.4:2379 \
-store-gateway.sharding-ring.etcd.endpoints=10.120.121.5:2379 \
-store-gateway.sharding-ring.prefix=cortex-store-gateways/ \
-store-gateway.sharding-ring.replication-factor=2 \
-store-gateway.sharding-ring.zone-awareness-enabled=true \
-store-gateway.sharding-ring.instance-availability-zone=t1 \
-store-gateway.sharding-ring.wait-stability-min-duration=1m \
-store-gateway.sharding-ring.wait-stability-max-duration=5m \
-store-gateway.sharding-ring.instance-addr=100.76.75.1 \
-store-gateway.sharding-ring.instance-id=s_8061 \
-store-gateway.sharding-ring.heartbeat-period=15s \
-store-gateway.sharding-ring.heartbeat-timeout=1m \
\
-ring.store=etcd \
-ring.prefix=cortex-ingesters/ \
-ring.heartbeat-timeout=1m \
-etcd.endpoints=10.120.119.1:2379 \
-etcd.endpoints=10.120.119.2:2379 \
-etcd.endpoints=10.120.119.3:2379 \
-etcd.endpoints=10.120.119.4:2379 \
-etcd.endpoints=10.120.119.5:2379 \
\
-ruler.enable-sharding=true \
-ruler.sharding-strategy=shuffle-sharding \
-ruler.tenant-shard-size=2 \
-ruler.ring.store=etcd \
-ruler.ring.prefix=cortex-rulers/ \
-ruler.ring.num-tokens=32 \
-ruler.ring.heartbeat-period=15s \
-ruler.ring.heartbeat-timeout=1m \
-ruler.ring.etcd.endpoints=10.120.119.1:2379 \
-ruler.ring.etcd.endpoints=10.120.119.2:2379 \
-ruler.ring.etcd.endpoints=10.120.119.3:2379 \
-ruler.ring.etcd.endpoints=10.120.119.4:2379 \
-ruler.ring.etcd.endpoints=10.120.119.5:2379 \
-ruler.ring.instance-id=s_8061 \
-ruler.ring.instance-interface-names=e1 \
\
-ruler.max-rules-per-rule-group=500 \
-ruler.max-rule-groups-per-tenant=5000 \
\
-ruler.external.url=staging-cortex-ruler.org \
-ruler.client.grpc-max-recv-msg-size=104857600 \
-ruler.client.grpc-max-send-msg-size=16777216 \
-ruler.client.grpc-compression= \
-ruler.client.grpc-client-rate-limit=0 \
-ruler.client.grpc-client-rate-limit-burst=0 \
-ruler.client.backoff-on-ratelimits=false \
-ruler.client.backoff-min-period=500ms \
-ruler.client.backoff-max-period=10s \
-ruler.client.backoff-retries=5 \
-ruler.evaluation-interval=15s \
-ruler.poll-interval=15s \
-ruler.rule-path=/local/cortex-ruler/rules \
-ruler.alertmanager-url=https://staging-cortex-alertmanager.org/alertmanager \
-ruler.alertmanager-discovery=false \
-ruler.alertmanager-refresh-interval=1m \
-ruler.notification-queue-capacity=10000 \
-ruler.notification-timeout=10s \
-ruler.flush-period=1m \
-experimental.ruler.enable-api=true \
\
-ruler-storage.backend=s3 \
-ruler-storage.s3.endpoint=s3.org:10444 \
-ruler-storage.s3.bucket-name=staging-rules \
-ruler-storage.s3.insecure=false \
\
-target=ruler
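One possible cause worth ruling out, stated as an assumption rather than a confirmed diagnosis: -ruler.external.url above has no scheme (staging-cortex-ruler.org). The ruler appends a link path to this URL to build each alert's generatorURL, and Alertmanager's v2 OpenAPI spec declares generatorURL with format uri. If that format is enforced with Go's url.ParseRequestURI (my understanding of the generated server, not verified here), a scheme-less value would be rejected with 422 while the v1 API may have accepted it. The sketch below checks both variants. The generatorURL helper is hypothetical and only approximates the link format the ruler produces.

```go
package main

import (
	"fmt"
	"net/url"
)

// generatorURL is a hypothetical helper approximating how a ruler could derive
// an alert's generatorURL from the configured external URL.
func generatorURL(externalURL string) string {
	return externalURL + "/graph?g0.expr=up"
}

// acceptedAsURI reports whether s parses as a request URI (absolute URI or
// absolute path), which is the check assumed here for the v2 "uri" format.
func acceptedAsURI(s string) bool {
	_, err := url.ParseRequestURI(s)
	return err == nil
}

func main() {
	for _, ext := range []string{
		"staging-cortex-ruler.org",         // value of -ruler.external.url above
		"https://staging-cortex-ruler.org", // same host with an explicit scheme
	} {
		g := generatorURL(ext)
		fmt.Printf("%-60s valid=%v\n", g, acceptedAsURI(g))
	}
}
```

The first line prints valid=false and the second valid=true. If this assumption holds, setting -ruler.external.url=https://staging-cortex-ruler.org and restarting the ruler would be a quick way to test it.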
Command-line flags for the Cortex Alertmanager (systemd ExecStart):
ExecStart=/usr/sbin/cortex-1.17.1 \
-auth.enabled=true \
-log.level=info \
-config.file=/etc/cortex-alertmanager-8071/cortex-alertmanager.yaml \
-runtime-config.file=/etc/cortex-shared/cortex-runtime.yaml \
-server.http-listen-port=8071 \
-server.grpc-listen-port=9071 \
-server.grpc-max-recv-msg-size-bytes=104857600 \
-server.grpc-max-send-msg-size-bytes=104857600 \
-server.grpc-max-concurrent-streams=1000 \
\
-alertmanager.storage.path=/local/cortex-alertmanager-8071/data \
-alertmanager.storage.retention=120h \
-alertmanager.web.external-url=https://staging-cortex-alertmanager.org/alertmanager \
-alertmanager.configs.poll-interval=1m \
-experimental.alertmanager.enable-api=true \
\
-alertmanager.sharding-enabled=true \
-alertmanager.sharding-ring.store=etcd \
-alertmanager.sharding-ring.prefix=cortex-alertmanagers/ \
-alertmanager.sharding-ring.heartbeat-period=15s \
-alertmanager.sharding-ring.heartbeat-timeout=1m \
-alertmanager.sharding-ring.etcd.endpoints=10.120.121.1:2379 \
-alertmanager.sharding-ring.etcd.endpoints=10.120.121.2:2379 \
-alertmanager.sharding-ring.etcd.endpoints=10.120.121.3:2379 \
-alertmanager.sharding-ring.etcd.endpoints=10.120.121.4:2379 \
-alertmanager.sharding-ring.etcd.endpoints=10.120.121.5:2379 \
-alertmanager.sharding-ring.instance-id=b_8071 \
-alertmanager.sharding-ring.instance-interface-names=e1 \
-alertmanager.sharding-ring.replication-factor=2 \
-alertmanager.sharding-ring.zone-awareness-enabled=true \
-alertmanager.sharding-ring.instance-availability-zone=t1 \
\
-alertmanager-storage.backend=s3 \
-alertmanager-storage.s3.endpoint=s3.org:10444 \
-alertmanager-storage.s3.bucket-name=staging-alerts \
-alertmanager-storage.s3.insecure=false \
\
-alertmanager.receivers-firewall-block-cidr-networks=10.163.131.164/28,10.163.131.180/28 \
-alertmanager.receivers-firewall-block-private-addresses=true \
-alertmanager.notification-rate-limit=0 \
-alertmanager.max-config-size-bytes=0 \
-alertmanager.max-templates-count=0 \
-alertmanager.max-template-size-bytes=0 \
\
-target=alertmanager
The tenant's Alertmanager configuration:
template_files:
  default_template: |
    {{ define "__alertmanager" }}AlertManager{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver | urlquery }}{{ end }}
alertmanager_config: |
  global:
    smtp_smarthost: 'yourmailhost'
    smtp_from: 'youraddress'
    smtp_require_tls: false
  templates:
    - 'default_template'
  route:
    receiver: example-email
  receivers:
    - name: example-email
      email_configs:
        - to: 'youraddress'