Description
If a tenant is assigned the "fallback" configuration while a sync of configurations is in progress, the tenant's AM may get deactivated (Paused). Consider the following scenario:
1. A sync of configurations starts. It consists of a) listing the configs, b) loading them into memory, and c) applying them.
2. While the sync is in progress, some tenant is assigned the default ("fallback") configuration.
If step 2 happens after the listing step (1a), an active AM is created for the tenant, but during step (1c) it is deactivated, because the current sync has not seen the config file but can already see the active AM. When an AM is deactivated, the cached in-memory configuration is removed for the user (cortex/pkg/alertmanager/multitenant.go, line 350 in e2f984c).
Such an AM also stays deactivated during subsequent syncs, because the check on line 436 is not triggered. The loaded `cfg.RawConfig` will be an empty string, but `am.cfgs[cfg.User].RawConfig` for the missing user (see the deletion above) also results in an empty string (cortex/pkg/alertmanager/multitenant.go, line 436 in e2f984c).
The end result is that the unlucky user will have their AM deactivated (until a new configuration is uploaded), and the ruler will complain about being unable to push alerts to that user.
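
To make the interleaving easier to follow, here is a minimal sketch of the race in Go. It is a simplified model built only from the description above, not the actual Cortex code: the `tenantConfig` and `multitenant` types, the `setConfig`, `deactivateMissing`, and `simulate` functions, and the tenant name `tenant-a` are all hypothetical, and locking is left out because the steps are run in a fixed order.

```go
package main

import "fmt"

// tenantConfig is a hypothetical stand-in for a loaded per-tenant config.
type tenantConfig struct {
	User      string
	RawConfig string // empty when the tenant relies on the fallback config
}

// multitenant is a simplified, hypothetical model of the state described above.
type multitenant struct {
	cfgs   map[string]tenantConfig // cached in-memory configuration per tenant
	paused map[string]bool         // key present: AM exists; value: AM is paused
}

// setConfig creates an AM for a new tenant, or re-applies the config
// (which reactivates the AM) only when the raw config has changed.
func (m *multitenant) setConfig(cfg tenantConfig) {
	if _, exists := m.paused[cfg.User]; !exists {
		m.paused[cfg.User] = false // start a new, active AM
	} else if m.cfgs[cfg.User].RawConfig != cfg.RawConfig {
		m.paused[cfg.User] = false // config changed: apply it and reactivate
	}
	m.cfgs[cfg.User] = cfg
}

// deactivateMissing pauses every AM whose tenant was not in the listing
// and removes the cached configuration for that tenant.
func (m *multitenant) deactivateMissing(listed map[string]tenantConfig) {
	for user := range m.paused {
		if _, ok := listed[user]; !ok {
			m.paused[user] = true
			delete(m.cfgs, user)
		}
	}
}

// simulate replays the scenario and reports whether tenant-a is still
// paused after the second sync.
func simulate() bool {
	m := &multitenant{cfgs: map[string]tenantConfig{}, paused: map[string]bool{}}

	// Step 1a: the first sync lists configs before tenant-a has one.
	firstListing := map[string]tenantConfig{}

	// Step 2: tenant-a is assigned the fallback config (empty RawConfig).
	m.setConfig(tenantConfig{User: "tenant-a", RawConfig: ""})

	// Step 1c: the first sync applies its stale listing and pauses tenant-a.
	m.deactivateMissing(firstListing)

	// Second sync: tenant-a is now listed, but "" == "" so nothing is re-applied.
	secondListing := map[string]tenantConfig{"tenant-a": {User: "tenant-a", RawConfig: ""}}
	for _, cfg := range secondListing {
		m.setConfig(cfg)
	}
	m.deactivateMissing(secondListing)

	return m.paused["tenant-a"]
}

func main() {
	fmt.Println("tenant-a paused after second sync:", simulate())
}
```

In this model the second sync compares the loaded empty `RawConfig` with the zero value returned for the deleted cache entry, so the "config changed" branch is never taken and the AM stays paused.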
/cc @gotjosh