There are some AWS hard limits on the maximum CloudFormation template size and the number of resources per stack.
Some of our alerting config groups currently contain large numbers of alarms, so we are starting to see errors. Manually sharding alarms across multiple config groups is possible, but it is a pain.
Proposed solution
Add a configuration setting at the alerting group level, such as NumberOfCloudFormationStacks. This would default to 1.
Use a deterministic method to bucket the alarms across stacks, e.g. a checksum of the alarm name modulo NumberOfCloudFormationStacks.
Deploy N stacks with the alarms spread across them as described above, e.g. aws-watchman-[alerting group name]-[stack number]. Perhaps the first stack could keep the existing name to stay compatible with stacks that are already deployed, and subsequent stacks could be numbered?
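A minimal sketch of the bucketing and naming described above, in Python. The checksum (CRC-32 via `zlib`), the function names `stack_name`, `stack_index` and `assign_alarms`, and the suffix format (`-2`, `-3`, ...) are hypothetical choices for illustration, not existing Watchman code.

```python
import zlib
from typing import Dict, Iterable, List


def stack_name(group_name: str, index: int) -> str:
    """Hypothetical naming scheme: stack 0 keeps the existing single-stack
    name, later stacks get a numeric suffix (-2, -3, ...)."""
    base = f"aws-watchman-{group_name}"
    return base if index == 0 else f"{base}-{index + 1}"


def stack_index(alarm_name: str, num_stacks: int) -> int:
    """Deterministically map an alarm name to a zero-based stack index."""
    if num_stacks < 1:
        raise ValueError("num_stacks must be at least 1")
    # CRC-32 gives the same value on every run and machine, unlike Python's
    # built-in hash() for str, which is randomized per process.
    return zlib.crc32(alarm_name.encode("utf-8")) % num_stacks


def assign_alarms(group_name: str, alarm_names: Iterable[str],
                  num_stacks: int = 1) -> Dict[str, List[str]]:
    """Return {stack name: sorted alarm names} for one alerting group."""
    if num_stacks < 1:
        raise ValueError("num_stacks must be at least 1")
    # Create every stack up front so empty buckets still get a (possibly empty) stack.
    stacks: Dict[str, List[str]] = {
        stack_name(group_name, i): [] for i in range(num_stacks)
    }
    for alarm in sorted(alarm_names):
        stacks[stack_name(group_name, stack_index(alarm, num_stacks))].append(alarm)
    return stacks


if __name__ == "__main__":
    alarms = [f"orders-queue-{i}-depth" for i in range(10)]
    for name, members in assign_alarms("orders", alarms, num_stacks=3).items():
        print(name, len(members))
```

With the default of 1, every alarm lands in the stack with the existing name, so groups that never set the option would deploy exactly as they do today.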
I've thought a bit about how we could automate this (i.e. work out the number of stacks automatically), but it's difficult because the number of alarms can go up and down, so it seems you would need some extra state to make sure we don't end up with orphaned CloudFormation stacks (maybe just listing stacks would be enough). That seems complicated, though, and the solution above seems like a reasonable start.
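If listing stacks turns out to be enough, orphan detection could be as small as comparing the listed names against the names the current configuration expects. A sketch under assumptions: the stack names are assumed to come from something like CloudFormation's ListStacks API with `DELETE_COMPLETE` stacks filtered out, and the naming scheme and function names are hypothetical.

```python
import re
from typing import Iterable, List


def expected_stack_names(group_name: str, num_stacks: int) -> List[str]:
    """Hypothetical naming scheme: the first stack keeps the existing name,
    later ones are suffixed -2, -3, ..."""
    base = f"aws-watchman-{group_name}"
    return [base] + [f"{base}-{i}" for i in range(2, num_stacks + 1)]


def find_orphaned_stacks(group_name: str, existing_stack_names: Iterable[str],
                         num_stacks: int) -> List[str]:
    """Return stacks that follow this group's naming scheme but are no longer
    expected for the configured number of stacks."""
    base = f"aws-watchman-{group_name}"
    # The base name, optionally followed by -<number>.
    pattern = re.compile(rf"{re.escape(base)}(-\d+)?")
    expected = set(expected_stack_names(group_name, num_stacks))
    return sorted(
        name for name in existing_stack_names
        if pattern.fullmatch(name) and name not in expected
    )


if __name__ == "__main__":
    listed = ["aws-watchman-orders", "aws-watchman-orders-2",
              "aws-watchman-orders-3", "aws-watchman-payments"]
    print(find_orphaned_stacks("orders", listed, num_stacks=2))
    # ['aws-watchman-orders-3']
```

One caveat with a plain numeric suffix: an alerting group literally named `orders-2` would look like the second stack of `orders`, so a real scheme may need a separator that cannot appear in group names.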
The main question I have is what happens if you change the number of stacks. Would you tear down the previous stacks and recreate everything? Otherwise some alarms could move between stacks and end up duplicated or orphaned.
If you increased the number of stacks, everything would just work: e.g. if you went from 1 to 2, roughly half the alarms would be put into the 2nd stack, and CloudFormation would handle deleting them from the 1st stack once they had gone from its template.
If you decreased the number of stacks, it would also work (everything would get squashed into fewer stacks), but you would need to manually delete the stacks numbered above the new NumberOfCloudFormationStacks value.
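Both cases can be made concrete by computing which alarms change stack and which stacks stop being generated when the setting changes. This is a sketch assuming checksum-modulo bucketing (CRC-32 here); `plan_resize` and `ResizePlan` are hypothetical names.

```python
import zlib
from typing import Iterable, List, NamedTuple


class ResizePlan(NamedTuple):
    moved_alarms: List[str]      # alarms whose stack index changes
    stacks_to_delete: List[int]  # zero-based indices that are no longer generated


def stack_index(alarm_name: str, num_stacks: int) -> int:
    """Checksum-modulo bucketing of an alarm name into a zero-based stack index."""
    if num_stacks < 1:
        raise ValueError("num_stacks must be at least 1")
    return zlib.crc32(alarm_name.encode("utf-8")) % num_stacks


def plan_resize(alarm_names: Iterable[str], old_n: int, new_n: int) -> ResizePlan:
    """Describe what changing NumberOfCloudFormationStacks from old_n to new_n does."""
    names = sorted(alarm_names)
    moved = [a for a in names if stack_index(a, old_n) != stack_index(a, new_n)]
    # When shrinking, stacks new_n..old_n-1 are no longer generated, so nothing
    # updates them: they keep their old alarms until someone deletes them.
    to_delete = list(range(new_n, old_n))
    return ResizePlan(moved, to_delete)


if __name__ == "__main__":
    alarms = [f"alarm-{i}" for i in range(1000)]
    for old_n, new_n in [(1, 2), (2, 3), (3, 1)]:
        plan = plan_resize(alarms, old_n, new_n)
        print(f"{old_n} -> {new_n}: {len(plan.moved_alarms)} alarms move, "
              f"delete stack indices {plan.stacks_to_delete}")
```

A design note on the modulo choice: going from 1 to 2 moves roughly half the alarms, but going from 2 to 3 moves roughly two thirds, because an alarm keeps its stack only when its checksum gives the same remainder mod 2 and mod 3. A swapped-in technique such as rendezvous (highest-random-weight) hashing would move only about 1/new_n of the alarms when a stack is added, at the cost of slightly more code.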