Closed
Description
The CloudWatch ListMetrics API is eventually consistent: it takes a while after a metric is first published before ListMetrics returns it reliably. This behaviour is noted in the AWS SDK for Go documentation: https://docs.aws.amazon.com/sdk-for-go/api/service/cloudwatch/#CloudWatch.ListMetrics.
Motivation
Because of the ListMetrics delay, classification metrics update later than other metrics. In the first few minutes after an API starts up, metrics also appear in and disappear from the ListMetrics results sporadically.
Additional Context
- As a temporary workaround until metadata storage is available, write each newly encountered class to S3 under a key such as <api_id>/<class_name>
- On startup, each replica reads all classes from S3 and caches them. During prediction serving, any class not already in the cache triggers a blind write to S3 and an update to the cache