Improve reliability of classification metrics #356

Closed
@vishalbollu

Description

It takes a while for the ListMetrics function in CloudWatch to yield consistent results. This behaviour is described in the AWS SDK for Go documentation: https://docs.aws.amazon.com/sdk-for-go/api/service/cloudwatch/#CloudWatch.ListMetrics

Motivation

The delay in ListMetrics from the CloudWatch API causes the classification metrics to update later than other metrics. In the first few minutes after API startup, metrics also appear in and disappear from the ListMetrics results sporadically.

Additional Context

  • As a temporary workaround until metadata storage is available, write each newly encountered class to S3 under a key such as <api_id>/<class_name>
  • On startup, each replica reads all classes from S3 and caches them. During prediction serving, any class not already in the cache triggers a blind write to S3 and an update to the cache
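
The workaround above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `ClassCache`, `InMemoryStore`, `list_keys`, and `put_key` are hypothetical names, and the in-memory store stands in for S3 so the example runs without AWS credentials. A real store would presumably wrap S3 list and put calls behind the same two methods.

```python
from typing import Iterable, Set


class InMemoryStore:
    """Hypothetical stand-in for S3: a flat set of object keys."""

    def __init__(self) -> None:
        self.keys: Set[str] = set()
        self.writes = 0  # counts writes so the caching effect is observable

    def list_keys(self, prefix: str) -> Iterable[str]:
        return [key for key in self.keys if key.startswith(prefix)]

    def put_key(self, key: str) -> None:
        # Blind write: no existence check, and repeating it is harmless.
        self.writes += 1
        self.keys.add(key)


class ClassCache:
    """Per-replica cache of the classes seen for one API (assumed design)."""

    def __init__(self, store: InMemoryStore, api_id: str) -> None:
        self._store = store
        self._prefix = f"{api_id}/"
        # Startup: read every class already recorded under <api_id>/.
        self._classes = {
            key[len(self._prefix):] for key in store.list_keys(self._prefix)
        }

    def observe(self, class_name: str) -> bool:
        """Record a predicted class; return True if it was new to this replica."""
        if class_name in self._classes:
            return False
        self._store.put_key(self._prefix + class_name)
        self._classes.add(class_name)
        return True

    def classes(self) -> Set[str]:
        return set(self._classes)
```

Because the write is blind and keyed by class name, two replicas that encounter the same new class at the same time simply write the same key twice. A replica only learns about classes written by other replicas when it next reads from the store, which in this sketch happens only at startup.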

Labels

enhancement: New feature or request
