Create an alert around thumbnail exceptions using tallies from Redis #3707

Open

Problem

We are capturing exceptions that occur when attempting to generate a thumbnail with Photon and adding tallies to Redis (see #3324 for an example).

    with tallies_conn.pipeline() as tallies:
        tallies.incr(f"thumbnail_response_code:{month}:{response.status}")
        tallies.incr(
            f"thumbnail_response_code_by_domain:{domain}:" f"{month}:{response.status}"
        )
        tallies.incr(
            f"thumbnail_response_code_by_provider:{media_info.media_provider}:"
            f"{month}:{response.status}"
        )
        try:
            tallies.execute()
        except ConnectionError:
            logger.warning(
                "Redis connect failed, thumbnail response codes not tallied."
            )
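Any alert will need to read these counters back out, so here is a minimal sketch of parsing the by-provider key layout shown above and grouping counts per provider. The helper names (`parse_provider_key`, `tallies_by_provider`) and the `"2024-05"` month format are hypothetical and not taken from the codebase. The sketch also assumes that neither the month nor the status code contains a `:`.

```python
from collections import defaultdict

# Key prefix copied from the snippet above.
PROVIDER_PREFIX = "thumbnail_response_code_by_provider"


def parse_provider_key(key):
    """Split a by-provider tally key into (provider, month, status)."""
    if isinstance(key, bytes):  # Redis clients may return keys as bytes
        key = key.decode("utf-8")
    prefix, sep, rest = key.partition(":")
    if prefix != PROVIDER_PREFIX or not sep:
        raise ValueError(f"not a by-provider tally key: {key!r}")
    # Split from the right so a provider name containing ":" stays intact.
    # Assumption: neither the month nor the status contains ":".
    provider, month, status = rest.rsplit(":", 2)
    return provider, month, int(status)


def tallies_by_provider(pairs, month):
    """Group (key, count) pairs into {provider: {status: count}} for one month."""
    result = defaultdict(dict)
    for key, count in pairs:
        provider, key_month, status = parse_provider_key(key)
        if key_month == month:
            result[provider][status] = int(count)
    return dict(result)


# Hypothetical data; real pairs could come from iterating the matching keys
# with redis-py's scan_iter(match=...) and reading each value with get().
example_pairs = [
    ("thumbnail_response_code_by_provider:flickr:2024-05:200", "900"),
    ("thumbnail_response_code_by_provider:flickr:2024-05:404", "100"),
    ("thumbnail_response_code_by_provider:flickr:2024-04:200", "800"),
]
print(tallies_by_provider(example_pairs, "2024-05"))
```

Keeping the parsing separate from the Redis access means the grouping logic can be tested without a running Redis instance.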

Description

We should build an alarm or alert around those tallies, particularly to detect anomalous behavior in thumbnails generated for a specific provider. #2401 may be a prerequisite for this, so that we're not gathering tally information from production when running the alarm logic.

Airflow may be the best place for this (as opposed to CloudWatch), since it would likely give us more control over the logic used for the alarm.
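As a starting point for discussion, the alarm logic could be a plain function that a scheduled task (for example, an Airflow task) calls with per-provider counts. The sketch below is only one possible rule, not a proposed final design. The function name, the thresholds, the input shape `{provider: {status: count}}`, and the choice to treat every non-2xx status as a failure are all assumptions.

```python
def find_anomalous_providers(tallies, max_error_ratio=0.2, min_requests=100):
    """Return {provider: error_ratio} for providers above the error threshold.

    `tallies` maps provider -> {status_code: count}. Both thresholds are
    hypothetical defaults and would need tuning against real data.
    """
    flagged = {}
    for provider, by_status in tallies.items():
        total = sum(by_status.values())
        # Skip providers with too little traffic to judge reliably.
        if total < min_requests:
            continue
        # Assumption: any status outside 200-299 counts as a failure.
        errors = sum(
            count for status, count in by_status.items()
            if not 200 <= status < 300
        )
        ratio = errors / total
        if ratio > max_error_ratio:
            flagged[provider] = ratio
    return flagged


# Hypothetical monthly counts for three providers.
sample = {
    "flickr": {200: 900, 404: 100},
    "example_provider": {200: 50, 500: 150},
    "tiny_provider": {500: 5},
}
print(find_anomalous_providers(sample))
```

A fixed ratio is the simplest possible rule. Comparing against a provider's own previous months would catch gradual regressions better, at the cost of needing historical tallies.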


Metadata

Assignees

No one assigned

    Labels

    🌟 goal: addition (Addition of new feature)
    🟨 priority: medium (Not blocking but should be addressed soon)
    🤖 aspect: dx (Concerns developers' experience with the codebase)
    🧱 stack: api (Related to the Django API)
    🧱 stack: catalog (Related to the catalog and Airflow DAGs)
    🧱 stack: infra (Related to the Terraform config and other infrastructure)

    Type

    No type

    Projects

    Status: 📋 Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
