Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(embedded): Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications #20419

Merged
merged 1 commit into from
Jul 5, 2022

Conversation

zhaorui2022
Copy link
Contributor

@zhaorui2022 zhaorui2022 commented Jun 17, 2022

fix(embedded): Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications

SUMMARY

Details documented at #20418

When there are transient errors, alert owners will receive a notification saying the alert fails even though the underlying dataset is working properly. Adding retries to avoid users receiving redundant notifications.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

This only happens when there are transient errors, so there is no stable way to reproduce the issue. Added unit tests to make sure execute_query function is retried when it should.

ADDITIONAL INFORMATION

@codecov
Copy link

codecov bot commented Jun 17, 2022

Codecov Report

Merging #20419 (1529de7) into master (467d8ef) will decrease coverage by 0.06%.
The diff coverage is 80.57%.

❗ Current head 1529de7 differs from pull request most recent head 7f69e78. Consider uploading reports for the commit 7f69e78 to get more accurate results

@@            Coverage Diff             @@
##           master   #20419      +/-   ##
==========================================
- Coverage   66.70%   66.64%   -0.07%     
==========================================
  Files        1739     1744       +5     
  Lines       65135    65381     +246     
  Branches     6897     6897              
==========================================
+ Hits        43450    43570     +120     
- Misses      19932    20058     +126     
  Partials     1753     1753              
Flag Coverage Δ
hive ?
mysql 82.36% <85.04%> (+0.07%) ⬆️
postgres 82.44% <85.22%> (+0.08%) ⬆️
presto ?
python 82.52% <85.22%> (-0.27%) ⬇️
sqlite 82.22% <85.04%> (+0.13%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...chart-controls/src/shared-controls/dndControls.tsx 26.92% <ø> (ø)
...d/packages/superset-ui-chart-controls/src/types.ts 100.00% <ø> (ø)
...ui-core/src/chart/components/FallbackComponent.tsx 100.00% <ø> (ø)
...packages/superset-ui-core/src/query/types/Query.ts 100.00% <ø> (ø)
.../superset-ui-core/src/query/types/QueryFormData.ts 100.00% <ø> (ø)
...s/legacy-plugin-chart-calendar/src/controlPanel.ts 100.00% <ø> (ø)
...egacy-plugin-chart-country-map/src/controlPanel.ts 100.00% <ø> (ø)
...ns/legacy-plugin-chart-heatmap/src/controlPanel.ts 66.66% <ø> (ø)
.../legacy-plugin-chart-histogram/src/controlPanel.ts 60.00% <ø> (ø)
...ns/legacy-plugin-chart-map-box/src/controlPanel.ts 33.33% <ø> (ø)
... and 322 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 467d8ef...7f69e78. Read the comment docs.

@zhaorui2022 zhaorui2022 force-pushed the alert_retry branch 2 times, most recently from d2ac5b0 to 6b8e8ea Compare June 21, 2022 18:19
@zhaorui2022 zhaorui2022 changed the title fix(embedded) Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications fix(embedded): Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications Jun 21, 2022
@zhaorui2022 zhaorui2022 force-pushed the alert_retry branch 3 times, most recently from d4e8834 to 77c3fd2 Compare June 24, 2022 17:22
@@ -46,6 +47,10 @@
# to avoid heavy loads done by a user mistake
OPERATOR_FUNCTIONS = {">=": ge, ">": gt, "<=": le, "<": lt, "==": eq, "!=": ne}

# Max tries to run queries to prevent false errors caused by transient errors
# being returned to users
EXECUTE_QUERY_MAX_TRIES = 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it at 1 to keep the default behavior
also let's move it to the config to let other to decide what retry policy they are looking for

Copy link
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move const to the config

@zhaorui2022 zhaorui2022 force-pushed the alert_retry branch 2 times, most recently from 367cc9b to 9525525 Compare June 28, 2022 16:03
@@ -27,6 +27,7 @@

from superset import app, jinja_context, security_manager
from superset.commands.base import BaseCommand
from superset.config import ALERT_REPORTS_QUERY_EXECUTION_MAX_TRIES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be accessed in a different way:

if app.config["ALERT_REPORTS_NOTIFICATION_DRY_RUN"]:

import pandas as pd
from pytest_mock import MockFixture

from superset.config import ALERT_REPORTS_QUERY_EXECUTION_MAX_TRIES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's overwrite it for the test to be set to 2, feel free to use a test config

@zhaorui2022 zhaorui2022 force-pushed the alert_retry branch 2 times, most recently from 2be5b46 to 962cd2e Compare June 29, 2022 20:51
Copy link
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG%nit


command.validate()

assert execute_query_mock.call_count == expected_max_retries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do
assert execute_query_mock.call_count == 3
instead, it will make sure that you are using a right config

…ansient errors to users as alert failure notifications
@bkyryliuk bkyryliuk merged commit 818962c into apache:master Jul 5, 2022
michael-s-molina pushed a commit that referenced this pull request Jul 6, 2022
…ansient errors to users as alert failure notifications (#20419)

Co-authored-by: Rui Zhao <zhaorui@dropbox.com>
(cherry picked from commit 818962c)
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v2.0 🍒 2.0.0 🍒 2.0.1 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants