Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Issue 13956 #13980

Merged
merged 1 commit into from
Apr 9, 2021
Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Apr 7, 2021

SUMMARY

Fixes #13956. More specifically the issue with "caching" the the dataset health check message (#11970) on the associated datasource record was that if the callback was defined, the first time a user when to explorer the record was updated with the callback response where the user was defined as the updater.

Additionally tying the health check solely to the metadata of the datasource is likely too restrictive, i.e., it seems plausible that an institution may want to have custom logic which actually introspects the underlying data. Hence the fix was to remove the caching on the datasource record and instead place on ownership on the institution to write the necessary logic for memoizing the callback (which includes the specific datasource as an argument).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley requested a review from a team as a code owner April 7, 2021 02:10
@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 2 times, most recently from a16244a to 7d7d16e Compare April 7, 2021 02:28
@codecov
Copy link

codecov bot commented Apr 7, 2021

Codecov Report

Merging #13980 (49b86a6) into master (667eb83) will decrease coverage by 0.10%.
The diff coverage is 85.71%.

❗ Current head 49b86a6 differs from pull request most recent head 57f4d74. Consider uploading reports for the commit 57f4d74 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13980      +/-   ##
==========================================
- Coverage   79.40%   79.30%   -0.11%     
==========================================
  Files         938      938              
  Lines       47541    47523      -18     
  Branches     5940     5940              
==========================================
- Hits        37749    37687      -62     
- Misses       9666     9710      +44     
  Partials      126      126              
Flag Coverage Δ
cypress 56.05% <ø> (ø)
hive 80.45% <85.71%> (-0.04%) ⬇️
mysql 80.72% <85.71%> (-0.04%) ⬇️
postgres 80.75% <85.71%> (-0.04%) ⬇️
presto ?
python 81.16% <85.71%> (-0.19%) ⬇️
sqlite 80.36% <85.71%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/views/core.py 75.63% <ø> (-0.04%) ⬇️
superset/views/datasource.py 88.33% <ø> (-0.38%) ⬇️
superset/config.py 90.65% <50.00%> (-0.32%) ⬇️
superset/connectors/sqla/models.py 88.68% <100.00%> (-2.10%) ⬇️
superset/datasets/dao.py 96.62% <100.00%> (-0.05%) ⬇️
superset/db_engine_specs/presto.py 84.10% <0.00%> (-5.44%) ⬇️
superset/models/core.py 89.10% <0.00%> (-0.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 667eb83...57f4d74. Read the comment docs.



def downgrade():
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we impl a down migration here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing to downgrade to, i.e., the upgrade removes all the key and associated values from the extra blob.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but if one needed to revert, would that even make sense to put the keys back?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@craig-rueda the key is optional and thus unnecessary to re-add if a downgrade was performed.

DATASET_HEALTH_CHECK = None
# A SQL dataset health check. Note if enabled the callable should be cached to aid with
# performance.
DATASET_HEALTH_CHECK: Optional[Callable[["SqlaTable"], str]] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be a BaseDatasource not a SqlaTable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 this functionality is only provided for the SQLA connector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice do add an health check callable example here, or add it to the docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar I added an example.

Comment on lines +690 to +692
check = config["DATASET_HEALTH_CHECK"]
return check(self) if check else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's more pythonic? this or:

Suggested change
check = config["DATASET_HEALTH_CHECK"]
return check(self) if check else None
return (config["DATASET_HEALTH_CHECK"] or lambda: None)(self)

is this even valid python? 🤷

# It will get executed each time when user open a chart's explore view.
DATASET_HEALTH_CHECK = None
# A SQL dataset health check. Note if enabled the callable should be cached to aid with
# performance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If the healthy status is in cache but not persist in database, when restart server will we lost all the status?
  2. right now we can run some queries to see healthy status. if use cache solution we will not be easy to know this number?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@graceguo-supercat:

  1. The Redis (or similar) cache rarely is restarted.
  2. It's definitely less clear though possible via a Redis (or similar) query.

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution makes sense. I guess the lesson is there should be less implicit side-effects.

if config.get("DATASET_HEALTH_CHECK")
else None
)
data_["health_check_message"] = self.health_check_message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data_["health_check_message"] = self.health_check_message
data_["health_check_message"] = self.check_health()

If this will run an external function every time this line is visited, maybe it's better to change this property to an actual function call so it "looks" more expensive...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud I thought about that. I think we could also just use @functools.lru_cache for the property so at least the property is cached locally.

@graceguo-supercat
Copy link

graceguo-supercat commented Apr 7, 2021

@dpgaspar Do you have any suggestion for this issue and fix? When a user open explore view to see a chart, we want to add a health check for the sql in the datasource, and want to save the check results into dataset extra column so that don't run the check every time. The problem is, this update for "extra" will set the viewer of chart as the last updater for dataset. Is there anyway to avoid this behavior? thanks!

@robdiciuccio
Copy link
Member

Removing the health_check property in the extra JSON schema seems to qualify as a breaking change according to SIP-57. Can the migration be broken out as a separate PR and staged for cleanup in the next major release (SIP-59)? Is the migration strictly necessary to re-implement the datasource health checks?

@john-bodley
Copy link
Member Author

@robdiciuccio I can break the migration into a separate PR.

Copy link
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment to improve docs

DATASET_HEALTH_CHECK = None
# A SQL dataset health check. Note if enabled the callable should be cached to aid with
# performance.
DATASET_HEALTH_CHECK: Optional[Callable[["SqlaTable"], str]] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice do add an health check callable example here, or add it to the docs

@ktmud
Copy link
Member

ktmud commented Apr 7, 2021

On second thought, I'm still a little torn between having Superset manage the cache vs shifting the responsibility to the end users. IMO the config file should be as lightweight as possible. The more logics we force users to put in there, the more likely things would break.

I'm wondering whether we can utilize sqla's before_insert and before_update events to run health checks on db updates instead of calling check_health() in views, so that the check function won't trigger a new update? For existing datasets, we provide a CLI command for admins to run health checks on all of them in batch.

@dpgaspar
Copy link
Member

dpgaspar commented Apr 7, 2021

@dpgaspar Do you have any suggestion for this issue and fix? When a user open explore view to see a chart, we want to add a health check for the sql in the datasource, and want to save the check results into dataset extra column so that don't run the check every time. The problem is, this update for "extra" will set the viewer of chart as the last updater for dataset. Is there anyway to avoid this behavior? thanks!

I haven't tested it but explicitly setting the changed_by_fk on a separated merge maybe?

@john-bodley
Copy link
Member Author

@ktmud I prefer the proposed approach as it provides the user with a lot more flexibility and per the PR description it means that the health check can relate to the actual underlying data and just not the datasource metadata. Squeezing cached datasource health check information into the datasource record (the current approach) has pros and cons.

Additionally by defining a caching strategy one can leverage Flask-Caching to clear the cache whenever the function byte-code changes and thus there's no longer a requirement for the user to ensure that they'll need to bump the version.

Finally in terms of making the config as light as possible, this hasn't been the case in the past. Superset historically has provided endless flexibility (via callbacks etc.) in the configs which places the ownership on the user which can be problematic if implemented incorrectly, but the upside can be tremendous.

@graceguo-supercat
Copy link

graceguo-supercat commented Apr 7, 2021

I haven't tested it but explicitly setting the changed_by_fk on a separated merge maybe?

db migration can only fix the old records right? right now the behavior is: when user opens explore view, health check function will add some message in the "extra" column, and the dataset's changed_by_fk will be changed to chart viewer.

@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 2 times, most recently from b009f90 to 9ab9ca8 Compare April 7, 2021 20:20
@john-bodley
Copy link
Member Author

Actually @robdiciuccio reading through SIP-57 this change is non-breaking, i.e., the functionality remains unchanged, though performance could be degraded if the callback function is expensive and non-cached.

@john-bodley john-bodley closed this Apr 7, 2021
@john-bodley john-bodley reopened this Apr 7, 2021
@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 4 times, most recently from e769fec to 6164d19 Compare April 8, 2021 00:28
@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 6 times, most recently from 6db8674 to f5bbaf9 Compare April 8, 2021 07:09
@john-bodley
Copy link
Member Author

john-bodley commented Apr 8, 2021

@dpgaspar @etr2460 @graceguo-supercat @ktmud et al. this is now ready for re-review. Note although this fix is not perfect it does:

  1. Remedy the issue in a non-breaking manner. This issue is somewhat egregious as it could pollute the metadata database with incorrect updater information which is then difficult to remedy.
  2. Provide an equally (or possibly more) efficient implementation for caching the health checks.
  3. Provides a more flexible and powerful health check which can leverage the physical data as opposed to only the metadata. 4. The health check isn't tied to explore etc. meaning it can be exposed anywhere.

@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 2 times, most recently from df92188 to 9a7a7fe Compare April 8, 2021 18:34
#
# if cache_manager.cache.get(name) != code:
# cache_manager.cache.delete_memoized(func)
# cache_manager.cache.set(name, code, timeout=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice example!

@@ -734,7 +734,7 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements,too-m

# if feature enabled, run some health check rules for sqla datasource
if hasattr(datasource, "health_check"):
datasource.health_check()
datasource.health_check_message = datasource.health_check()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this override the default @property getter or just update the cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud good call. Lines 735–737 can actually be removed.

@@ -734,7 +734,7 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements,too-m

# if feature enabled, run some health check rules for sqla datasource
if hasattr(datasource, "health_check"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be always false because you remove the health_check method/attribute from SqlaDatasource?

session = db.Session(bind=bind)

for datasource in session.query(SqlaTable):
if datasource.extra:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if datasource.extra:
if datasource.extra and "health_check" in datasource.extra:

This should limit following ops to a smaller subset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is risky as your checking for the existence of a substring in a JSON encoded string rather than the existence of a key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is. It at least saves json.loads op for datasources without health_check info, but can be guaranteed to be correct as long as you keep all following steps.

To really save some database IO, we can do the pre-filtering in the SQLA query step:

    for datasource in session.query(SqlaTable).filter(
        SqlaTable.extra.like('%"health_check"%')
    ):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted the downstream logic ensures that the string is actually from a JSON key (I misspoke earlier). That said this migration only takes a few seconds and thus I think the existing logic is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me as well. Just thought it was a good practice to always do pre-filtering when running db migrations.

@john-bodley john-bodley force-pushed the john-bodley--fix-13956 branch 5 times, most recently from 3d4b74a to cc89ef5 Compare April 9, 2021 01:37
@john-bodley john-bodley merged commit a3b41e2 into apache:master Apr 9, 2021
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Apr 9, 2021
Co-authored-by: John Bodley <john.bodley@airbnb.com>
(cherry picked from commit a3b41e2)
allanco91 pushed a commit to allanco91/superset that referenced this pull request May 21, 2021
Co-authored-by: John Bodley <john.bodley@airbnb.com>
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset is marked as being updated even when it is not being changed
8 participants