Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(key-value): convert command to dao #29344

Merged
merged 16 commits into from
Jul 1, 2024

Conversation

villebro
Copy link
Member

@villebro villebro commented Jun 24, 2024

SUMMARY

This PR converts the Key Value commands to a DAO, which gives the caller much more control over when commits/rollbacks happen, and what to do about expired entries. Other changes:

  • Update all code that uses the KV commands to use the new DAO, making sure flushes/commits happen outside the DAO. In particular, this covers the following major components:
    • Explore permalinks
    • Dashboard permalinks
    • Distributed lock. During the refactor, the distributed lock was moved under a module of its own + break out the stateful logic into commands that use the new DAO.
    • Metastore cache. I considered breaking out the logic into commands, but it felt more sensible to keep the code as-is under a single python file, as it's under extensions. In addition, this doesn't use the new transaction decorator, as we need to adhere to the Flask-Caching API.
  • The old KV command integration tests are converted into functionally equivalent unit tests

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Copy link

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 93.86792% with 13 lines in your changes missing coverage. Please review.

Project coverage is 83.87%. Comparing base (76d897e) to head (d7752de).
Report is 393 commits behind head on master.

Files Patch % Lines
superset/distributed_lock/utils.py 64.28% 5 Missing ⚠️
superset/distributed_lock/__init__.py 85.00% 3 Missing ⚠️
superset/daos/key_value.py 97.05% 2 Missing ⚠️
superset/commands/distributed_lock/base.py 94.11% 1 Missing ⚠️
superset/commands/distributed_lock/create.py 95.23% 1 Missing ⚠️
superset/commands/distributed_lock/get.py 94.11% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #29344       +/-   ##
===========================================
+ Coverage   60.48%   83.87%   +23.38%     
===========================================
  Files        1931      519     -1412     
  Lines       76236    37409    -38827     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31376    -14738     
+ Misses      28017     6033    -21984     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 49.09% <25.94%> (-0.08%) ⬇️
javascript ?
mysql 77.16% <50.94%> (?)
postgres 77.26% <50.94%> (?)
presto 53.71% <25.94%> (-0.09%) ⬇️
python 83.87% <93.86%> (+20.38%) ⬆️
sqlite 76.74% <50.94%> (?)
unit 59.77% <83.01%> (+2.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 35 to 40
@dataclass
class Key:
id: int | None
uuid: UUID | None


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed, as the DAO always returns the full entry

Comment on lines 81 to 93
@pytest.fixture
def admin_user() -> User:
role = db.session.query(Role).filter_by(name="Admin").one()
user = User(
first_name="Alice",
last_name="Doe",
email="adoe@example.org",
username="alice_admin",
roles=[role],
)
db.session.add(user)
db.session.flush()
yield user
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use the admin user that gets created during test startup, and that worked when running the key_value_test.py in isolation, but when running the full test suite the user disappears. So it appears there's a test that deletes it. I didn't have time to try to find the culprit, but at any rate I feel it may be a better idea to not rely on any pre-existing fixtures, hence I suggest using this fixture going forward for new tests needing an admin user.


class KeyValueDAO(BaseDAO[KeyValueEntry]):
@staticmethod
def get_entry(
Copy link
Member Author

@villebro villebro Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reviewers who think the _entry suffix seems redundant: BaseDAO already has create, update, delete, which expects passing an instance of the model. This doesn't fit this use case very well, as the interface here is slightly different from the typical ORM object:

  • resource specifies the namespace
  • key can be an int or UUID
  • The Key Value model encodes/decodes values using a codec that we want to enforce in our method signatures

For this reason I needed to name these slightly differently to not collide with the base methods.

return db.session.query(KeyValueEntry).filter_by(**filter_).first()

@classmethod
def get_value(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a convenience method for when one only wants the value, not the full entry

Comment on lines +63 to +67
value = GetDistributedLock(namespace=namespace, params=kwargs).run()
if value:
logger.debug("Lock on namespace %s for key %s already taken", namespace, key)
raise CreateKeyValueDistributedLockFailedException("Lock already taken")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a get before trying to create the lock, as I noticed that creating locks when it was already taken was producing ugly error logs on Postgres. While this adds one more roundtrip to the metastore, the locking operations are fairly rare, hance it felt cleaner to do it this way. Note, that we're still handling the rare case that the lock got created after the get, so that will still be handled gracefully.

"chartId": slice.id,
"datasetId": dataset.id,
"datasource": datasource_string,
"datasourceType": DatasourceType.TABLE.value,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bycatch: the return value was missing a required field

Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the improvements and for addressing all comments @villebro!

@villebro villebro merged commit 7d6e933 into apache:master Jul 1, 2024
37 checks passed
@villebro villebro deleted the villebro/key-value-dao branch July 1, 2024 17:23
@rtexelm
Copy link
Member

rtexelm commented Jul 1, 2024

@villebro I can't pinpoint why but something in this PR has caused local dev builds to fail with a flask app_context error. Also observed by QA at Preset.

2024-07-01 17:47:46,317:ERROR:superset.app:Failed to create app
Traceback (most recent call last):
  File "/Users/user/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/Users/user/superset/initialization/__init__.py", line 479, in init_app
    self.configure_cache()
  File "/Users/user/superset/initialization/__init__.py", line 511, in configure_cache
    cache_manager.init_app(self.superset_app)
  File "/Users/user/superset/utils/cache_manager.py", line 93, in init_app
    self._init_cache(
  File "/Users/user/superset/utils/cache_manager.py", line 87, in _init_cache
    cache.init_app(app, cache_config)
  File "/Users/user/site-packages/flask_caching/__init__.py", line 145, in init_app
    self._set_cache(app, config)
  File "/Users/user/site-packages/flask_caching/__init__.py", line 155, in _set_cache
    cache_factory = import_string(import_me)
  File "/Users/user/site-packages/werkzeug/utils.py", line 596, in import_string
    __import__(import_name)
  File "/Users/user/superset/extensions/metastore_cache.py", line 27, in <module>
    from superset.daos.key_value import KeyValueDAO
  File "/Users/user/superset/daos/key_value.py", line 32, in <module>
    from superset.key_value.models import KeyValueEntry
  File "/Users/user/superset/key_value/models.py", line 24, in <module>
    from superset.models.helpers import AuditMixinNullable, ImportExportMixin
  File "/Users/user/superset/models/__init__.py", line 17, in <module>
    from . import core, dynamic_plugins, sql_lab, user_attributes  # noqa: F401
  File "/Users/user/superset/models/core.py", line 74, in <module>
    from superset.models.helpers import AuditMixinNullable, ImportExportMixin
  File "/Users/user/superset/models/helpers.py", line 107, in <module>
    config = app.config
  File "/Users/user/site-packages/werkzeug/local.py", line 318, in __get__
    obj = instance._get_current_object()
  File "/Users/user/site-packages/werkzeug/local.py", line 519, in _get_current_object
    raise RuntimeError(unbound_message) from None
RuntimeError: Working outside of application context.

This typically means that you attempted to use functionality that needed
the current application. To solve this, set up an application context
with app.app_context(). See the documentation for more information.

Followed by this

Traceback (most recent call last):
  File "/Users/user/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/Users/user/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/user/lib/python3.10/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/Users/user/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/user/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/user/lib/python3.10/site-packages/flask/cli.py", line 355, in decorator
    app = __ctx.ensure_object(ScriptInfo).load_app()
  File "/Users/user/lib/python3.10/site-packages/flask/cli.py", line 309, in load_app
    app = locate_app(import_name, name)
  File "/Users/user/lib/python3.10/site-packages/flask/cli.py", line 238, in locate_app
    return find_app_by_string(module, app_name)
  File "/Users/user/lib/python3.10/site-packages/flask/cli.py", line 166, in find_app_by_string
    app = attr(*args, **kwargs)
  File "/Users/user/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/Users/user/superset/initialization/__init__.py", line 479, in init_app
    self.configure_cache()
  File "/Users/user/superset/initialization/__init__.py", line 511, in configure_cache
    cache_manager.init_app(self.superset_app)
  File "/Users/user/superset/utils/cache_manager.py", line 93, in init_app
    self._init_cache(
  File "/Users/user/superset/utils/cache_manager.py", line 87, in _init_cache
    cache.init_app(app, cache_config)
  File "/Users/user/lib/python3.10/site-packages/flask_caching/__init__.py", line 145, in init_app
    self._set_cache(app, config)
  File "/Users/user/lib/python3.10/site-packages/flask_caching/__init__.py", line 155, in _set_cache
    cache_factory = import_string(import_me)
  File "/Users/user/lib/python3.10/site-packages/werkzeug/utils.py", line 596, in import_string
    __import__(import_name)
  File "/Users/user/superset/extensions/metastore_cache.py", line 27, in <module>
    from superset.daos.key_value import KeyValueDAO
  File "/Users/user/superset/daos/key_value.py", line 32, in <module>
    from superset.key_value.models import KeyValueEntry
  File "/Users/user/superset/key_value/models.py", line 24, in <module>
    from superset.models.helpers import AuditMixinNullable, ImportExportMixin
  File "/Users/user/superset/models/__init__.py", line 17, in <module>
    from . import core, dynamic_plugins, sql_lab, user_attributes  # noqa: F401
  File "/Users/user/superset/models/core.py", line 74, in <module>
    from superset.models.helpers import AuditMixinNullable, ImportExportMixin
  File "/Users/user/superset/models/helpers.py", line 107, in <module>
    config = app.config
  File "/Users/user/lib/python3.10/site-packages/werkzeug/local.py", line 318, in __get__
    obj = instance._get_current_object()
  File "/Users/user/lib/python3.10/site-packages/werkzeug/local.py", line 519, in _get_current_object
    raise RuntimeError(unbound_message) from None
RuntimeError: Working outside of application context.

This typically means that you attempted to use functionality that needed
the current application. To solve this, set up an application context
with app.app_context(). See the documentation for more information.

@villebro
Copy link
Member Author

villebro commented Jul 2, 2024

@rtexelm I was able to reproduce, it was due to a last second cleanup where I moved imports to the top level from the methods. I'll open up a fix shortly after doing some more testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants