Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add key_storage, isolation_level and lock_manager configuration to cache policies #15382

Merged
merged 11 commits into from
Sep 17, 2024

Conversation

desertaxle
Copy link
Member

@desertaxle desertaxle commented Sep 13, 2024

This PR adds the ability to specify a storage location for cache entries via a CachePolicy. The isolation level for the cache and locking mechanism are also configurable via a CachePolicy.

Example

Cache task results based on the input and source code a task. Store those cache entries at ~/.prefect/cache. Allow only one actor to access a cache entry at a time and store lock files to enforce that at ~.prefect/locks.

from prefect import task
from prefect.cache_policies import INPUTS, TASK_SOURCE
from prefect.transactions import IsolationLevel


@task(
    cache_policy=(INPUTS + TASK_SOURCE).configure(
        key_storage="~/.prefect/cache",
        lock_manager="~/.prefect/locks",
        isolation_level=IsolationLevel.SERIALIZABLE,
    )
)
def my_task(x):
    return x


if __name__ == "__main__":
    print(my_task(10))

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

@desertaxle desertaxle changed the title Cache storage, isolation_level and locks` configuration to cache policies Cache storage, isolation_level and locks configuration to cache policies Sep 13, 2024
@desertaxle desertaxle changed the title Cache storage, isolation_level and locks configuration to cache policies Add storage, isolation_level and locks configuration to cache policies Sep 13, 2024
@desertaxle desertaxle added the feature A new feature label Sep 13, 2024
Copy link

codspeed-hq bot commented Sep 13, 2024

CodSpeed Performance Report

Merging #15382 will not alter performance

Comparing cache-policy-enhancements (8f18646) with main (387cbe3)

Summary

✅ 3 untouched benchmarks

@desertaxle desertaxle marked this pull request as ready for review September 13, 2024 21:48
Copy link
Member

@cicdw cicdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note on locks as a kwarg name, otherwise this is great

src/prefect/cache_policies.py Show resolved Hide resolved
@@ -195,7 +195,9 @@ def __enter__(self):
and not self.store.supports_isolation_level(self.isolation_level)
):
raise ValueError(
f"Isolation level {self.isolation_level.name} is not supported by provided result store."
f"Isolation level {self.isolation_level.name} is not supported by provided "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for future: I think we should refactor many of these errors into special prefect.exceptions.ConfigurationError exceptions for clarity and consistency

src/prefect/cache_policies.py Outdated Show resolved Hide resolved
Copy link
Member

@cicdw cicdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks awesome -- left one small comment about the deepcopy thing, and one question that isn't super important but I wanted to give it some airtime just in case. cc: @zzstoatzz in case you have thoughts about the kwarg name storage

"IsolationLevel",
None,
] = None
lock_manager: Optional["LockManager"] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it's longer but I do like this more - very clear


@dataclass
class CachePolicy:
"""
Base class for all cache policies.
"""

storage: Union["WritableFileSystem", str, Path, None] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only because now is the time to ask these questions - are there any other name options we have for this one, just to make crystal clear it's not the same thing as result storage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility could be cache_storage. It'd be a tad redundant, but it'd be explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other ideas just to explore our options:

  • key_storage
  • metadata_storage (mirrors the internal API)
  • key_files
  • metadata_files
  • cache_files
  • cache_data
  • metadata

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key_storage jumps out as a good option to me. I think I'll change to that.

src/prefect/cache_policies.py Outdated Show resolved Hide resolved
@desertaxle desertaxle changed the title Add storage, isolation_level and locks configuration to cache policies Add key_storage, isolation_level and lock_manager configuration to cache policies Sep 17, 2024
@desertaxle desertaxle merged commit 8e91cd5 into main Sep 17, 2024
33 checks passed
@desertaxle desertaxle deleted the cache-policy-enhancements branch September 17, 2024 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants