In combination with `cache_directory`, it would be nice if there was an option to manually provide the key at which the cached result would be available, as a non-default alternative to automatically generating it by hashing the cloudpickle binary.
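A caller-chosen key is convenient because a result can then be looked up by a name the user already knows, instead of recomputing a hash from the exact function and arguments. The sketch below is a minimal directory-backed cache that stores each result under `<cache_directory>/<key>.pkl`. The helper names `store_result` and `load_result`, the `.pkl` layout, and the use of the standard-library `pickle` are hypothetical and chosen only for illustration. They are not how executorlib lays out its HDF5 cache files.

```python
import os
import pickle
from typing import Any, Callable


def store_result(cache_directory: str, key: str, fn: Callable, *args: Any, **kwargs: Any) -> Any:
    """Hypothetical helper: compute fn(*args, **kwargs) once and cache it under `key`."""
    os.makedirs(cache_directory, exist_ok=True)
    path = os.path.join(cache_directory, key + ".pkl")
    if os.path.exists(path):
        # Cache hit: return the stored result and skip the computation entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = fn(*args, **kwargs)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


def load_result(cache_directory: str, key: str) -> Any:
    """Hypothetical helper: fetch a cached result knowing only its key."""
    with open(os.path.join(cache_directory, key + ".pkl"), "rb") as f:
        return pickle.load(f)
```

With a readable key such as `"my_run"`, a later session can call `load_result(cache_directory, "my_run")` directly. With a hash-derived key, it would first have to rebuild the identical function, arguments, and resource dictionary to recover the file name.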
Under the hood, the change would probably look very much like this:
```python
from typing import Callable, Optional

import cloudpickle


def serialize_funct_h5(
    fn: Callable,
    fn_args: Optional[list] = None,
    fn_kwargs: Optional[dict] = None,
    resource_dict: Optional[dict] = None,
    key: Optional[str] = None,
) -> tuple[str, dict]:
    """
    Serialize a function and its arguments and keyword arguments into an HDF5 file.

    Args:
        fn (Callable): The function to be serialized.
        fn_args (list): The arguments of the function.
        fn_kwargs (dict): The keyword arguments of the function.
        resource_dict (dict): resource dictionary, which defines the resources used for the execution of the function.
            Example resource dictionary: {
                cores: 1,
                threads_per_core: 1,
                gpus_per_worker: 0,
                oversubscribe: False,
                cwd: None,
                executor: None,
                hostname_localhost: False,
            }
        key (str, optional): Cache key for the task. If None, the key is generated from
            the function name and a hash of the cloudpickle binary.

    Returns:
        Tuple[str, dict]: A tuple containing the task key and the serialized data.
    """
    if fn_args is None:
        fn_args = []
    if fn_kwargs is None:
        fn_kwargs = {}
    if resource_dict is None:
        resource_dict = {}
    if key is None:
        # Default: derive the key by hashing the full call (function, arguments, resources).
        binary_all = cloudpickle.dumps(
            {"fn": fn, "args": fn_args, "kwargs": fn_kwargs, "resource_dict": resource_dict}
        )
        # _get_hash is the existing hashing helper used alongside this function.
        task_key = fn.__name__ + _get_hash(binary=binary_all)
    else:
        # User-supplied key: used verbatim as the cache key.
        task_key = key
    data = {
        "fn": fn,
        "args": fn_args,
        "kwargs": fn_kwargs,
        "resource_dict": resource_dict,
    }
    return task_key, data
```
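The key-selection branch of this proposal can be exercised on its own. Below is a minimal, self-contained sketch of just that logic. To avoid external dependencies it swaps in the standard-library `pickle` for `cloudpickle` and a SHA-256 digest from `hashlib` for `_get_hash`. Both substitutions, and the name `resolve_task_key`, are assumptions for illustration, so the generated keys will not match the ones the real helper produces.

```python
import hashlib
import pickle
from typing import Callable, Optional


def resolve_task_key(
    fn: Callable,
    fn_args: Optional[list] = None,
    fn_kwargs: Optional[dict] = None,
    resource_dict: Optional[dict] = None,
    key: Optional[str] = None,
) -> str:
    """Hypothetical helper: return `key` if given, else fn name + hash of the pickled call."""
    if key is not None:
        # User-supplied key: used verbatim, no hashing involved.
        return key
    binary_all = pickle.dumps(
        {
            "fn": fn,
            "args": fn_args or [],
            "kwargs": fn_kwargs or {},
            "resource_dict": resource_dict or {},
        }
    )
    # Stand-in for _get_hash: SHA-256 hex digest of the pickled payload.
    return fn.__name__ + hashlib.sha256(binary_all).hexdigest()
```

For example, `resolve_task_key(max, [1, 2], key="my_run")` returns `"my_run"` unchanged, while the default keys for `max` with `[1, 2]` and with `[1, 3]` differ because the arguments feed into the hash. The default path therefore stays deterministic for identical calls, and the explicit key gives the caller a stable, human-readable name.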