Add backend routing, transform caching, and sklearn estimator caching#9

Open
robosimon wants to merge 3 commits into main from feature/backend-routing-transform-cache

Conversation

@robosimon

Summary

This PR makes iTuna significantly smoother for large iterative workflows:

Backend routing

You can now route different operations to different backends, matching how users actually run experiments:

```python
# Expensive estimator fits go distributed
config.register_backend_route(
    method="fit", model_class=ConsistencyEnsemble, backend="disk_cache_distributed"
)
# Cheap consistency-transform fits stay local
config.register_backend_route(
    method="fit", model_class=ConsistencyTransform, backend="disk_cache"
)
```
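To illustrate the resolution behavior a route table like this implies, here is a minimal, self-contained sketch. The `BackendRouter` class and its method names are illustrative assumptions, not iTuna's actual internals; the point is that lookup can walk the class MRO so subclasses inherit routes from their parents, falling back to a default backend when no route matches.

```python
# Hypothetical sketch of per-operation backend routing.
# `BackendRouter`, `register`, and `resolve` are illustrative names,
# not iTuna's actual API.
class BackendRouter:
    def __init__(self, default_backend="local"):
        self.default_backend = default_backend
        self._routes = {}  # (method, model_class) -> backend name

    def register(self, method, model_class, backend):
        self._routes[(method, model_class)] = backend

    def resolve(self, method, model_class):
        # Walk the MRO so subclasses inherit routes from parent classes.
        for cls in model_class.__mro__:
            backend = self._routes.get((method, cls))
            if backend is not None:
                return backend
        return self.default_backend


class Estimator: ...
class FancyEstimator(Estimator): ...

router = BackendRouter()
router.register("fit", Estimator, "disk_cache_distributed")
print(router.resolve("fit", FancyEstimator))   # inherited via MRO
print(router.resolve("transform", Estimator))  # falls back to the default
```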

Transform caching

ConsistencyTransform.fit is now cached in disk-cache workflows, so consistency alignment is never recomputed for identical inputs. The distributed backend also uses a local fast-path when all requested models are already cached, skipping redundant sweep registration.
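Conceptually, a fit cache like this keys on the estimator identity, its hyperparameters, and the input data, and loads the stored result when an identical key is seen again. The sketch below shows that idea with a hash-keyed on-disk store; the function names, key scheme, and pickle-based format are assumptions for illustration, not iTuna's actual cache layout, and it assumes all inputs are picklable.

```python
# Hypothetical sketch of input-keyed fit caching; not iTuna's
# actual on-disk format.
import hashlib
import os
import pickle
import tempfile


def _fit_key(estimator_name, params, X):
    # Hash estimator identity, hyperparameters, and input data together.
    payload = pickle.dumps((estimator_name, sorted(params.items()), X))
    return hashlib.sha256(payload).hexdigest()


def cached_fit(cache_dir, estimator_name, params, X, fit_fn):
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, _fit_key(estimator_name, params, X) + ".pkl")
    if os.path.exists(path):
        # Identical inputs: load the stored result instead of refitting.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = fit_fn(X)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


# Usage: the second call with identical inputs never invokes fit_fn.
calls = []

def _toy_fit(X):
    calls.append(1)
    return sum(X)

with tempfile.TemporaryDirectory() as tmp:
    first = cached_fit(tmp, "ConsistencyTransform", {"metric": "r2"}, [1, 2, 3], _toy_fit)
    second = cached_fit(tmp, "ConsistencyTransform", {"metric": "r2"}, [1, 2, 3], _toy_fit)
```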

sklearn estimator caching (ituna.sklearn)

New module for caching standalone sklearn estimators outside of ConsistencyEnsemble:

```python
from ituna.sklearn import cached, enable_global_cache

# Cache a single estimator instance in place
cached(my_estimator, cache_dir="./cache")
my_estimator.fit(X)  # fit runs on the first call, loads from disk on subsequent calls

# Or enable caching globally for all estimators of a class
enable_global_cache(FastICA, cache_dir="./cache")
```

Patched estimators pass sklearn compliance checks (check_estimator).
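One way an in-place wrapper like `cached()` can work is to replace the instance's `fit` with a memoizing closure that restores fitted attributes (sklearn's trailing-underscore convention) on a cache hit. The sketch below is a hedged illustration of that pattern, not the actual `ituna.sklearn` code: `cached_inplace` and the dict-backed `cache` are invented names, and the real module persists to disk rather than a dict.

```python
# Hypothetical sketch of in-place fit caching; `cached_inplace` is an
# illustrative name, not the real ituna.sklearn implementation.
import functools
import hashlib
import pickle


def cached_inplace(estimator, cache):
    """Wrap estimator.fit so repeated fits on identical data restore the
    cached fitted state. `cache` is a plain dict here for illustration."""
    original_fit = estimator.fit

    @functools.wraps(original_fit)
    def fit(X, y=None, **kwargs):
        key = hashlib.sha256(pickle.dumps((type(estimator).__name__, X, y))).hexdigest()
        if key in cache:
            # Cache hit: restore fitted attributes instead of refitting.
            estimator.__dict__.update(cache[key])
            return estimator
        original_fit(X, y, **kwargs)
        # Store fitted attributes (sklearn convention: trailing underscore).
        cache[key] = {k: v for k, v in vars(estimator).items() if k.endswith("_")}
        return estimator

    estimator.fit = fit
    return estimator


# Usage with a toy estimator: the second identical fit is served from cache.
class _ToyEstimator:
    def fit(self, X, y=None):
        self.n_fits_ = getattr(self, "n_fits_", 0) + 1
        self.mean_ = sum(X) / len(X)
        return self

cache = {}
est = cached_inplace(_ToyEstimator(), cache)
est.fit([1, 2, 3])
est.fit([1, 2, 3])  # identical data: fitted state restored, no refit
```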

Changes

  • ituna/config.py: register_backend_route, remove_backend_route, clear_backend_routes, resolve_backend_route
  • ituna/_backends/: lazy backend resolution per operation; disk-cache transform fit caching; distributed fast-path for fully cached reruns
  • ituna/sklearn.py: new module — cached(), enable_global_cache(), disable_global_cache(), get_global_cache_status()
  • ituna/_cache_guard.py: recursion guard preventing double-caching when patched estimators are used inside ensembles
  • Docs: updated backends tutorial with routing examples, new sklearn caching guide, CEBRA best-practices demo
  • Tests: test_backend_routing.py, test_sklearn_cache_wrapper.py, updated disk-cache and distributed tests
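The recursion guard in ituna/_cache_guard.py addresses a real pitfall: when a patched estimator is fitted inside an ensemble whose own fit is cached, naive wrapping would cache the same work twice. One common way to implement such a guard is a thread-local re-entrancy flag; the sketch below shows that pattern under invented names (`cache_scope`, `maybe_cache`) and is not the actual module.

```python
# Hypothetical sketch of a double-caching guard using a thread-local
# re-entrancy flag; names are illustrative, not iTuna's actual code.
import threading

_guard = threading.local()


def cache_active():
    return getattr(_guard, "depth", 0) > 0


class cache_scope:
    """Context manager marking that an outer cache layer is active,
    so nested cache layers skip their own caching."""

    def __enter__(self):
        _guard.depth = getattr(_guard, "depth", 0) + 1
        return self

    def __exit__(self, *exc):
        _guard.depth -= 1
        return False


def maybe_cache(fit_fn, store, key, *args):
    # Inner layers run uncached when an outer layer already owns caching.
    if cache_active():
        return fit_fn(*args)
    with cache_scope():
        if key not in store:
            store[key] = fit_fn(*args)
        return store[key]


# Usage: the outer fit caches; the nested inner fit detects the guard
# and skips its own cache entry.
store = {}

def inner_fit(x):
    return maybe_cache(lambda v: v * 2, store, ("inner", x), x)

def outer_fit(x):
    return maybe_cache(lambda v: inner_fit(v) + 1, store, ("outer", x), x)

result = outer_fit(3)
```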
