Description
This is half a feature request and half a request for comment on my current solution.
In PySR I have four optional Julia dependencies: LoopVectorization, Bumper, Zygote, and ClusterManagers. Installing all of these takes a bit of time, especially Julia precompilation, so I chose to install them only when the user needs such features. They trigger extensions in the upstream Julia packages.
The way I currently have this set up is as follows:
```python
from typing import Optional

from .julia_import import Pkg, jl  # Internal meta-loading file to set up the right ENV variables


def load_required_packages(
    *,
    turbo: bool = False,
    bumper: bool = False,
    enable_autodiff: bool = False,
    cluster_manager: Optional[str] = None,
):
    if turbo:
        load_package("LoopVectorization", "bdcacae8-1622-11e9-2a5c-532679323890")
    if bumper:
        load_package("Bumper", "8ce10254-0962-460f-a3d8-1f77fea1446e")
    if enable_autodiff:
        load_package("Zygote", "e88e6eb3-aa80-5325-afca-941959d7151f")
    if cluster_manager is not None:
        load_package("ClusterManagers", "34f1f09b-3a8b-5176-ab39-66d58a4d544e")


def isinstalled(uuid_s: str) -> bool:
    return jl.haskey(Pkg.dependencies(), jl.Base.UUID(uuid_s))


def load_package(package_name: str, uuid_s: str) -> None:
    if not isinstalled(uuid_s):
        Pkg.add(name=package_name, uuid=uuid_s)
    jl.seval(f"using {package_name}: {package_name}")
    return None
```
Basically, in my monolithic `PySRRegressor` object, I check the options the user selected and then install each of the packages they need. It seems to work reasonably well.
Note that I use the `jl.haskey(...)` check instead of a `try`/`catch`, because, as I painfully discovered, a `try`/`catch` fails to detect that a package is missing from the active environment (since you can technically load packages from `@v1.10`). This is problematic for distributed compute: the worker processes will not have access to `@v1.10`, only to the current environment! So the `haskey` method is needed...
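To make the failure mode concrete, here is a toy pure-Python model of Julia's environment stacking (illustrative names only, none of this is the actual juliacall API): a load-based check effectively searches the whole `LOAD_PATH` stack, while `Pkg.dependencies()` reflects only the active project, which is all a distributed worker will see.

```python
# Toy model of Julia's environment stacking (pure Python, no Julia needed).
# All names are hypothetical; this only illustrates why wrapping `import X`
# in a try/catch is the wrong check.

active_project = {"Zygote"}                   # deps in the active Project.toml
shared_env = {"Zygote", "LoopVectorization"}  # e.g. packages installed in @v1.10
load_path = [active_project, shared_env]      # Julia's environment stack

def loadable(pkg: str) -> bool:
    """What a try/catch around `import pkg` effectively tests: any stacked env."""
    return any(pkg in env for env in load_path)

def isinstalled(pkg: str) -> bool:
    """What the haskey(Pkg.dependencies(), uuid) check effectively tests."""
    return pkg in active_project

print(loadable("LoopVectorization"))     # True: the try/catch check passes...
print(isinstalled("LoopVectorization"))  # False: ...but a worker with only the
                                         # active project would fail to load it
```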
I am wondering what the best way to do this is, and whether we could have such a feature in juliacall or juliapkg at some point? I'm not really sure how it could work, and I don't want to increase complexity unnecessarily; maybe an `isinstalled` function is all we need here. The other thing I'm worried about is multiple Julia-accelerated Python packages being loaded simultaneously (if not now, then at some point): how would this workflow interact across those packages?
Maybe the simplest thing is to just require all potential extensions to be installed at once... But that would get to be a bit much once I enable CUDA support for PySR; there would be so many installs required.
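For what it's worth, the "install everything up-front" route does at least have declarative support today: as I understand it, each Python package can ship a `juliapkg.json`, and juliapkg merges the declarations from all installed Python packages into one resolved Julia environment, which sidesteps the multi-package interaction question. A sketch of such a file (UUIDs copied from the snippet above; the version bound is illustrative, not a recommendation):

```json
{
    "packages": {
        "LoopVectorization": {
            "uuid": "bdcacae8-1622-11e9-2a5c-532679323890",
            "version": "0.12"
        },
        "Bumper": {
            "uuid": "8ce10254-0962-460f-a3d8-1f77fea1446e"
        }
    }
}
```

The downside is exactly the cost the conditional `Pkg.add` approach avoids: every optional dependency gets installed and precompiled for every user, whether or not they use the feature.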