
Managing optional dependencies in Python projects #485

Open
@MilesCranmer


This is half a feature request and half a request for comment on my current solution.

In PySR I have four optional Julia dependencies: LoopVectorization, Bumper, Zygote, and ClusterManagers. Installing all of them takes a while, especially the Julia precompilation step, so I install each one only when the user enables a feature that needs it. Loading one of these packages triggers a package extension in the upstream Julia packages.

The way I currently have this set up is as follows:

```python
from typing import Optional

from .julia_import import Pkg, jl  # Internal meta-loading file to set up the right ENV variables


def load_required_packages(
    *,
    turbo: bool = False,
    bumper: bool = False,
    enable_autodiff: bool = False,
    cluster_manager: Optional[str] = None,
) -> None:
    # Each option maps to one optional Julia package (name, UUID) whose
    # loading triggers an extension in the upstream Julia packages.
    if turbo:
        load_package("LoopVectorization", "bdcacae8-1622-11e9-2a5c-532679323890")
    if bumper:
        load_package("Bumper", "8ce10254-0962-460f-a3d8-1f77fea1446e")
    if enable_autodiff:
        load_package("Zygote", "e88e6eb3-aa80-5325-afca-941959d7151f")
    if cluster_manager is not None:
        load_package("ClusterManagers", "34f1f09b-3a8b-5176-ab39-66d58a4d544e")


def isinstalled(uuid_s: str) -> bool:
    # Pkg.dependencies() only describes the *active* environment, whereas
    # `using` searches the whole load path (including @v1.10).
    return jl.haskey(Pkg.dependencies(), jl.Base.UUID(uuid_s))


def load_package(package_name: str, uuid_s: str) -> None:
    # Add the package to the active environment only if it is missing there...
    if not isinstalled(uuid_s):
        Pkg.add(name=package_name, uuid=uuid_s)
    # ...then load it so the upstream package extension is triggered.
    jl.seval(f"using {package_name}: {package_name}")
```

Basically in my monolithic PySRRegressor object, I check the options the user selected and then install each of the packages they need. It seems to work reasonably well.
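One way to keep this manageable as more options arrive is to make the option-to-package mapping a table and batch the installs. This is a hypothetical sketch, not PySR or juliacall API: `OPTIONAL_PACKAGES`, `required_packages`, `missing_packages`, and `ensure_installed` are illustrative names, and the `install` callback is assumed to wrap a single `Pkg.add` call over all missing packages. The names and UUIDs are the ones from the snippet above.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical table: option name -> (Julia package name, UUID).
OPTIONAL_PACKAGES: Dict[str, Tuple[str, str]] = {
    "turbo": ("LoopVectorization", "bdcacae8-1622-11e9-2a5c-532679323890"),
    "bumper": ("Bumper", "8ce10254-0962-460f-a3d8-1f77fea1446e"),
    "enable_autodiff": ("Zygote", "e88e6eb3-aa80-5325-afca-941959d7151f"),
    "cluster_manager": ("ClusterManagers", "34f1f09b-3a8b-5176-ab39-66d58a4d544e"),
}


def required_packages(
    *,
    turbo: bool = False,
    bumper: bool = False,
    enable_autodiff: bool = False,
    cluster_manager: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return the (name, uuid) pairs needed for the selected options."""
    enabled = {
        "turbo": turbo,
        "bumper": bumper,
        "enable_autodiff": enable_autodiff,
        "cluster_manager": cluster_manager is not None,
    }
    return [OPTIONAL_PACKAGES[opt] for opt, on in enabled.items() if on]


def missing_packages(
    needed: Iterable[Tuple[str, str]], installed_uuids: Iterable[str]
) -> List[Tuple[str, str]]:
    """Drop packages whose UUID is already in the active environment."""
    installed = {u.lower() for u in installed_uuids}
    return [(name, uuid) for name, uuid in needed if uuid.lower() not in installed]


def ensure_installed(
    needed: List[Tuple[str, str]],
    installed_uuids: Iterable[str],
    install: Callable[[List[Tuple[str, str]]], None],
) -> List[Tuple[str, str]]:
    """Install every missing package in one call; return what was installed."""
    todo = missing_packages(needed, installed_uuids)
    if todo:
        # A single install call (assumed to be one Pkg.add over all specs),
        # so resolution can happen once rather than once per package.
        install(todo)
    return todo
```

With a real environment, `installed_uuids` could be built from the keys of `Pkg.dependencies()`. Here it is a plain list of UUID strings so the selection logic can be checked without Julia.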

Note that I use the `jl.haskey(...)` check instead of a try/catch around `using` because, as I painfully discovered, a try/catch fails to detect a package that is missing from the active environment (since you can technically load packages from `@v1.10`). This is a problem for distributed compute, because the worker processes will not have access to `@v1.10`, only the current environment! So the `haskey` check is needed...
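To make the failure mode concrete, here is a pure-Python toy model (not Julia's actual code loader) of the situation described above: `using` succeeds if *any* environment on the load path provides the package, while the `haskey` check only looks at the active environment, which is all the worker processes are assumed to see. `Environment`, `can_load`, and `isinstalled_in` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Environment:
    """Toy stand-in for a Julia environment: package name -> UUID."""
    name: str
    deps: Dict[str, str] = field(default_factory=dict)


def can_load(package: str, load_path: List[Environment]) -> bool:
    # Models `using Package` (or a try/catch around it): any environment
    # on the load path is good enough.
    return any(package in env.deps for env in load_path)


def isinstalled_in(uuid: str, active: Environment) -> bool:
    # Models haskey(Pkg.dependencies(), UUID(...)): only the active
    # environment counts.
    return uuid in active.deps.values()


ZYGOTE = ("Zygote", "e88e6eb3-aa80-5325-afca-941959d7151f")
project = Environment("active project")         # Zygote not added here...
shared = Environment("@v1.10", dict([ZYGOTE]))  # ...but present here

main_load_path = [project, shared]  # main process: both environments visible
worker_load_path = [project]        # workers (per the issue): active env only
```

In this model the try/catch-style check passes in the main process, so nothing gets installed, yet the workers cannot load the package; the active-environment check correctly reports it as missing.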

I am wondering what the best way to do this is, and whether we could have such a feature in juliacall or juliapkg at some point. I'm not really sure how it would work, and I don't want to add complexity unnecessarily; maybe an `isinstalled` function is all we need here. The other thing I'm worried about is multiple Julia-accelerated Python packages being loaded simultaneously (if not now, then at some point): how would this workflow interact across those packages?
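On the multi-package question: if several Julia-accelerated Python packages share one Julia environment in a process, one option is to merge their optional-dependency requests before installing anything. The class below is a hypothetical sketch, not an existing juliacall or juliapkg feature. It deduplicates requests by UUID, records which Python package asked for what, and rejects a UUID requested under two different names.

```python
from typing import Dict, Iterable, List, Tuple


class OptionalDepRegistry:
    """Hypothetical process-wide registry merging optional Julia
    dependency requests from several Python packages."""

    def __init__(self) -> None:
        self._by_uuid: Dict[str, str] = {}           # UUID -> package name
        self._requesters: Dict[str, List[str]] = {}  # UUID -> requesting Python packages

    def request(self, requester: str, name: str, uuid: str) -> None:
        uuid = uuid.lower()
        known = self._by_uuid.get(uuid)
        if known is not None and known != name:
            raise ValueError(f"UUID {uuid} requested as both {known!r} and {name!r}")
        self._by_uuid[uuid] = name
        who = self._requesters.setdefault(uuid, [])
        if requester not in who:
            who.append(requester)

    def pending(self, installed_uuids: Iterable[str]) -> List[Tuple[str, str]]:
        """(name, uuid) pairs requested by anyone but not yet installed."""
        installed = {u.lower() for u in installed_uuids}
        return [(n, u) for u, n in self._by_uuid.items() if u not in installed]

    def requesters(self, uuid: str) -> List[str]:
        return list(self._requesters.get(uuid.lower(), []))
```

The `pending()` result could then feed a single install step, so each Julia package is added once no matter how many Python packages asked for it.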

Maybe the simplest thing is to just require all potential extensions to be installed at once... But that would get to be a bit much once I enable CUDA support for PySR, since there would be so many installs required.

Metadata


Labels: enhancement (New feature or request)
