Cross-platform issues with Remote Workers / SSH Cluster Manager / Native Dependencies #22

Open

Description

Hi -- My head node (i.e. procid == 1) is a Mac running Julia v0.3.6, and I am trying to use Linux-based workers via SSHClusterManager.

I run into problems when using e.g. HDF5. The basic cause seems to be that:

  • workers delegate include to node 1 (include == include_from_node1)
  • HDF5 (and many others) use BinDeps
  • BinDeps creates a custom deps.jl which is platform (and presumably even box) specific

so my Linux boxes complain when they cannot locate the Mac dylib.

I've thought a bit about how to resolve this, but nothing obvious and elegant comes to mind. (For now I've just hacked the deps.jl on my Mac to support both OS X and Linux.)
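
For illustration, a dual-platform deps.jl hack of that kind might look roughly like the sketch below. This is not the actual BinDeps-generated file: both library paths are hypothetical, and the real deps.jl also verifies that the library can be loaded. It assumes the `@osx_only` / `@linux_only` macros available in Julia 0.3 (later Julia versions use `Sys.isapple()` and `Sys.islinux()` instead).

```julia
# Hand-edited deps.jl (sketch). Because workers pull this file from node 1,
# it has to name the right library for whichever OS ends up evaluating it.

# Hypothetical path on the Mac head node
@osx_only   const libhdf5 = "/usr/local/lib/libhdf5.dylib"

# Hypothetical path on the Linux workers
@linux_only const libhdf5 = "/usr/lib/libhdf5.so"
```

This only works if every Linux worker has the library at the same path, which is part of why it feels like a hack rather than a fix.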

Have others seen this kind of issue? Is there some simple way to have the workers not pull code from node1 but simply rely on the locally installed packages?

I was thinking of hacking include_from_node1 in the .juliarc.jl on the linux boxes to simply not pull code from node1, but that seems a bit drastic -- any thoughts about whether this would work?

As an aside, while I can understand the motivation for having include, using, etc. work by delegating to node1 (e.g. removing the need for code distribution), it does seem a bit difficult to do robustly or in a way that will scale nicely to dozens or hundreds of workers.

thanks
