Cluster Execution #128
Comments
Hi! Great that you're looking into doing this.

In terms of the first point, I would tend to add an executor specific to the scheduler - we considered doing this for LSF, so I'd imagine this would be similar. An existing co-ordinator could be used, but one could simply submit the tasks using

Resource management - yeah, I'd like ultimately to have something integrated into funflow for this, but until we do so then using a state model seems appropriate. #125 might be useful for you here.

CWL has a resource requirement specification (https://www.commonwl.org/v1.0/CommandLineTool.html#ResourceRequirement), but it's not exactly comprehensive. There might be a better spec out there, though I haven't done much searching.

I'm a little sceptical about dhall configs - every time I've used dhall for something, I've found it a difficult fit. In particular, where the underlying language is less typed, one has to reflect the validation at the dhall level as well, plus keep adapting the dhall spec to the underlying language. That having been said, in specific cases, maybe this would be a reasonable thing to do!
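To sketch what I mean by a resource spec, something shaped like CWL's ResourceRequirement might look like this in Haskell. This is a hypothetical illustration only - none of these names exist in funflow:

```haskell
-- Hypothetical sketch: a scheduler-agnostic resource spec mirroring
-- the fields of CWL's ResourceRequirement (coresMin, ramMin,
-- tmpdirMin, outdirMin). Nothing here is part of funflow's API.
data Resources = Resources
  { coresMin  :: Maybe Int  -- minimum CPU cores
  , ramMin    :: Maybe Int  -- minimum RAM, in MiB
  , tmpdirMin :: Maybe Int  -- minimum temporary space, in MiB
  , outdirMin :: Maybe Int  -- minimum output space, in MiB
  } deriving (Show, Eq)

-- A reasonable default: no constraints at all.
noRequirements :: Resources
noRequirements = Resources Nothing Nothing Nothing Nothing
```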
Thanks! That first part makes sense. Having an executor per cluster seems like a good approach, especially if they can use a common set of resources as in point 2. Do you have any code for the LSF case I could look at? I think LSF is quite similar to slurm, one of the two schedulers I'm most interested in getting working.

I had a quick look at that issue; I'm not sure I immediately understand the consequences, but I'll puzzle over it a bit. I think it's suggesting that I can arbitrarily stack applicatives on top of my arrows without munging the internal effect. If so, I think that is exactly what I want.

I'll check out the CWL resource spec and see if it has most of the stuff I think I need.

And for the dhall configs, I'm not imagining rendering the dhall back to JSON for use with CWL. I'm imagining a dhall alternative to the command-line tool spec used in CWL. These dhall configs could be used to maintain defaults and generate CLIs for running whole pipelines. Although I now see this is a totally separate issue; other than that, I think the structure of the resource spec and the command-line tool spec should be similar.

Thanks so much for your help and for this amazing library!
@cfhammill I'm rewriting my scientific workflow manager based on funflow. Currently I've implemented a DRMAA coordinator (compatible with slurm or sge) that can submit native Haskell code to remote compute nodes. If you are interested, you can take a look at this dev branch: https://github.com/kaizhang/SciFlow/tree/v1.0. Here is an example application: https://github.com/kaizhang/SciFlow/blob/v1.0/tests/socket.hs
This is really cool @kaizhang. I hadn't thought to use DRMAA. That might solve a big chunk of what we're hoping to accomplish. Right now we're mostly concerned with wrapping external programs and running them on the cluster, but I can definitely imagine wanting to run native code in the future.
@kaizhang, do you think your coordinator code could be PRed into funflow? Seems like it would be good to have with the rest of the coordinators. |
In funflow, the coordinator is used to distribute external tasks, but my coordinator can distribute arbitrary Haskell code. Because the purpose is different, the interfaces of the two coordinators are quite different, so this cannot be merged into funflow without significant changes to the coordinator interface. But I would like to share my design and let the Tweag folks decide whether it should be merged into funflow.

Because flows are free arrows, I can easily write multiple interpreters for the same type of workflow. I wrote two interpreters -- one for the frontend (coordinator) and the other for the compute nodes (executor). On the frontend, the coordinator spawns new workers and assigns steps to workers (using DRMAA). On the compute nodes, the executor queries the job assignments from the coordinator through the socket.

```haskell
-- Assumed imports (module names approximate):
-- import Path (parseAbsDir)
-- import System.Directory (makeAbsolute)
-- import qualified Control.Funflow.ContentStore as CS

mainWith :: (MonadMask m, MonadIO m, MonadBaseControl IO m)
         => RunMode       -- Master (coordinator) or Slave (executor)
         -> FilePath      -- path to the content store
         -> DrmaaConfig
         -> SciFlow m a b
         -> a
         -> m b
mainWith runMode p config wf input = do
    dir <- liftIO $ makeAbsolute p >>= parseAbsDir
    CS.withStore dir $ \store -> case runMode of
        Master -> withDrmaa config $ \d ->
            runCoordinator d store wf input   -- on the master node
        Slave -> withConnection config $ \conn ->
            runSciFlow conn store wf input    -- on the compute nodes
```

I think such design patterns can be used for other types of distributed computing as well (not limited to DRMAA). And it solves the problem of distributing native Haskell code.
I'm wondering: can we combine funflow and Cloud Haskell to distribute the workflow?
I was somewhat thinking the same thing. It could unify the external/job interfaces.
@cfhammill In case you are still interested, I've developed a prototype that uses Cloud Haskell to distribute workflows. To use the DRMAA backend, just replace "Control.Workflow.Coordinator.Local" with "Control.Workflow.Coordinator.DRMAA". This is currently just a proof of concept.
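For illustration, the swap amounts to changing a single import. The module names come from the comment above; any program built around them is hypothetical:

```haskell
-- Run steps on the local machine:
import Control.Workflow.Coordinator.Local

-- ...or submit them through DRMAA instead by swapping the import:
-- import Control.Workflow.Coordinator.DRMAA
```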
I'm interested in running funflow pipelines, shipping external jobs to a cluster scheduler, e.g. torque/slurm.
I was hoping to get some ideas on how to do this. I'm happy to write code and contribute it back to funflow if it doesn't exist yet.
One complication: `mapA` would multiply the requirements of memory/cores/nodes. Maybe some of these can be fragmented out as separate issues; let me know what I can do.
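To illustrate why `mapA` is a complication: if a single step declares per-item requirements, mapping it over n inputs needs n times those resources in aggregate, or n separate scheduler jobs. A toy sketch, with an invented Resources type:

```haskell
-- Toy illustration of the mapA concern; Resources is invented here,
-- not a funflow type.
data Resources = Resources { cores :: Int, ramMiB :: Int }
  deriving (Show)

-- Mapping a step over n inputs multiplies its aggregate requirements.
scaleFor :: Int -> Resources -> Resources
scaleFor n (Resources c m) = Resources (n * c) (n * m)

-- e.g. scaleFor 10 (Resources 2 4096) == Resources 20 40960
```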