In this [research project - [private repo]](https://github.com/floswald/FMig.jl/pull/17/commits/119b34bbf621374be187a023399d74a5e16c3934) we encountered a situation very similar to the above.
## 3. Distributed Computing
The [manual](https://docs.julialang.org/en/v1/manual/distributed-computing/) is again very helpful here.
So, by default we have process number 1 (which is the one where we type stuff into the REPL).
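A minimal sketch of how this looks in practice (the worker count here is purely illustrative): you add workers with `addprocs` and can then inspect the process ids.

```julia
using Distributed

addprocs(2)   # start 2 extra worker processes on this machine

nprocs()      # 3: the master (id 1) plus the two workers
workers()     # [2, 3]: ids of the worker processes
myid()        # 1 when evaluated on the master
```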
Distributed programming in Julia is built on two primitives: remote references and remote calls. A remote reference is an object that can be used from any process to refer to an object stored on a particular process. A remote call is a request by one process to call a certain function on certain arguments on another (possibly the same) process.
Remote references come in two flavors: `Future` and `RemoteChannel`.
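To make those two primitives concrete, here is a minimal sketch using the low-level calls directly (it assumes `using Distributed` and at least one worker; the values are illustrative):

```julia
using Distributed
addprocs(1)

# a remote call: ask worker 2 to run rand(2,2); returns immediately
f = remotecall(rand, 2, 2, 2)     # f is a Future, a remote reference

fetch(f)                          # wait for the result and copy it to this process

# a RemoteChannel is the rewritable flavor: any process can put!/take! values
ch = RemoteChannel(() -> Channel{Int}(10), 2)   # backing channel lives on worker 2
put!(ch, 42)
take!(ch)                         # 42
```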
In the manual you can see a low-level API which allows you to directly call a function on a remote worker, but most of the time that's not what you want. We'll concentrate on the higher-level API here. One big issue here is:
### Code and Data Availability
We must ensure that the code we want to execute is available on the process that runs the computation. That sounds fairly obvious, but now try to actually do it. First we define a new function in the REPL and call it on the master, as usual:
```julia
julia> function new_rand(dims...)
           return 3 * rand(dims...)
       end
new_rand (generic function with 1 method)

julia> new_rand(2,2)
2×2 Matrix{Float64}:
 0.407347  2.23388
 1.29914   0.985813
```
Now we want to `spawn` that function on `:any` available process, and we immediately `fetch` the result to trigger execution:
```julia
julia> fetch(@spawnat :any new_rand(2,2))
ERROR: On worker 3:
UndefVarError: `#new_rand` not defined
Stacktrace:
```
It seems that worker 3, where the job was sent with `@spawnat`, does not know about our function `new_rand`. 🧐
Probably the best approach is to define your functions inside a module, as we already discussed. This way you will find it easy to share code and data across worker processes. Let's define this module in a file in the current directory and call it `DummyModule.jl`:
```julia
module DummyModule

export MyType, new_rand

mutable struct MyType
    a::Int
end

function new_rand(dims...)
    return 3 * rand(dims...)
end

println("loaded") # just to check

end
```
Restart julia with `-p 2`. Now, to load this module on all processes, we use the `@everywhere` macro. In short, we have this situation:
```julia
floswald@PTL11077 ~/compecon> ls
DummyModule.jl
floswald@PTL11077 ~/compecon> julia -p 2

julia> @everywhere include("DummyModule.jl")
loaded                        # message from process 1
      From worker 2:  loaded  # message from process 2
      From worker 3:  loaded  # message from process 3

julia>
```
Now in order to use the code, we need to bring it into scope with `using`. Here is the master process:
```julia
julia> using .DummyModule   # note the dot: locally defined module

julia> MyType(9)
MyType(9)

julia> fetch(@spawnat 2 MyType(9))
ERROR: On worker 2:
UndefVarError: `MyType` not defined
Stacktrace:

julia> fetch(@spawnat 2 DummyModule.MyType(7))
Main.DummyModule.MyType(7)
```
Also, we can execute a function on a worker directly. Notice that `remotecall_fetch` is like `fetch(remotecall(...))`, but more efficient.
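For instance, a minimal sketch (assuming the `DummyModule` from above has been loaded on the workers with `@everywhere include`):

```julia
# run new_rand(2,2) on worker 2 and get the result back in one round trip
remotecall_fetch(DummyModule.new_rand, 2, 2, 2)

# equivalent, but with an extra message exchange between master and worker
fetch(remotecall(DummyModule.new_rand, 2, 2, 2))
```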
* define all required data within the `module` you load on the workers, such that each of them has access to it (see the sketch after this list). This may not be feasible if you require huge amounts of input data.
* Example: [private repo again]
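A minimal sketch of the first option, reusing the `DummyModule` idea from above (the `input_data` constant is hypothetical): every process that loads the module builds its own copy of the data.

```julia
module DummyModule

export new_rand, input_data

# built on every process that loads the module; in practice this would
# typically read a file from disk rather than construct the data inline
const input_data = collect(1:1_000)

function new_rand(dims...)
    return 3 * rand(dims...)
end

end
```

After `@everywhere include("DummyModule.jl")`, each worker holds its own copy of `input_data`, which is exactly why this duplicates memory for huge inputs.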
### Example Setup: Real-World Project
Suppose we have the following structure on an HPC cluster.
```bash
floswald@PTL11077 ~/.j/d/LandUse (rev2)> tree -L 1
.
├── Manifest.toml
├── Project.toml
├── slurm_runner.run
├── run.jl
├── src
└── test
```
with this content for the file `run.jl`:
```julia
using Distributed

println("some welcome message from master")

# add 10 worker processes from the running master
# notice that we start them in the same project environment!
addprocs(10, exeflags = "--project=.")

# make sure all packages are available everywhere
@everywhere using Pkg
@everywhere Pkg.instantiate()

# load code for our application
@everywhere using LandUse

LandUse.bigtask(some_arg1 = 10, some_arg2 = 4)
```
The corresponding submit script for the HPC scheduler (SLURM in this case) would then just call this file:
```bash
#!/bin/bash
#SBATCH --job-name=landuse
#SBATCH --output=est.out
#SBATCH --error=est.err
#SBATCH --partition=ncpulong
#SBATCH --nodes=1
#SBATCH --cpus-per-task=11      # same number as we addproc'ed + 1
#SBATCH --mem-per-cpu=4G        # memory per cpu-core

julia --project=. run.jl
```
### JuliaHub
The best alternative out there IMHO is JuliaHub. Instead of `run.jl`, you'd have something like this:
```julia
using Distributed
using JSON3
using CSV
using DelimitedFiles

@everywhere using bk

results = bk.bigjob() # runs a parallel map over workers with `pmap` or similar

# bigjob writes plots and other output files to path_results

# output path
path_results = "$(@__DIR__)/outputs"
mkpath(path_results)
ENV["RESULTS_FILE"] = path_results

# write text results to JSON (numbers etc)
open("results.json", "w") do io
    JSON3.pretty(io, results)
end

ENV["RESULTS"] = JSON3.write(results)
```
### Parallel Map and Loops
This is the most relevant use case in most of our applications.
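As a teaser, a minimal sketch of `pmap` and a distributed reduction (worker count and workload are illustrative; the mapped function must be available on all workers):

```julia
using Distributed
addprocs(4)

@everywhere using LinearAlgebra

# pmap hands each matrix to whichever worker is free; good for expensive iterations
M = [rand(200, 200) for _ in 1:10]
decompositions = pmap(svd, M)

# for many cheap iterations, a distributed loop with a reduction is usually better
total = @distributed (+) for i in 1:100_000
    Int(rand() > 0.5)
end
```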