---
title: "Multiprocessing with Pathos"
teaching: 30
exercises: 30
questions:
- "How can I distribute subtasks across multiple cores or nodes?"
objectives:
- "Be able to distribute work to multiple cores using parallel pools."
- "Know where to look to extend this to multiple nodes."
- "Understand how to break up more monolithic tasks into subtasks that can be parallelised."
keypoints:
- "The `ParallelPool` in Pathos can spread work to multiple cores."
- "Look closely at loops and other large operations in your program to identify where there is no data dependency, and so where parallelism can be added."
- "Pyina has functions that extend the parallel map to multiple nodes."
---

So far, we have distributed processes to multiple cores and nodes
[...]

~~~
def run_in_parallel(betas, ms, h, iteration_count):
    ...

    iteration_counts = [iteration_count] * len(betas)

    # Generate a list of filenames to output results to
    filenames = [f'mc_data/b{beta}_m{m}.dat'
                 for beta, m in zip(betas, ms)]

    # Create a parallel pool to run the functions
    ...

if __name__ == '__main__':
    ...
~~~
{: .language-python}

Save this as `mc_pathos.py`, and run it via:

~~~
$ mkdir mc_data    # Make sure that the directory to hold the output data is created
$ python mc_pathos.py
~~~
{: .language-bash}

This will run the Monte Carlo simulation for the seven parameter sets
$(\beta=0.5, m=0.5)$, $(\beta=0.5, m=1.0)$, $(\beta=1.0, m=0.5)$,
$(\beta=1.0, m=1.0)$, $(\beta=1.0, m=2.0)$, $(\beta=2.0, m=1.0)$, and
$(\beta=2.0, m=2.0)$. Each uses the same values of `h` and
`iteration_count`, and the filename in each case is decided
programmatically.
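
For example, the first parameter set will write its results to
`mc_data/b0.5_m0.5.dat`:

~~~
>>> beta, m = 0.5, 0.5
>>> f'mc_data/b{beta}_m{m}.dat'
~~~
{: .language-python}

~~~
'mc_data/b0.5_m0.5.dat'
~~~
{: .output}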

## Parameter scans

We've just used 10 cores to perform 7 independent computations. Even
if the computations all take the same length of time, three cores are
sitting idle, so we are 70% efficient at best. If we have seven jobs
that are long enough that we want to use HPC for them, then we'd be
better off using seven independent SLURM jobs; Pathos isn't useful
there. This kind of workflow comes into its own when you have hundreds
or thousands of different values to scan through. But hardcoding lists
of hundreds or thousands of elements is tedious, especially when there
will be lots of repeated values.

Quite often what we'd really like to do is take a set of possible
values for each variable, and then scan over all possible
combinations. With two variables this allows a heatmap or contour plot
to be produced, and with three a volumetric representation could be
generated.

The first tool in our arsenal is the `product` function from the
`itertools` module. This generates tuples containing all possible
combinations of elements of the lists it is given as arguments. Let's
check this at a Python interpreter:

~~~
$ python
~~~
{: .language-bash}

~~~
>>> from itertools import product
>>> numbers = [1, 2, 3, 4, 5]
>>> letters = ['a', 'b', 'c', 'd', 'e']
>>> list(product(numbers, letters))
~~~
{: .language-python}

~~~
[(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e'), (2, 'a'), (2, 'b'),
(2, 'c'), (2, 'd'), (2, 'e'), (3, 'a'), (3, 'b'), (3, 'c'), (3, 'd'),
(3, 'e'), (4, 'a'), (4, 'b'), (4, 'c'), (4, 'd'), (4, 'e'), (5, 'a'),
(5, 'b'), (5, 'c'), (5, 'd'), (5, 'e')]
~~~
{: .output}

We need to use the `list` function here to create a list that we can
read, as `product` returns a lazy iterator rather than a list. (Great
for looping through, and good for passing to other functions, but not
useful for seeing the results on screen.)
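
Without the call to `list`, we would only see the iterator object
itself:

~~~
>>> product(numbers, letters)
~~~
{: .language-python}

~~~
<itertools.product object at 0x...>
~~~
{: .output}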

Looking closely at the list of tuples, this isn't quite what we want
yet. We need all the first elements to be in one list, and all the
second elements to be in another (and so on). Fortunately, there's
another built-in function that can help us here.

`zip` is a function that is most often used to take a few lists of
many elements, and return one long list of tuples of few elements.
But if instead we give it many small tuples, we'll get back a short
list of long tuples&mdash;one containing the first elements, one
containing the second elements, and so on.
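
For example, zipping our two five-element lists gives five
two-element tuples:

~~~
>>> list(zip(numbers, letters))
~~~
{: .language-python}

~~~
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
~~~
{: .output}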
188
+
189
+ The only extra ingredient we need to make this work is to be able to
190
+ pass in the elements of the result of ` product ` , rather than the
191
+ iterable as a single argument. This is done with the ` * ` , which when
192
+ placed before a list (or other iterable) means "expand this iterable
193
+ into multiple arguments". Putting these together:
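
We can see this with `print`, which prints each of its arguments in
turn; the `*` turns the single list into three separate arguments:

~~~
>>> print(*[1, 2, 3])
~~~
{: .language-python}

~~~
1 2 3
~~~
{: .output}

Putting these together: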

~~~
>>> list(zip(*product(numbers, letters)))
~~~
{: .language-python}

~~~
[(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5,
5, 5), ('a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b',
'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e')]
~~~
{: .output}

We now have two tuples of the form that we want to pass to
`map`. All that remains is to extract them from the list that they're
in. Just like before, this is done with `*`.

Adapting the previous example to scan $\beta$, $m$, and $h$:

~~~
from pathos.pools import ParallelPool
from itertools import product
from mc import run_mc

def run_in_parallel(betas, ms, hs, iteration_count):
    # Check that lists are all the same length
    assert len(betas) == len(ms)
    assert len(betas) == len(hs)

    # Generate a list of the same value of iteration_count
    iteration_counts = [iteration_count] * len(betas)

    # Generate a list of filenames to output results to
    filenames = [f'mc_data/b{beta}_m{m}_h{h}.dat'
                 for beta, m, h in zip(betas, ms, hs)]

    # Create a parallel pool to run the functions
    pool = ParallelPool(nodes=10)

    # Run the work
    pool.map(run_mc, ms, hs, betas, iteration_counts, filenames)

if __name__ == '__main__':
    betas = [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
    ms = [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
    hs = [0.25, 0.5, 1.0, 2.5]
    run_in_parallel(
        *zip(*product(betas, ms, hs)),
        1000
    )
~~~
{: .language-python}

Saving this as `mc_pathos_scan.py` and running it will now generate
324 output files in the `mc_data` directory.
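
As a quick sanity check, you can count the output files (assuming
nothing else has been written to `mc_data`):

~~~
$ ls mc_data | wc -l
~~~
{: .language-bash}

~~~
324
~~~
{: .output}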

> ## Processing file lists
>
> Not all workloads of this kind will be parameter scans; some
> might be processing large numbers of files instead. Revisit the
> Fourier example from the section on GNU Parallel, and try converting
> it to work with Pathos instead.
>
> You might want to do this in the following steps:
>
> 1. Convert the program to be a function, separating out the
>    behaviour that relates to `argparse` from the desired
>    computational behaviour.
> 2. Create a second file, as we have done here, to call this new
>    function using Pathos. For now, hardcode a list of images to
>    process. (One possible shape for this step is sketched below.)
> 3. Now, adjust the program to read the image list from a file, but
>    hardcode the filename.
> 4. Finally, adjust the program to read the image list filename from
>    the command line, e.g. using `argparse`.
{: .challenge}
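
For step 2, a minimal sketch might look like the following; the
module name `fourier` and the function name `fourier_filter` are
placeholders for whatever you create in step 1:

~~~
from pathos.pools import ParallelPool

# Placeholder import: use the function you factored out in step 1
from fourier import fourier_filter

if __name__ == '__main__':
    # Hardcoded for now; steps 3 and 4 replace this with file input
    filenames = ['image1.jpg', 'image2.jpg', 'image3.jpg']

    pool = ParallelPool(nodes=4)
    pool.map(fourier_filter, filenames)
~~~
{: .language-python}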

> ## Parameters to scan on the command line
>
> In general, we don't want to hard-code our parameters, even if there
> are only a few and we are using `itertools.product` to generate the
> longer lists.
>
> Adapt the Monte Carlo example above so that it can read the `betas`,
> `ms`, and `hs` lists from the command line. (The `action="append"`
> keyword argument to `add_argument` will be useful here; the
> `argparse` documentation will tell you more.)
{: .challenge}
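
As a hint, the argument parsing might look something like the
following sketch (the flag names here are one choice among many):

~~~
import argparse

parser = argparse.ArgumentParser()
# action="append" collects one value per occurrence of a flag, so
# "--beta 0.5 --beta 1.0" gives args.betas == [0.5, 1.0]
parser.add_argument('--beta', dest='betas', type=float, action="append")
parser.add_argument('--m', dest='ms', type=float, action="append")
parser.add_argument('--h', dest='hs', type=float, action="append")
args = parser.parse_args()
~~~
{: .language-python}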


<!--- REMOVING THIS SECTION UNTIL IT IS MORE PERFORMANT

## Chunking

Sometimes, we have a large problem that is taking too long, and we
want to parallelise it so that it will finish faster. Unlike the above
cases, it's not immediately obvious how one could split it into subtasks
that can be run independently.

Take for example the multi-dimensional array-broadcasting example that
we followed through in the Numpy episode. This is just performing a
single operation on a large array&mdash;surely that doesn't decompose
into many small tasks?

--->


> ## Surprisingly parallelisable tasks
>
> Sometimes you need to think more carefully about how a task can be
> parallelised. For example, looking back to the example of
> calculating $\pi$ from the previous section, on the surface we are
> only performing a single computation that returns a single
> number. However, in fact each Monte Carlo sample is independent, so
> could be generated on a separate processor.
>
> Try adapting one of the solutions to the $\pi$ example in the Numpy
> episode to run multiple streams in parallel. Each stream will want
> to report back the count of points inside the circle; totalising and
> calculating the value of $\pi$ then needs to happen in serial.
{: .challenge}
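
For instance, the parallel structure might look something like the
sketch below, assuming a Numpy-based sampler; the function and
variable names here are illustrative, not taken from the earlier
episode:

~~~
from pathos.pools import ParallelPool

def count_inside_circle(sample_count):
    # Import inside the function so that it is available to workers
    import numpy as np

    # Sample points uniformly in the unit square, and count how many
    # fall inside the quarter-circle of radius 1
    points = np.random.random((sample_count, 2))
    return int(np.sum(points[:, 0] ** 2 + points[:, 1] ** 2 < 1))

if __name__ == '__main__':
    # Ten independent streams of a million samples each
    sample_counts = [1_000_000] * 10

    pool = ParallelPool(nodes=10)
    counts = pool.map(count_inside_circle, sample_counts)

    # Totalising and calculating pi happen in serial
    pi_estimate = 4 * sum(counts) / sum(sample_counts)
    print(pi_estimate)
~~~
{: .language-python}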


## Multiple nodes

While it is possible to start processes on more than one node using
the Pathos library directly, this is easier to do using Pyina, which
is another part of the Pathos framework.

It can be used very similarly to the Pathos library, by creating a
process pool and then using a map function across that pool. The
difference is that Pyina will interact with Slurm to correctly
position tasks on each node.

A toy example:

~~~
from pyina.ez_map import ez_map

# An example function: report which host each rank runs on
def host(id):
    import socket
    return "Rank: %d -- %s" % (id, socket.gethostname())

# Launch the parallel map of the target function
results = ez_map(host, range(100), nodes=10)
for result in results:
    print(result)
~~~
{: .language-python}
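
Save this as, say, `pyina_hostnames.py` (the name is arbitrary). How
you launch it depends on your cluster's setup; on a Slurm system, a
batch script along these lines (the resource requests here are
illustrative) would be one way:

~~~
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=10
#SBATCH --time=00:10:00

python pyina_hostnames.py
~~~
{: .language-bash}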

Pyina makes use of the Message Passing Interface (MPI) to communicate
between nodes and launch instances of Python; "rank" in this context
is an MPI term referring to the index of the process, which is used
for bookkeeping. A full study of what MPI can do (in Python or
otherwise) is beyond the scope of this lesson!