
Commit ad90f51

fix all smaller issues discovered on first run-through
1 parent 9f6f668 commit ad90f51

8 files changed: +1033 −12 lines changed

_episodes/01-getting-started.md

Lines changed: 19 additions & 2 deletions

@@ -6,6 +6,9 @@ questions:
 - "How do I run Python on Supercomputing Wales?"
 - "How do I install packages and other Python software on
   Supercomputing Wales?"
+keypoints:
+- "Use `module load anaconda/2019.03` and `source activate` to get started with Anaconda on Sunbird"
+- "Create new conda environments with `conda create` when the `base` Anaconda set of packages doesn't meet your needs"
 ---
 
 Python is one of the most popular programming languages currently
@@ -228,8 +231,22 @@ root permissions.
 
 > ## More packages for today
 >
-> We'll also need the Matplotlib package for some of today's examples
-> Decide whether to install it via Conda or Pip, and install it.
+> We'll also need the IPython, Matplotlib, Numba, and Pillow packages
+> for some of today's examples.
+> For each of these, decide whether to install it via Conda or Pip,
+> and install it.
+>
+> > ## Solution
+> >
+> > IPython, Matplotlib, Numba, and Pillow are all common packages, and all
+> > are included in the base Anaconda distribution. They can be
+> > installed with
+> >
+> > ~~~
+> > $ conda install ipython matplotlib numba pillow
+> > ~~~
+> > {: .language-bash}
+> {: .solution}
 {: .challenge}
 
 > ## An environment for your research

_episodes/02-design-constraints.md

Lines changed: 2 additions & 2 deletions

@@ -133,10 +133,10 @@ depending on the processor and data type, then 4, 8, or even 16
 iterations of this loop can happen at a single time.
 
 Compare the execution of a loop element-by-element sequentially:
-![An illustration of a loop happening sequentially](/fig/non-vector.svg)
+![An illustration of a loop happening sequentially](../fig/non-vector.svg)
 
 with the execution of a vectorised loop:
-![An illustration of a vectorised loop](/fig/vector.svg)
+![An illustration of a vectorised loop](../fig/vector.svg)
 
 The vectorised loop can be up to N times faster, where N is the number
 of elements that fit into the processor's vector units.
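The contrast this episode draws is easiest to see side by side. Below is a minimal sketch, not part of this commit, comparing an element-by-element Python loop with its vectorised NumPy equivalent:

~~~
import numpy as np

a = np.random.random(1_000_000)
b = np.random.random(1_000_000)

# Sequential form: each iteration of the interpreted loop handles
# exactly one element at a time
result_loop = np.empty_like(a)
for i in range(len(a)):
    result_loop[i] = a[i] + b[i]

# Vectorised form: NumPy hands the whole operation to compiled code,
# which can make use of the processor's vector units
result_vector = a + b

assert np.allclose(result_loop, result_vector)
~~~
{: .language-python}

On typical hardware the vectorised form is dramatically faster, both because it avoids interpreter overhead and because the compiled loop can process several elements per instruction.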

_episodes/04-gnu-parallel.md

Lines changed: 4 additions & 3 deletions

@@ -64,7 +64,7 @@ Parallel to be able to control.
 > will need to convert it to running as an independent Python script
 > in order to use GNU Parallel and command-line arguments. For more
 > information on this, see the
->
+> [Command-line arguments](http://swcarpentry.github.io/python-novice-inflammation/10-cmdline/)
 > episode of the Software Carpentry Python lesson.
 {: .callout}
 
@@ -81,7 +81,7 @@ see what it does:
 ~~~
 $ python fourier_orig.py
 ~~~
-{: .language-python}
+{: .language-bash}
 
 This should take a few seconds to run; use `ls -lrt` to see the most
 recently created files in the directory once it finishes to see what
@@ -268,7 +268,8 @@ carries out.
 With this done, we can test that the program still works, by running:
 
 ~~~
-$ python fourier_new.py --fourier_restricted_output=fourier_restricted.pdf \
+$ python fourier_new.py einstein1_7.jpg \
+    --fourier_restricted_output=fourier_restricted.pdf \
     --noise_isolation_output=noise_isolation.pdf \
     --phase_contrast_output=phase_contrast.pdf
 ~~~
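For readers following the new invocation above: the sketch below shows how a script like `fourier_new.py` might declare these arguments with `argparse`. The script itself is not part of this diff, so the option names are inferred from the command line and everything else is illustrative:

~~~
# Hypothetical sketch only; the lesson's actual fourier_new.py is not
# shown in this commit
from argparse import ArgumentParser

parser = ArgumentParser(description="Fourier-filter an image")
parser.add_argument("input_image",
                    help="image to process, e.g. einstein1_7.jpg")
parser.add_argument("--fourier_restricted_output", default=None,
                    help="filename for the restricted Fourier plot")
parser.add_argument("--noise_isolation_output", default=None,
                    help="filename for the noise isolation plot")
parser.add_argument("--phase_contrast_output", default=None,
                    help="filename for the phase contrast plot")
args = parser.parse_args()

print("Processing", args.input_image)
~~~
{: .language-python}

A positional argument like this would explain why the corrected command line now passes `einstein1_7.jpg` first.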

_episodes/05-profiling.md

Lines changed: 9 additions & 3 deletions

@@ -247,8 +247,11 @@ $ python -m cProfile -o mc.prof mc.py 0 1.0 0.1 1000000 mc.dat
 This will create a file called `mc.prof`, containing the profiling
 data. Now, since displaying graphics from the cluster on your own
 machine isn't always easy, instead we'll copy the profile to our local
-machine to view. This can be done with FileZilla, or at a Bash prompt
-with the command:
+machine to view. This can be done with FileZilla, or at a Bash prompt.
+
+
+To do this at the shell, open a new terminal (running on your own
+machine), and run the command:
 
 ~~~
 $ # This runs on your computer, not on the supercomputer
@@ -259,6 +262,7 @@ $ scp s.your.username@sunbird.swansea.ac.uk:hpp-examples/mc.prof ~/Desktop/
 Now we can install SnakeViz and visualise the profile:
 
 ~~~
+$ # This should also happen on your own computer
 $ pip install snakeviz
 $ snakeviz ~/Desktop/mc.prof
 ~~~
@@ -432,7 +436,9 @@ $ python -m timeit --setup 'import mc' 'mc.metropolis(1.0, 0.1, 1.0)'
 >
 > You can use `timeit` within a Jupyter notebook to test the
 > performance of code you are writing there, too. In a new cell, use
-> `%timeit` followed by the function or expression you want to time.
+> `%timeit` followed by the function or expression you want to time,
+> and use `%%timeit` at the top of a cell to time the execution of the
+> entire cell.
 >
 > If you have Jupyter installed on your machine, open a new notebook
 > and try this now for the list comprehension and loop
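As a companion to the `%timeit` and `%%timeit` magics described in the callout, the same measurement can be made with the standard `timeit` module, which also works outside notebooks. A minimal sketch, with a stand-in function rather than anything from the lesson:

~~~
import timeit

def list_comprehension():
    # Stand-in workload; substitute the code you want to measure
    return [x ** 2 for x in range(1000)]

# Run the callable 10,000 times and report the mean time per call
elapsed = timeit.timeit(list_comprehension, number=10_000)
print(f"{elapsed / 10_000 * 1e6:.2f} microseconds per call")
~~~
{: .language-python}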

_episodes/06-numpy-scipy.md

Lines changed: 2 additions & 0 deletions

@@ -361,6 +361,8 @@ express any other way without using explicit `for` loops.
 > square, and counting the number that lie within the unit circle, we
 > can find an estimate for $\frac{\pi}{4}$, and by extension, $\pi$.
 >
+> ![A diagram illustrating the description above.](../fig/pi_dartboard.svg)
+>
 > A plain Python function that would achieve this might look as
 > follows:
 >
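The lesson's own implementation follows this hunk and is not shown here. As a hedged illustration of the dartboard estimate the new diagram depicts, a NumPy version might look like this:

~~~
# Editorial sketch of the pi-estimation idea described above; not the
# lesson's code
import numpy as np

def estimate_pi(n_samples):
    # Throw darts uniformly at the unit square
    x = np.random.random(n_samples)
    y = np.random.random(n_samples)
    # Count how many land inside the quarter disc of the unit circle
    inside = np.count_nonzero(x ** 2 + y ** 2 < 1)
    # The fraction inside estimates pi/4, so scale up by 4
    return 4 * inside / n_samples

print(estimate_pi(1_000_000))  # roughly 3.14, varying run to run
~~~
{: .language-python}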

_episodes/08-pathos.md

Lines changed: 46 additions & 2 deletions

@@ -27,6 +27,14 @@ tasks across more than one CPU. Pathos is a tool that extends this
 to work across multiple nodes, and provides other convenience
 improvements over Python's built-in tools.
 
+To start with, we need to install Pathos.
+Pathos isn't installed as part of the standard Anaconda distribution;
+it can be installed from the `conda-forge` channel, though.
+
+~~~
+$ conda install -c conda-forge pathos
+~~~
+{: .language-bash}
 
 ## A toy example
 
@@ -67,7 +75,7 @@ that we have written ourselves.
 
 Before we can ask Pathos to run our code in parallel, we need to
 structure it in a way that Pathos can do this easily. This is
-a similar process process to the one we used for accepting command-line
+a similar process to the one we used for accepting command-line
 arguments; the difference is that now instead of using the `argparse`
 module and declaring the arguments that way, we declare a function
 that accepts the arguments in question. (It's good practice to do this
@@ -239,14 +247,37 @@ if __name__ == '__main__':
     hs = [0.25, 0.5, 1.0, 2.5]
     run_in_parallel(
         *zip(*product(betas, ms, hs)),
-        1000
+        10000
     )
 ~~~
 {: .language-python}
 
 Saving this as `mc_pathos_scan.py` and running it will now generate
 324 output files in the `mc_data` directory.
 
+> ## Verifying that it is parallel
+>
+> When Slurm is running a job on a particular node, it will let us SSH
+> directly to that node to check its behaviour. We can use this to
+> verify that we are running in parallel as we expect.
+>
+> Open a new terminal, and SSH to Sunbird again. Use `squeue -u $USER`
+> to find out what node your job is running on—this is the
+> right-most column. Then SSH into that node, using `ssh scsXXXX`,
+> where `XXXX` is replaced with the node number you got from `squeue`.
+>
+> Once running on the node, you can use the `top` command to get a
+> list of the processes using the most CPU resource, updating every
+> second. If your program is parallelising properly, you should see
+> multiple `python` processes, all consuming somewhere near 100%
+> CPU. (The percentages refer to a single CPU core rather than to the
+> available CPU in the machine as a whole.)
+>
+> If your job (or interactive allocation) ends while you're SSHed into
+> the node, then the SSH session will be killed by the system
+> automatically.
+{: .callout}
+
 > ## Processing file lists
 >
 > Not all the workloads of this kind will be parameter scans; some
@@ -321,6 +352,19 @@ While it is possible to start processes on more than one node using
 the Pathos library directly, this is easier to do using Pyina, which
 is another part of the Pathos framework.
 
+Since Pyina depends on MPI, it's not available via Conda (as the MPI
+installation will change from machine to machine). To install Pyina,
+we first need to choose an MPI library, and then install via Pip. On
+Sunbird, the first step can be done by loading the appropriate module.
+
+~~~
+$ # Get the latest version of the Intel MPI library
+$ module load mpi/intel/2019/4
+$ # Now install Pyina using this MPI library
+$ pip install pyina
+~~~
+{: .language-bash}
+
 It can be used very similarly to the Pathos library, by creating a
 process pool and then using a map function across that pool. The
 difference is that Pyina will interact with Slurm to correctly
difference is that Pyina will interact with Slurm to correctly
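For context on the pools this episode builds on, here is a minimal sketch of a Pathos process pool, assuming Pathos is installed as above. It is illustrative only, not the lesson's `run_in_parallel`:

~~~
from pathos.multiprocessing import ProcessingPool

def square(x):
    return x ** 2

if __name__ == '__main__':
    # Four worker processes on the local node
    pool = ProcessingPool(nodes=4)
    # map distributes the calls across the workers and gathers results
    results = pool.map(square, range(10))
    print(results)
~~~
{: .language-python}

Unlike the standard library's `multiprocessing.Pool`, Pathos pools serialise work with `dill`, so functions defined interactively can also be mapped.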

_episodes/10-numba.md

Lines changed: 1 addition & 0 deletions

@@ -235,6 +235,7 @@ def numpy_trig(a, b):
 
 ~~~
 $ python -m timeit --setup='import numpy as np; \
+    import trig; \
     a = np.random.random((1000, 1000)); \
     b = np.random.random((1000, 1000))' \
     'trig.vec_trig(a, b)'
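The `timeit` command above imports `vec_trig` from `trig.py`, which is not part of this diff. As a hypothetical sketch, a Numba-vectorised function with that call signature could look like the following; the actual body in the lesson may differ:

~~~
# Hypothetical trig.py; only the @vectorize pattern, not the exact
# maths, is asserted here
import math
from numba import vectorize

@vectorize(['float64(float64, float64)'])
def vec_trig(a, b):
    # Compiled element-wise; Numba broadcasts this over whole arrays,
    # such as the 1000x1000 inputs used in the timing command
    return math.sin(a) * math.cos(b)
~~~
{: .language-python}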
