Revised + Updated Python Files #28

Open · wants to merge 13 commits into main
26 changes: 25 additions & 1 deletion slurm/2_ApplicationSpecific/python/README.md
@@ -1,6 +1,6 @@
# Python on the CCR Clusters

This directory includes examples of a serial Python job, with multithreaded and GPU examples coming soon. Additional documentation about the use of Python at CCR can be found in the CCR's [Python documentation](https://docs.ccr.buffalo.edu/en/latest/howto/python/). Users affiliated with the University at Buffalo can access an open enrollment self paced course about [Using Python at CCR](https://ublearns.buffalo.edu/d2l/le/discovery/view/course/288741) through UB Learns. The pre-recorded video portions of the course are available to all users on [CCR's YouTube channel](https://youtube.com/@ubccr).

## Serial Python job ([serial/](./serial))

@@ -12,3 +12,27 @@ To run the Python script, simply submit the job to the scheduler from a login node:
```
$ sbatch python-sp.sh
```

# Parallel Python Tutorial
Parallel processing is a technique that executes multiple tasks at the same time using multiple CPU cores. This directory includes examples of two ways to perform parallel processing in Python.

## Multiprocessing ([fibonacci_multiproc.py](./fibonacci_multiproc.py))
There are numerous APIs available for running Python code in parallel, each with its own strengths and weaknesses. A common choice is the standard library's `multiprocessing` module. This library is powerful, supporting advanced functionality like interprocess communication. However, for this simple demo we will stick to a very basic example.

The `fibonacci_multiproc.py` script demonstrates using a process pool to parallelize computations. The `with Pool(n_jobs) as p:` line creates a pool of `n_jobs` worker processes that can then execute code in parallel. The `p.map(fib, my_values)` line applies the `fib` function from the serial example to a list of integers called `my_values`, with the `multiprocessing` library handling all of the process management for you as the computation runs in parallel. The `multiprocessing` API provides many tools for process management beyond this simple example; you can find more information on all of these functions in [Python's documentation](https://docs.python.org/3/library/multiprocessing.html).
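
As a minimal sketch of this pattern (a toy standalone example with small illustrative inputs; the full script appears later in this PR):

```
from multiprocessing import Pool

def fib(n: int) -> int:
    # Iteratively compute the n-th Fibonacci number
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    my_values = list(range(10, 20))
    n_jobs = 4  # number of worker processes in the pool
    with Pool(n_jobs) as p:
        # apply fib to each element of my_values in parallel
        results = p.map(fib, my_values)
    print(results)
```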

Please note, we specify the number of parallel processes with the `n_jobs` variable in our `for` loop. The value you select should match the number of CPUs or tasks you request for your job: processes sharing a single CPU must take turns rather than run concurrently, so you will not see runtime improvements beyond the number of CPUs allocated. Furthermore, there is overhead in creating and managing each process, so arbitrarily increasing `n_jobs` may not always yield faster program runtimes.
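
One way to keep `n_jobs` in line with your allocation is to ask the operating system how many CPUs the process can actually use. A short sketch (`os.sched_getaffinity` is Linux-specific, which is fine on the clusters, and reflects the Slurm allocation):

```
import os

# Number of CPUs this process is allowed to run on,
# which matches the Slurm allocation for the job
n_jobs = len(os.sched_getaffinity(0))
```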

## Joblib ([fibonacci_joblib.py](./fibonacci_joblib.py))
For tasks that are embarrassingly parallel, or those using NumPy arrays, `joblib` can be a more efficient and convenient solution. Since our `multiprocessing` example above computes Fibonacci numbers in separate processes without any dependencies across processes, this computation is considered **embarrassingly parallel**. Thus, we can use `joblib` to compute Fibonacci numbers in parallel.

The following line in our `fibonacci_joblib.py` example script shows how to apply the function to compute Fibonacci numbers across an array of input values:
```
results = Parallel(n_jobs=8)(delayed(fib)(n) for n in my_values)
```

In this case, we apply the `fib` function to each value `n` in our `my_values` list. These computations run in parallel across 8 processes, as specified by the `n_jobs` parameter. Please note, in order to see runtime improvements you will need to request as many CPUs for your job as the number of processes you want to run. These can be requested using the Slurm `--ntasks-per-node` and `--cpus-per-task` options, where the total CPU count, `ntasks-per-node * cpus-per-task`, should match `n_jobs`.
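
For reference, here is a self-contained sketch of how `fibonacci_joblib.py` might be structured (the file's contents are not shown in this diff, so the surrounding code is our assumption, reusing `fib` and `my_values` from the multiprocessing example):

```
import time
from joblib import Parallel, delayed

def fib(n: int) -> int:
    # Iteratively compute the n-th Fibonacci number
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    my_values = list(range(5000, 10000))

    start = time.time()
    # delayed(fib)(n) builds a lazy call; Parallel runs the batch across 8 processes
    results = Parallel(n_jobs=8)(delayed(fib)(n) for n in my_values)
    print(f"joblib execution time with n_jobs = 8: {time.time() - start}")
```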

Our example Slurm script only requests 8 CPUs, so you will not see any performance improvement as `n_jobs` increases beyond 8. Furthermore, increasing the number of processes running in parallel may not improve runtime in all cases, as there is overhead to managing each additional process.

For a more in-depth discussion of `joblib`, please refer to its [documentation](https://joblib.readthedocs.io/en/stable/).

As with the multiprocessing example above, the number of parallel processes (or `n_jobs` in the script) should match the number of CPUs or tasks you request in order to see any runtime improvements.
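
As with the serial example, submit the job script to the scheduler from a login node (run from the `parallel/` directory):

```
$ sbatch multiproc_script.sh
```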
27 changes: 27 additions & 0 deletions slurm/2_ApplicationSpecific/python/parallel/fibonacci_multiproc.py
@@ -0,0 +1,27 @@
import time
from multiprocessing import Pool

def fib(n: int) -> int:
    # Iteratively compute the n-th Fibonacci number
    a, b = 0, 1
    count = 0
    while count < n:
        a, b = b, a + b
        count += 1
    return a

if __name__ == "__main__":

    my_values = list(range(5000, 10000))

    # Baseline: apply fib to each value in a single process
    start = time.time()
    serial_test = list(map(fib, my_values))
    end = time.time()
    print("Serial execution time:", end - start)

    print("Parallel using multiprocessing:")
    for n_jobs in [1, 2, 4, 8, 16, 32]:
        start = time.time()
        # Distribute the fib calls across a pool of n_jobs worker processes
        with Pool(n_jobs) as p:
            p.map(fib, my_values)
        end = time.time()
        print(f"Multiprocessing execution time with n_jobs = {n_jobs}: {end - start}")
17 changes: 17 additions & 0 deletions slurm/2_ApplicationSpecific/python/parallel/multiproc_script.sh
@@ -0,0 +1,17 @@
#!/bin/bash -l
#SBATCH --cluster=ub-hpc
#SBATCH --partition=general-compute
#SBATCH --qos=general-compute
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --job-name="my-8cpu-job"
#SBATCH --output=my-8cpu-job_%j.out
#SBATCH --mail-user=[yourname]@buffalo.edu
#SBATCH --mail-type=END

module load gcc python
python fibonacci_multiproc.py
echo "Job ended at $(date)"