Data Parallel Programming
Previously, we learned about task-parallel programming: A form of parallelization that distributes execution processes across computing nodes.
We know how to express parallel programs with task and parallel constructs.
Next, we learn about data-parallel programming: A form of parallelization that distributes data across computing nodes.
Synchronous vs. Asynchronous: http://stackoverflow.com/questions/748175/asynchronous-vs-synchronous-execution-what-does-it-really-mean
Data parallelism vs. Task parallelism: https://en.wikipedia.org/wiki/Data_parallelism#Data_parallelism_vs._task_parallelism
The simplest form of data-parallel programming is the parallel for loop.
The method below takes an array and an int, and writes the int to every array entry in parallel. All iterations of the loop are executed concurrently with each other.
import scala.collection.parallel.CollectionConverters._ // needed for .par on Scala 2.13+ (scala-parallel-collections module)

def initializeArray(xs: Array[Int])(v: Int): Unit = {
  for (i <- (0 until xs.length).par) { // <- notice the .par
    xs(i) = v
  }
}
The parallel for loop is not functional: it affects the program only through side effects. As long as the iterations of the parallel loop write to separate memory locations, the program is correct, as is the case in our example.
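To see why separate memory locations matter, here is a minimal counter-example sketch (the sumArray name is illustrative, not part of the lecture code): all iterations update the same variable, so concurrent read-modify-write operations can be lost and the result is nondeterministic.

```scala
import scala.collection.parallel.CollectionConverters._ // .par on Scala 2.13+

// Incorrect: every iteration updates the shared variable `sum`,
// so parallel updates race with each other and results vary between runs.
def sumArray(xs: Array[Int]): Int = {
  var sum = 0
  for (i <- (0 until xs.length).par) {
    sum += xs(i) // data race: unsynchronized read-modify-write on shared state
  }
  sum
}
```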
The Mandelbrot set is the set of complex numbers c in the plane for which the sequence

Z_{n+1} = Z_n^2 + c, starting from Z_0 = 0,

does not approach infinity.
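A common way to turn this definition into a program is to count, for each point c, how many iterations it takes for the sequence to escape a fixed radius; points that stay bounded within the iteration budget are treated as members of the set. The sketch below illustrates the idea (the function name and parameters are illustrative, not the demo's exact code):

```scala
// Iterates z = z^2 + c for c = (cx, cy), starting from z = 0, and counts
// the iterations until |z| exceeds 2 (divergence) or the budget runs out.
def mandelbrotIterations(cx: Double, cy: Double, maxIterations: Int): Int = {
  var x = 0.0
  var y = 0.0
  var i = 0
  while (i < maxIterations && x * x + y * y < 4) {
    val xNew = x * x - y * y + cx // real part of z^2 + c
    y = 2 * x * y + cy            // imaginary part of z^2 + c
    x = xNew
    i += 1
  }
  i
}
```

Because each pixel's iteration count is written to a separate memory location, the whole image can be rendered with a parallel for loop over the pixels.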
Demo Summary:
- Task-parallel implementation: the slowest.
- Parallel for loop using scala-parallel-collections: intermediate.
- Parallel for loop using the experimental data-parallel scheduler: about 2× faster.
Different data-parallel programs have different workloads.
Workload is a function that maps each input element to the amount of work required to process it:
- Uniform workload: defined by a constant function, w(i) = const. Easy to parallelize.
- Irregular workload: defined by an arbitrary function, w(i) = f(i).
The goal of the data parallel scheduler is to efficiently balance the workload across processors without necessarily having any knowledge about w(i). Thanks to the scheduler, the task of balancing the workload is shifted away from the programmer. This is one of the advantages of data parallel programming.
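As a rough illustration of the two workload shapes (the function names below are illustrative, not from the lecture), both loops use the same parallel for construct, and the scheduler decides how to split the index range across processors:

```scala
import scala.collection.parallel.CollectionConverters._ // .par on Scala 2.13+

// Uniform workload: every element costs the same constant amount of work.
def square(xs: Array[Int]): Unit = {
  for (i <- (0 until xs.length).par) xs(i) = xs(i) * xs(i)
}

// Irregular workload: the cost of element i grows with i (counting the
// divisors of i + 1), so a naive equal split of the index range would
// leave some processors idle while others are still working.
def countDivisors(xs: Array[Int]): Unit = {
  for (i <- (0 until xs.length).par) {
    xs(i) = (1 to (i + 1)).count((i + 1) % _ == 0)
  }
}
```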
Week 1
- Introduction to Parallel Computing
- Parallelism on the JVM
- Running Computations in Parallel
- Monte Carlo Method to Estimate Pi
- First Class Tasks
- How fast are parallel programs?
- Benchmarking Parallel Programs
Week 2
- Parallel Sorting
- Data Operations
- Parallel map()
- Parallel fold()
- Associativity I
- Associativity II
- Parallel Scan (Prefix Sum) Operation
Week 3
- Data Parallel Programming
- Data Parallel Operations
- Scala Parallel Collections
- Splitters and Combiners
Week 4