-
Notifications
You must be signed in to change notification settings - Fork 9
Data Operations
Rohit edited this page Jan 25, 2017
·
1 revision
Parallel processing of collections is important. It is one of the main applications of parallelism today. We examine conditions when this can be done:
- properties of collections: ability to split, combine
- properties of operations: associativity, independence
Operations on collections are key to functional programming
-
map: apply function to each element
List(1,3,8).map(x => x*x) == List(1, 9, 64)
-
fold: combine elements with a given operation
List(1,3,8).fold(100)((s,x) => s + x) == 112
-
scan: combine folds of all list prefixes
List(1,3,8).scan(100)((s,x) => s + x) == List(100, 101, 104, 112)
These operations are even more important for parallel than sequential collections: they encapsulate more complex algorithms.
We use List
to specify the results of operations. Lists are not good for parallel implementations because we cannot
efficiently:
- split them in half (need to search for the middle)
- combine them (concatenation needs linear time)
We use these alternatives for now:
- arrays: imperative (recall array sum)
- trees: can be implemented functionally
Subsequent lectures examine Scala’s parallel collection libraries
- includes many more data structures, implemented efficiently
Week 1
- Introduction to Parallel Computing
- Parallelism on the JVM
- Running Computations in Parallel
- Monte Carlo Method to Estimate Pi
- First Class Tasks
- How fast are parallel programs?
- Benchmarking Parallel Programs
Week 2
- Parallel Sorting
- Data Operations
- Parallel map()
- Parallel fold()
- Associativity I
- Associativity II
- Parallel Scan (Prefix Sum) Operation
Week 3
- Data Parallel Programming
- Data Parallel Operations
- Scala Parallel Collections
- Splitters and Combiners
Week 4