This is an easy-to-use MapReduce framework for Go, inspired by Lab 1 of MIT 6.824 (2021).
- Multiple worker goroutines within one program on a single machine.
- Multiple worker processes, each a separate program, on a single machine.
- Fault tolerance.
- Easy parallelization: you only write a Map and a Reduce function.
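The model behind these features can be illustrated with a small, self-contained sketch: each map task runs in its own goroutine, the intermediate pairs are grouped by key, and each key is then reduced concurrently. This is a generic illustration of the MapReduce pattern, not this framework's internals; the KeyValue type and the mapFn, reduceFn, and runJob names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"sync"
	"unicode"
)

// KeyValue is one intermediate pair (hypothetical type for this sketch).
type KeyValue struct {
	Key   string
	Value string
}

// mapFn splits a document into words and emits (word, "1") for each.
func mapFn(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceFn counts how many values were emitted for a key.
func reduceFn(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// runJob runs every map task in its own goroutine, groups the intermediate
// pairs by key, then runs one reduce call per key concurrently.
func runJob(inputs []string) map[string]string {
	var mu sync.Mutex
	var wg sync.WaitGroup
	grouped := make(map[string][]string)

	// Map phase: one goroutine per input document.
	for _, in := range inputs {
		wg.Add(1)
		go func(contents string) {
			defer wg.Done()
			kvs := mapFn(contents)
			mu.Lock()
			for _, kv := range kvs {
				grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
			}
			mu.Unlock()
		}(in)
	}
	wg.Wait() // every map task must finish before any reduce starts

	// Reduce phase: one goroutine per distinct key.
	out := make(map[string]string, len(grouped))
	for key, values := range grouped {
		wg.Add(1)
		go func(k string, vs []string) {
			defer wg.Done()
			res := reduceFn(k, vs)
			mu.Lock()
			out[k] = res
			mu.Unlock()
		}(key, values)
	}
	wg.Wait()
	return out
}

func main() {
	counts := runJob([]string{"the cat sat", "the dog sat down"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

The barrier between the two phases (the first wg.Wait) matters: a key can only be reduced once every map task has had a chance to emit values for it.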
Here's a simple example, a word-count program.
wc.go
package main

import (
	"strconv"
	"strings"
	"unicode"

	"github.com/BWbwchen/MapReduce/worker"
)

// Map splits a file's contents into words and emits (word, "1") for each.
func Map(filename string, contents string, ctx worker.MrContext) {
	// function to detect word separators.
	ff := func(r rune) bool { return !unicode.IsLetter(r) }

	// split contents into a slice of words.
	words := strings.FieldsFunc(contents, ff)

	for _, w := range words {
		ctx.EmitIntermediate(w, "1")
	}
}

// Reduce receives every value emitted for one word and outputs its count.
func Reduce(key string, values []string, ctx worker.MrContext) {
	// return the number of occurrences of this word.
	ctx.Emit(key, strconv.Itoa(len(values)))
}
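Because Map and Reduce only call two methods on the context, they can be exercised without starting the framework. The sketch below assumes worker.MrContext provides EmitIntermediate(key, value string) and Emit(key, value string), as the calls in wc.go suggest, and substitutes a hypothetical in-memory recorder for it; the mrContext interface and the runLocal helper are illustrative, not part of the library.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// mrContext is a hypothetical stand-in for worker.MrContext that declares
// only the two methods wc.go calls (assumed signatures).
type mrContext interface {
	EmitIntermediate(key, value string)
	Emit(key, value string)
}

// recorder is a fake context that stores everything emitted in memory.
type recorder struct {
	intermediate map[string][]string
	output       map[string]string
}

func newRecorder() *recorder {
	return &recorder{intermediate: map[string][]string{}, output: map[string]string{}}
}

func (r *recorder) EmitIntermediate(key, value string) {
	r.intermediate[key] = append(r.intermediate[key], value)
}

func (r *recorder) Emit(key, value string) { r.output[key] = value }

// Map and Reduce are copied from wc.go, with the context type swapped.
func Map(filename string, contents string, ctx mrContext) {
	ff := func(r rune) bool { return !unicode.IsLetter(r) }
	for _, w := range strings.FieldsFunc(contents, ff) {
		ctx.EmitIntermediate(w, "1")
	}
}

func Reduce(key string, values []string, ctx mrContext) {
	ctx.Emit(key, strconv.Itoa(len(values)))
}

// runLocal feeds one file through Map, then calls Reduce once per key in
// sorted order, mimicking a single reducer.
func runLocal(filename, contents string) map[string]string {
	rec := newRecorder()
	Map(filename, contents, rec)
	keys := make([]string, 0, len(rec.intermediate))
	for k := range rec.intermediate {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		Reduce(k, rec.intermediate[k], rec)
	}
	return rec.output
}

func main() {
	out := runLocal("a.txt", "to be, or not to be")
	fmt.Println(out["to"], out["be"], out["or"]) // prints "2 2 1"
}
```

Grouping every value for a key before calling Reduce is what allows Reduce to compute a count simply as len(values).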
main.go
package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

// main parses the command-line flags and runs the job on a single machine.
func main() {
	mp.StartSingleMachineJob(mp.ParseArg())
}
Run with:
# Compile plugin
go build -race -buildmode=plugin -o wc.so wc.go
# Word count
go run -race main.go -i 'input/files' -p 'wc.so' -r 1 -w 8
The output is written to mr-out-0.txt.
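With -r 1 there is a single reducer, so every key ends up in one output file. With more reducers, MapReduce implementations usually route each intermediate key to a reducer by hashing it, so that all values for a key reach the same Reduce call. The sketch below uses the ihash helper from the 6.824 lab skeleton; whether this framework partitions keys the same way, and whether reducer i writes mr-out-i.txt, are assumptions extrapolated from the file name above.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ihash mirrors the helper in the 6.824 lab skeleton: FNV-1a, masked to a
// non-negative int. Whether this framework uses the same hash is an assumption.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() & 0x7fffffff)
}

// partition returns the index of the reducer responsible for key.
func partition(key string, nReduce int) int {
	return ihash(key) % nReduce
}

// outputName is a hypothetical per-reducer file name, extrapolated from mr-out-0.txt.
func outputName(reducer int) string {
	return fmt.Sprintf("mr-out-%d.txt", reducer)
}

func main() {
	for _, w := range []string{"apple", "banana", "cherry"} {
		r := partition(w, 3)
		fmt.Printf("%s -> reducer %d (%s)\n", w, r, outputName(r))
	}
}
```

Because the hash is deterministic, every map task sends a given word to the same reducer without any coordination.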
More examples can be found in the mrapps/ folder, and more will be added in the future.
To run the master and the workers as separate processes, create two programs.
master.go
package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

// main parses the command-line flags and starts the master node.
func main() {
	mp.StartMaster(mp.ParseArg())
}
worker.go
package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

// main parses the command-line flags and starts a worker node.
func main() {
	mp.StartWorker(mp.ParseArg())
}
Run with:
# Compile plugin
go build -race -buildmode=plugin -o cmd/wc.so wc.go
# Word count
go run -race cmd/master.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 8 &
sleep 1
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 1 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 2 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 3 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 4 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 5 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 6 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 7 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 8
MapReduce is an easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 Lab 1.
It currently supports multiple worker threads on a single machine and multiple worker processes on a single machine.
Usage:
  mapreduce [flags]

Flags:
  -h, --help            help for mapreduce
  -m, --inRAM           Whether write the intermediate file in RAM (default true)
  -i, --input strings   Input files
  -p, --plugin string   Plugin .so file
      --port int        Port number (default 10000)
  -r, --reduce int      Number of Reducers (default 1)
  -w, --worker int      Number of Workers(for master node)
                        ID of worker(for worker node) (default 4)
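The -p flag points at the .so file produced by go build -buildmode=plugin. Go's standard plugin package (plugin.Open and Plugin.Lookup) is the usual way to load such a file, and presumably what the framework does; the sketch below shows that pattern under that assumption, with a hypothetical loadMapReduce helper. In practice the host program and the plugin must be built with the same Go toolchain and matching build flags, which is why both commands above pass -race, and plugins are only supported on Linux, FreeBSD, and macOS.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// loadMapReduce opens a plugin built with -buildmode=plugin and checks that
// it exports both Map and Reduce. The concrete function types are left to
// the caller, since they depend on worker.MrContext.
func loadMapReduce(path string) (mapf, reducef plugin.Symbol, err error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, nil, fmt.Errorf("open plugin %q: %w", path, err)
	}
	if mapf, err = p.Lookup("Map"); err != nil {
		return nil, nil, fmt.Errorf("lookup Map: %w", err)
	}
	if reducef, err = p.Lookup("Reduce"); err != nil {
		return nil, nil, fmt.Errorf("lookup Reduce: %w", err)
	}
	return mapf, reducef, nil
}

func main() {
	path := "wc.so"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	mapf, reducef, err := loadMapReduce(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print the dynamic types of the exported functions.
	fmt.Printf("Map: %T\nReduce: %T\n", mapf, reducef)
}
```

Lookup returns an untyped plugin.Symbol, so a real host would type-assert each symbol to the exact function signature used in wc.go before calling it.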
Pull requests are always welcome!
Made by Bo-Wei Chen. All code is licensed under the MIT License.