Commit 728d7f1: Add doc
1 parent 1156176 commit 728d7f1

File tree

4 files changed: +60 -0 lines changed

README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Image denoising using a Spark cluster

This project is a parallel and distributed implementation of the Gibbs denoising algorithm; later, more convolution-based image processing algorithms were added. Other implementations can be added by extending the <code>Algorithm</code> trait and providing a pipeline for them.

The details of the Gibbs sampling algorithm are shown in the following [paper](http://stanford.edu/class/ee367/Winter2018/yue_ee367_win18_report.pdf).
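For context, the Gibbs denoiser in that report resamples each pixel from its full conditional given its neighbours. A standard sketch for a binary image with pixels in {-1, +1} and noisy observation y follows; this is the usual Ising-prior formulation, not code from this repository, and the weights β (neighbour coupling) and η (observation fidelity) are generic symbols:

```latex
% Full conditional for pixel x_i given its 4-neighbourhood N(i):
% beta rewards agreement with neighbours, eta rewards agreement with y_i.
p(x_i = 1 \mid x_{N(i)}, y_i) =
  \frac{\exp\big(\beta \sum_{j \in N(i)} x_j + \eta\, y_i\big)}
       {\exp\big(\beta \sum_{j \in N(i)} x_j + \eta\, y_i\big)
      + \exp\big(-\beta \sum_{j \in N(i)} x_j - \eta\, y_i\big)}
```

Each sweep of the sampler visits every pixel and redraws it from this distribution; repeated sweeps progressively remove noise.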
### Processing pipeline
The program takes the image you want to process and splits it into smaller chunks. Each sub-image, together with its position, is pushed into an RDD and then processed by a Spark worker.
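The splitting stage can be sketched as follows. This is a self-contained illustration using plain arrays, not the repository's actual helper; all names here are assumptions:

```scala
object SplitSketch {
  // Cut a height x width image into subSize x subSize chunks, each tagged with
  // the (row, col) position of its top-left corner so that the processed
  // results can be stitched back together at collect time.
  def split(image: Array[Array[Int]], subSize: Int): Seq[((Int, Int), Array[Array[Int]])] =
    for {
      row <- 0 until image.length by subSize
      col <- 0 until image.head.length by subSize
    } yield {
      val chunk = image.slice(row, (row + subSize).min(image.length))
        .map(_.slice(col, (col + subSize).min(image.head.length)))
      ((row, col), chunk)
    }
}
```

In the real job each (position, chunk) pair would then be parallelized into an RDD; the padding parameter (not shown here) would additionally extend each chunk with overlapping border pixels.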
Multiple operations can be performed on the same chunk; these can be implemented by extending the <code>Pipeline</code> class and setting all the needed tasks. Tasks can also be implemented by extending the <code>Algorithm</code> trait.
By default, the following pipelines are implemented:

- **GibbsDenoiser**, applies a Gibbs denoiser task
- **MedianDenoiser**, applies two median filters
- **GibbsEdgeDetection**, applies a Gibbs denoiser task and then an edge detection kernel
- **EdgeDetection**, applies a kernel to detect edges and then inverts the image
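The convolution-based tasks all boil down to sliding a small kernel over each chunk. A self-contained sketch with plain arrays follows (the repository's <code>Convolution</code> class uses Breeze matrices instead), using the edge-detection kernel from <code>EdgeDetection.scala</code>:

```scala
object ConvolutionSketch {
  // 3x3 edge-detection kernel: strong centre, negative surround; entries sum to 0.
  val edgeKernel: Array[Array[Double]] = Array(
    Array(-1.0, -1.0, -1.0),
    Array(-1.0,  8.0, -1.0),
    Array(-1.0, -1.0, -1.0))

  // Valid-mode 3x3 convolution: the output shrinks by 2 in each dimension,
  // which is one reason the splitter adds padding pixels around each chunk.
  def convolve(img: Array[Array[Double]], k: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(img.length - 2, img.head.length - 2) { (r, c) =>
      (for (i <- 0 to 2; j <- 0 to 2) yield img(r + i)(c + j) * k(i)(j)).sum
    }
}
```

A flat region produces zero response: on a constant patch the result is the constant times the kernel entries, which sum to zero, so only intensity changes (edges) survive.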
In the last stage, all the processed sub-images are collected and the final image is returned.
Example of a pipeline:

![Pipeline example](./docs/pipeline.png)
### Make your own tasks and pipelines
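A self-contained sketch of what the task/pipeline contract might look like. The real <code>Algorithm</code> trait and <code>Pipeline</code> class live in this repository and may have different signatures, so every name and signature below is an illustrative assumption:

```scala
// Illustrative stand-ins for the repo's Algorithm trait and Pipeline class.
trait AlgorithmSketch {
  def run(img: Array[Array[Double]]): Array[Array[Double]]
}

class PipelineSketch(tasks: List[AlgorithmSketch]) {
  // Run every task in order, feeding each task's output into the next one.
  def process(img: Array[Array[Double]]): Array[Array[Double]] =
    tasks.foldLeft(img)((acc, task) => task.run(acc))
}

// A custom task: invert pixel intensities, assuming values in [0, 1].
class InvertSketch extends AlgorithmSketch {
  def run(img: Array[Array[Double]]): Array[Array[Double]] = img.map(_.map(1.0 - _))
}

// A custom pipeline chaining two tasks, mirroring the built-in pipelines.
object InvertTwice extends PipelineSketch(List(new InvertSketch, new InvertSketch))
```

The same pattern, implemented against the real trait and class, is what the built-in pipelines such as <code>EdgeDetection</code> use.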
### Run on a local Spark cluster
27+
The program takes the following parameters as input:

```bash
Usage: [--debug] [--padding] [--sub_matrix_size] [--pipeline] [--denoiser_runs] [--output_file_json] [--output_file_image] input_file_image
```
|Param|Description|
|-|-|
|--debug|Enable debug prints (enabled by default: 1)|
|--padding|How many padding pixels to use when splitting the image|
|--sub_matrix_size|Size of the sub-matrices|
|--pipeline|Set the pipeline type|
|--denoiser_runs|Number of Gibbs denoiser runs (only for pipelines that use it)|
|--output_file_json|Path to the report file|
|--output_file_image|Path to the output image|
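For example, a run that selects a pipeline and tunes the split might look like the following. The flag values and their syntax are illustrative assumptions, not tested commands; the defaults in <code>SparkJob</code> are padding = 3 and 100x100 sub-matrices:

```shell
# Hypothetical invocation; flag values and syntax are assumptions.
spark-submit --driver-memory 8g --master local[*] ./jar/binary.jar \
  --pipeline GibbsDenoiser --denoiser_runs 10 \
  --padding 3 --sub_matrix_size 100 \
  --output_file_image ./out/denoised.png ./data/nike_noisy.png
```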
Software requirements:

* sbt >= 1.6.1
* Spark >= 3.2.1
```bash
|> sbt assembly
|> spark-submit --driver-memory 8g --master local[*] ./jar/binary.jar ./data/nike_noisy.png
```
If the program crashes during the collect phase, especially on big images, it is due to insufficient memory on the Spark driver. You can change the driver memory settings using the <code>--driver-memory</code> parameter.
### Google Cloud setup
### Web interface

docs/pipeline.png

361 KB

src/main/scala/Pipelines/EdgeDetection.scala

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ package Pipelines
 import _root_.Algorithms.{Convolution, Invert}
 import breeze.linalg.DenseMatrix
 
+/**
+ * Applies a kernel to detect edges and then inverts the image
+ */
 object EdgeDetection extends Pipeline (
   List(
     new Convolution( DenseMatrix((-1.0, -1.0, -1.0), (-1.0, 8.0, -1.0), (-1.0, -1.0, -1.0))),

src/main/scala/SparkJob.scala

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ import scala.collection.parallel.immutable.ParSeq
 import org.apache.spark.HashPartitioner
 import org.apache.spark.storage.StorageLevel
 
+
 class SparkJob(val padding: Int = 3,
                val subHeight: Int = 100,
                val subWidth: Int = 100,
