
Commit 6dab0e1

Update docs
1 parent 728d7f1 commit 6dab0e1

File tree

9 files changed: +81 −11 lines changed


README.md

Lines changed: 81 additions & 7 deletions
@@ -1,5 +1,5 @@
-# Image denoising using a Spark cluster
-This project aims to be a parallel and distributed implementation of the Gibbs denoising algorithm. Later, more algorithms for image processing based on the convolution method were added. Other implementations can be added extending the <code>Algorithm</code> trait and providing a pipeline for that.
+# Image processing using a Spark cluster
+This project aims to be a parallel and distributed implementation of the Gibbs denoiser algorithm. Later, more algorithms for image processing based on the convolution method were added. Other implementations can be added by extending the <code>Algorithm</code> trait and providing a pipeline for that.
The details of the Gibbs sampling algorithm are shown in the following [paper](http://stanford.edu/class/ee367/Winter2018/yue_ee367_win18_report.pdf).
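As a rough, self-contained sketch of the idea in the paper (not the repo's actual code: plain arrays are used instead of Breeze matrices, and `beta`/`eta` are illustrative coupling and data-fidelity parameters), one Gibbs sweep over a binary image could look like:

```scala
import scala.util.Random

object GibbsSweep {
  // One in-place Gibbs sweep for a binary (-1/+1) image under an Ising prior.
  // `x` is the current estimate, `y` the noisy observation.
  def sweep(x: Array[Array[Int]], y: Array[Array[Int]],
            beta: Double, eta: Double, rng: Random): Array[Array[Int]] = {
    val rows = x.length
    val cols = x(0).length
    for (i <- 0 until rows; j <- 0 until cols) {
      // Sum the 4-neighbourhood, skipping out-of-bounds pixels.
      val nbrs = Seq((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        .collect { case (a, b) if a >= 0 && a < rows && b >= 0 && b < cols => x(a)(b) }
        .sum
      // Conditional probability that the pixel is +1 given neighbours and observation.
      val p = 1.0 / (1.0 + math.exp(-2.0 * (beta * nbrs + eta * y(i)(j))))
      x(i)(j) = if (rng.nextDouble() < p) 1 else -1
    }
    x
  }
}
```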

@@ -16,12 +16,32 @@ By default these pipelines are implemented:
In the last stage all the processed sub-images are collected and the image is returned.
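A minimal local sketch of that split/process/collect flow (illustrative only: the names are hypothetical and the map step here runs on local collections, whereas the real pipeline distributes it over Spark executors):

```scala
object SplitProcessCollect {
  type Image = Array[Array[Double]]

  // Split an image into horizontal bands, process each band independently,
  // then stitch the results back together in order.
  def process(img: Image, bands: Int, task: Image => Image): Image = {
    val bandRows = img.grouped(math.max(1, img.length / bands)).toSeq // split stage
    val processed = bandRows.map(task)                                // processing stage
    processed.flatten.toArray                                         // collect stage
  }
}
```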

-Example of a pipeline:
-![alt text](./docs/pipeline.png)
+#### Example of a pipeline:
+![example of a pipeline](./docs/pipeline.png)
### Make your own tasks and pipelines
You can implement your own image processing tasks by extending the <code>Algorithm</code> trait and implementing the <code>run</code> method. It takes an unprocessed image matrix and returns the processed one.
```scala
object YourOwnPipelineAlgorithm extends Algorithm {

  /**
   * Main method, takes a matrix and returns a matrix
   *
   * @param imageMatrix unprocessed image matrix
   * @return processed image matrix
   */
  def run(imageMatrix: DenseMatrix[Double]): DenseMatrix[Double] = {...}
}
```
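For instance, a convolution-style task such as a 3x3 mean filter might look like this (a self-contained sketch using plain arrays; a real task in this project would extend <code>Algorithm</code> and operate on Breeze's <code>DenseMatrix[Double]</code>):

```scala
object MeanFilter {
  // Replace each pixel with the average of its 3x3 window,
  // clipping the window at the image borders.
  def run(img: Array[Array[Double]]): Array[Array[Double]] = {
    val rows = img.length
    val cols = img(0).length
    Array.tabulate(rows, cols) { (i, j) =>
      val window = for {
        a <- math.max(0, i - 1) to math.min(rows - 1, i + 1)
        b <- math.max(0, j - 1) to math.min(cols - 1, j + 1)
      } yield img(a)(b)
      window.sum / window.size
    }
  }
}
```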
If you want to create your custom pipeline you can extend the <code>Pipeline</code> class and then call the constructor with a list of all the tasks.
```scala
object YourOwnPipeline extends Pipeline(
  List(
    YourOwnTask,
    YourOwnTask
  )) {}
```
### Run on a local Spark cluster
The program takes some parameters as input:
@@ -43,14 +63,68 @@ Software requirements:
* sbt >= 1.6.1
* Spark >= 3.2.1

#### Shell commands:
```bash
|> sbt assembly
|> spark-submit --driver-memory 8g --master local[*] ./jar/binary.jar ./data/nike_noisy.png
```
If the program crashes in the collect phase, especially on big images, it is due to insufficient memory on the Spark driver. You can change the driver memory settings using the <code>--driver-memory</code> parameter.
### Google Cloud setup
For convenience all gcloud commands (Google Cloud SDK) are collected in a Colab notebook in the root of this repo. Nothing prevents you from taking them separately and running them on other machines.

To run the program on the Google Cloud platform you have to create a new project and enable the following services:
* **Dataproc**
* **Cloud Storage**

If you haven't done it yet, you also have to enable billing for the project.

The first step is to set up the notebook environment variables. You will be asked to grant access to your Google Drive.
![create env file](./docs/env.png)
Then you need to fill in the file that is created in the root of your Google Drive with your project id and the name you want to give the bucket.
#### Simple job
To run a simple cluster with 2 workers (8 cores) execute the cell *Simple cluster*. You can change the command parameters to meet your needs.

![simple cluster](./docs/simple.png)
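The notebook cell runs gcloud commands along these lines (illustrative only: cluster name, region, bucket, and jar path are placeholders, not values taken from the notebook):

```bash
# Create a Dataproc cluster with 2 four-core workers (placeholder names).
gcloud dataproc clusters create denoise-cluster \
  --region=europe-west1 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --num-workers=2

# Submit the assembled jar as a Spark job, passing the input image as an argument.
gcloud dataproc jobs submit spark \
  --cluster=denoise-cluster \
  --region=europe-west1 \
  --jar=gs://YOUR_BUCKET/binary.jar \
  -- gs://YOUR_BUCKET/nike_noisy.png
```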

#### Delete resources
To delete all the resources allocated by the cluster, including all the bucket content, you can run the cell *Delete cluster and data*.
![delete resources](./docs/delete.png)
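The cleanup cell is equivalent to commands like these (names are placeholders, not values taken from the notebook):

```bash
# Tear down the Dataproc cluster without an interactive confirmation prompt.
gcloud dataproc clusters delete denoise-cluster --region=europe-west1 --quiet

# Recursively remove the bucket and its contents.
gsutil rm -r gs://YOUR_BUCKET
```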
### Performance tests
There are also tests in the notebook to evaluate the performance of the cluster.

#### Strong scalability
In this test we want to see how the execution time changes by keeping the workload fixed and increasing the number of computational resources.

Cluster setup:
* Master machine: N1, 16GB RAM, 4 cores, SSD boot drive
* Worker machines: N1, 8GB RAM, 4 cores

Runs:
* 1 worker, 4 cores, 8k test image
* 2 workers, 8 cores, 8k test image
* 3 workers, 12 cores, 8k test image
* 4 workers, 16 cores, 8k test image
* 5 workers, 20 cores, 8k test image

![strong scalability chart](./docs/strong.png)
As you can see, doubling the number of cores cuts the execution time roughly in half each time.

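That halving behaviour corresponds to near-linear strong scaling, usually quantified as speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p. A tiny illustrative helper (the times below are hypothetical, not measurements from this repo):

```scala
object Scalability {
  // Speedup on p cores: single-core time divided by p-core time.
  def speedup(t1: Double, tp: Double): Double = t1 / tp
  // Efficiency: speedup normalised by the core count (1.0 = ideal scaling).
  def efficiency(t1: Double, tp: Double, p: Int): Double = speedup(t1, tp) / p
}
```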
#### Weak scalability
In this test we want to see how the system reacts by doubling the workload and doubling the available computational resources.

Cluster setup:
* Master machine: N1, 16GB RAM, 4 cores, SSD boot drive
* Worker machines: N1, 8GB RAM, 4 cores

Runs:
* 1 worker, 2 cores, 2x 2k image
* 1 worker, 4 cores, 4k image
* 2 workers, 8 cores, 2x 4k image
* 4 workers, 16 cores, 8k image

![weak scalability chart](./docs/weak.png)
As you can see, the execution time stays at around 4 minutes in every run.

### Web interface
There is also a web interface that you can use by creating a cluster and running the *Web interface* cell.

docs/delete.png (10.6 KB)
docs/env.png (24.7 KB)
docs/setup.png (8.35 KB)
docs/simple.png (764 KB)
docs/strong.png (9.5 KB)
docs/weak.png (11.4 KB)
docs/weak_res.png (10.7 KB)

web/public/index.html

Lines changed: 0 additions & 4 deletions
@@ -11,10 +11,6 @@
 <ul>
   <li><strong>Gibbs image denoiser on a Spark cluster</strong></li>
 </ul>
-<ul>
-  <li><a href="#" role="button">Single image</a></li>
-  <li><a href="#" role="button">Benchmark</a></li>
-</ul>
 </nav>
 </header>
 <main class="container">

0 commit comments
