# Image processing using a Spark cluster
This project aims to be a parallel and distributed implementation of the Gibbs denoiser algorithm. Later, more image processing algorithms based on the convolution method were added. Other implementations can be added by extending the <code>Algorithm</code> trait and providing a pipeline for them.
The details of the Gibbs sampling algorithm are described in the following [paper](http://stanford.edu/class/ee367/Winter2018/yue_ee367_win18_report.pdf).
By default these pipelines are implemented:
In the last stage all the processed sub-images are collected and the image is returned.
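As a rough illustration of that flow, the split/process/collect stages map naturally onto Spark operations. This is only a sketch: the names (<code>PipelineSketch</code>, <code>process</code>) and the fixed 64×64 blocks are made up here and are not the repo's actual API.

```scala
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("image-pipeline-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Pretend these are the sub-images obtained by splitting the input image.
    val subImages: Seq[(Int, Array[Array[Double]])] =
      (0 until 4).map(i => (i, Array.fill(64, 64)(0.0)))

    // Hypothetical processing step; a real pipeline would chain Algorithm stages.
    def process(block: Array[Array[Double]]): Array[Array[Double]] =
      block.map(_.map(p => math.min(255.0, p + 1.0)))

    // Distribute the sub-images, process them in parallel, then collect the
    // results on the driver to rebuild the full image in the last stage.
    val processed = sc
      .parallelize(subImages)
      .mapValues(process)
      .collect()
      .sortBy(_._1)

    println(s"Processed ${processed.length} sub-images")
    spark.stop()
  }
}
```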
#### Example of a pipeline:

### Make your own tasks and pipelines
You can implement your own image processing tasks by extending the <code>Algorithm</code> trait and implementing the <code>run</code> method, which takes an unprocessed image matrix and returns the processed one.
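As an illustration only, a custom task could look like the sketch below. The trait signature is assumed here (an image as a matrix of <code>Double</code> pixels); the actual definition in the source code may differ.

```scala
// Assumed signature of the Algorithm trait -- check the source for the real one.
trait Algorithm {
  def run(image: Array[Array[Double]]): Array[Array[Double]]
}

// Hypothetical example task: a simple brightness adjustment.
class Brighten(delta: Double) extends Algorithm {
  override def run(image: Array[Array[Double]]): Array[Array[Double]] =
    // Shift every pixel by `delta`, clamping to the usual 8-bit range.
    image.map(_.map(pixel => math.max(0.0, math.min(255.0, pixel + delta))))
}
```

A pipeline would then chain such tasks, feeding the output matrix of one <code>run</code> call into the next.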
If the program crashes during the collect phase, especially on big images, it is most likely due to insufficient memory on the Spark driver. You can change the driver memory settings using the <code>--driver-memory</code> parameter.
### Google Cloud setup
For convenience, all gcloud (Google Cloud SDK) commands are collected in a Colab notebook in the root of this repo. Nothing prevents you from taking them separately and running them on other machines.
To run the program on the Google Cloud Platform you have to create a new project and enable the following services:

* **Dataproc**
* **Cloud Storage**

If you haven't done so already, you also have to enable billing for the project.
The first step is to set up the notebook environment variables. You will be asked to grant access to your Google Drive.

Then you need to fill in the file that is created in the root of your Google Drive with your project ID and the name you want to give the bucket.
#### Simple job
To run a simple cluster with 2 workers (8 cores), execute the cell *Simple cluster*. You can change the command parameters to meet your needs.

#### Delete resources
To delete all the resources allocated for the cluster, as well as all the bucket content, run the cell *Delete cluster and data*.

### Performance tests
There are also tests in the notebook to evaluate the performance of the cluster.
#### Strong scalability
In this test we want to see how the execution time changes when the workload is kept fixed and the number of computational resources is increased.
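One standard way to report this kind of test (a general definition, not something prescribed by this repo) is the speedup and parallel efficiency with respect to the single-worker run:

$$
S(N) = \frac{T(1)}{T(N)}, \qquad E(N) = \frac{S(N)}{N}
$$

where $T(N)$ is the execution time measured with $N$ workers.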