
Commit f60721c: Update doc
1 parent 6dab0e1

2 files changed (+12, -3 lines)

README.md

Lines changed: 12 additions & 3 deletions
```diff
@@ -4,7 +4,7 @@ This project aims to be a parallel and distributed implementation of the Gibbs d
 The Gibbs sampling algorithm details are shown in the following [paper](http://stanford.edu/class/ee367/Winter2018/yue_ee367_win18_report.pdf).
 
 ### Processing pipeline
-The program gets the image that you want to process and splits it into smaller chunks. Each sub-image, and its corresponding position, is pushed into an RDD and then processed by a Spark worker.
+The program gets the image that you want to process and splits it into smaller chunks. Each sub-image, and its corresponding position, is pushed into an RDD (Resilient Distributed Dataset) and then processed by a Spark worker.
 
 Multiple operations can be performed on the same chunk; these can be implemented by extending the <code>Pipeline</code> class and setting all the tasks needed. Tasks can also be implemented by extending the <code>Algorithm</code> trait.
 
```

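A minimal sketch of what this pipeline could look like in Spark (Scala). The `Pipeline` class and `Algorithm` trait are named in the README, but the signatures below, along with the `Chunk` case class and the dummy chunking, are assumptions for illustration, not the project's actual API.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical chunk representation: the sub-image pixels plus the chunk's
// position in the original image.
case class Chunk(x: Int, y: Int, pixels: Array[Double])

// A task that runs over one chunk; the real trait's methods may differ.
trait Algorithm extends Serializable {
  def run(chunk: Chunk): Chunk
}

// A pipeline chains its tasks over each chunk, in order.
class Pipeline(tasks: Seq[Algorithm]) extends Serializable {
  def process(chunk: Chunk): Chunk =
    tasks.foldLeft(chunk)((c, task) => task.run(c))
}

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("gibbs-denoiser").getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the real image splitting: a 2x2 grid of dummy chunks.
    val chunks = for (x <- 0 until 2; y <- 0 until 2)
      yield Chunk(x, y, Array.fill(64 * 64)(0.0))

    // One identity task; a real task would run the Gibbs denoiser.
    val pipeline = new Pipeline(Seq(new Algorithm {
      def run(c: Chunk): Chunk = c
    }))

    // Push (sub-image, position) records into an RDD; each Spark worker
    // applies the pipeline to the chunks it receives.
    val processed = sc.parallelize(chunks).map(pipeline.process).collect()
    println(s"Processed ${processed.length} chunks")
    spark.stop()
  }
}
```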
```diff
@@ -79,9 +79,11 @@ To run the program on the Google Cloud platform you have to create a new project
 
 And if you haven't done it yet, you have to enable billing for the project.
 
-The first step is to set up the notebook environment variables. You will be asked to grant access to your Google Drive.
+The first step is to set up the notebook environment variables. You will be asked to grant access to your Google Drive and Google credentials.
 ![create env file](./docs/env.png)
-Then you need to compile the file that is created in the root of your Google Drive with your project ID and the name you want to give the bucket.
+Then you need to fill in the file that is created in the root of your Google Drive with your project ID and the name you want to give the bucket.
+
+Finally, run all the cells in *Setup environment*; this will create a new bucket and copy all the *./data* files into it. It also creates a new directory (*./bucket*) on the Colab runtime that is bound directly to the Cloud Storage bucket.
 
 #### Simple job
 To run a simple cluster with 2 workers (8 cores) execute the cell *Simple cluster*. You can change the command parameters to meet your needs.
```
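For reference, the CLI equivalents of what these notebook cells do look roughly like the sketch below. All values (project ID, bucket name, region, cluster name, machine type) are placeholders, and the exact commands and flags the notebook runs may differ.

```bash
# Placeholders: substitute your own values.
PROJECT_ID=my-project
BUCKET=my-gibbs-bucket
REGION=europe-west1

# "Setup environment", roughly: create the bucket and copy ./data into it.
gsutil mb -p "$PROJECT_ID" "gs://$BUCKET"
gsutil -m cp -r ./data "gs://$BUCKET"

# Bind a local ./bucket directory to the Cloud Storage bucket.
mkdir -p ./bucket
gcsfuse "$BUCKET" ./bucket

# "Simple cluster", roughly: a Dataproc cluster with 2 workers.
gcloud dataproc clusters create gibbs-cluster \
  --project "$PROJECT_ID" \
  --region "$REGION" \
  --num-workers 2 \
  --worker-machine-type n1-standard-4
```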
```diff
@@ -90,12 +92,15 @@ To run a simple cluster with 2 workers (8 cores) execute the cell *Simple cluster*
 
 #### Delete resources
 To delete all the resources allocated by the cluster, and also all the bucket content, you can run the cell *Delete cluster and data*.
+
 ![delete resources](./docs/delete.png)
 ### Performance tests
 There are also tests in the notebook to evaluate the performance of the cluster.
 #### Strong scalability
 In this test we want to see how the execution time changes by keeping the workload fixed and increasing the number of computational resources.
 
+**REMEMBER TO DO THIS** to avoid paying for what you are not using.
+
 Cluster setup:
 * Master machine N1, 16GB RAM, 4 cores, SSD boot drive
 * Worker machine N1, 8GB RAM, 4 cores
```
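And the teardown that the *Delete cluster and data* cell performs, again as a rough CLI sketch using the same placeholder names as above:

```bash
# Delete the Dataproc cluster...
gcloud dataproc clusters delete gibbs-cluster \
  --project "$PROJECT_ID" \
  --region "$REGION" \
  --quiet

# ...and remove the bucket together with all of its content.
gsutil -m rm -r "gs://$BUCKET"
```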
```diff
@@ -128,3 +133,7 @@ As you can see, the execution is always around 4 minutes.
 
 ### Web interface
 There is also a web interface that you can use by creating a cluster and running the *Web interface* cell.
+
+![web interface](./docs/web.png)
+
+Remember to **DELETE THE CLUSTER** when you are done.
```

docs/web.png

381 KB (binary file not shown)
