Commit 5c3ec4f

Merge pull request #167 from DistributedScience/aws_split
Add WORKSPACE_BUCKET
2 parents 0093fef + 8e47a04 · commit 5c3ec4f

22 files changed: +654 −94 lines changed


config.py

Lines changed: 5 additions & 4 deletions
```diff
@@ -4,23 +4,24 @@
 LOG_GROUP_NAME = APP_NAME
 
 # DOCKER REGISTRY INFORMATION:
-DOCKERHUB_TAG = 'cellprofiler/distributed-cellprofiler:2.0.0_4.1.3'
+DOCKERHUB_TAG = 'cellprofiler/distributed-cellprofiler:2.0.0_4.2.4'
 
 # AWS GENERAL SETTINGS:
 AWS_REGION = 'us-east-1'
 AWS_PROFILE = 'default' # The same profile used by your AWS CLI installation
 SSH_KEY_NAME = 'your-key-file.pem' # Expected to be in ~/.ssh
 AWS_BUCKET = 'your-bucket-name' # Bucket to use for logging
-SOURCE_BUCKET = 'bucket-name' # Bucket to download files from
+SOURCE_BUCKET = 'bucket-name' # Bucket to download image files from
+WORKSPACE_BUCKET = 'bucket-name' # Bucket to download non-image files from
 DESTINATION_BUCKET = 'bucket-name' # Bucket to upload files to
 UPLOAD_FLAGS = '' # Any flags needed for upload to destination bucket
 
 # EC2 AND ECS INFORMATION:
 ECS_CLUSTER = 'default'
 CLUSTER_MACHINES = 3
 TASKS_PER_MACHINE = 1
-MACHINE_TYPE = ['m4.xlarge']
-MACHINE_PRICE = 0.10
+MACHINE_TYPE = ['m5.xlarge']
+MACHINE_PRICE = 0.20
 EBS_VOL_SIZE = 30 # In GB. Minimum allowed is 22.
 DOWNLOAD_FILES = 'False'
 
```
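For orientation, here is a minimal sketch (not part of the commit) of how these bucket settings might be filled in once WORKSPACE_BUCKET exists, e.g. reading images from a separate bucket while the pipeline and load_data.csv live in your own bucket; every bucket name below is a placeholder, not a value taken from the commit.

```python
# Sketch only: placeholder values illustrating the SOURCE_BUCKET / WORKSPACE_BUCKET split.
AWS_BUCKET = 'my-lab-bucket'            # Bucket to use for logging (your own account)
SOURCE_BUCKET = 'public-imaging-data'   # Bucket to download image files from (hypothetical image bucket)
WORKSPACE_BUCKET = 'my-lab-bucket'      # Bucket to download non-image files from (pipeline, load_data.csv)
DESTINATION_BUCKET = 'my-lab-bucket'    # Bucket to upload output files to
```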

documentation/DCP-documentation/step_1_configuration.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -25,7 +25,9 @@ For more information and examples, see [External Buckets](external_buckets.md).
 
 * **AWS_BUCKET:** The bucket to which you would like to write log files.
 This is generally the bucket in the account in which you are running compute.
-* **SOURCE_BUCKET:** The bucket where the files you will be reading are.
+* **SOURCE_BUCKET:** The bucket where the image files you will be reading are.
+Often, this is the same as AWS_BUCKET.
+* **WORKSPACE:** The bucket where non-image files you will be reading are (e.g. pipeline, load_data.csv, etc.).
 Often, this is the same as AWS_BUCKET.
 * **DESTINATION_BUCKET:** The bucket where you want to write your output files.
 Often, this is the same as AWS_BUCKET.
```

example_project/README.md

Lines changed: 32 additions & 5 deletions
````diff
@@ -1,3 +1,5 @@
+# Distributed-CellProfiler Minimal Example
+
 Included in this folder is all of the resources for running a complete mini-example of Distributed-Cellprofiler.
 It includes 3 sample image sets and a CellProfiler pipeline that identifies cells within the images and makes measuremements.
 It also includes the Distributed-CellProfiler files pre-configured to create a queue of all 3 jobs and spin up a spot fleet of 3 instances, each of which will process a single image set.
@@ -9,21 +11,23 @@ It also includes the Distributed-CellProfiler files pre-configured to create a q
 Before running this mini-example, you will need to set up your AWS infrastructure as described in our [online documentation](https://distributedscience.github.io/Distributed-CellProfiler/step_0_prep.html).
 This includes creating the fleet file that you will use in Step 3.
 
-Upload the 'sample_project' folder to the top level of your bucket.
+Upload the 'sample_project' folder to the top level of your bucket.
 While in the `Distributed-CellProfiler` folder, use the following command, replacing `yourbucket` with your bucket name:
 
 ```bash
 # Copy example files to S3
 BUCKET=yourbucket
-aws s3 sync example_project/project_folder s3://${BUCKET}/project_folder
+aws s3 sync example_project/demo_project_folder s3://${BUCKET}/demo_project_folder
 
 # Replace the default config with the example config
 cp example_project/config.py config.py
 ```
 
 ### Step 1
+
 In config.py you will need to update the following fields specific to your AWS configuration:
-```
+
+```python
 # AWS GENERAL SETTINGS:
 AWS_REGION = 'us-east-1'
 AWS_PROFILE = 'default' # The same profile used by your AWS CLI installation
@@ -32,17 +36,21 @@ AWS_BUCKET = 'your-bucket-name'
 SOURCE_BUCKET = 'your-bucket-name' # Only differs from AWS_BUCKET with advanced configuration
 DESTINATION_BUCKET = 'your-bucket-name' # Only differs from AWS_BUCKET with advanced configuration
 ```
+
 Then run `python3 run.py setup`
 
 ### Step 2
-This command points to the job file created for this demonstartion and should be run as-is.
+
+This command points to the job file created for this demonstration and should be run as-is.
 `python3 run.py submitJob example_project/files/exampleJob.json`
 
 ### Step 3
+
 This command should point to whatever fleet file you created in Step 0 so you may need to update the `exampleFleet.json` file name.
 `python3 run.py startCluster files/exampleFleet.json`
 
 ### Step 4
+
 This command points to the monitor file that is automatically created with your run and should be run as-is.
 `python3 run.py monitor files/FlyExampleSpotFleetRequestId.json`
 
@@ -51,4 +59,23 @@ This command points to the monitor file that is automatically created with your
 While the run is happening, you can watch real-time metrics in your Cloudwatch Dashboard by navigating in the [Cloudwatch console](https://console.aws.amazon.com/cloudwatch).
 Note that the metrics update at intervals that may not be helpful with this fast, minimal example.
 
-After the run is done, you should see your CellProfiler output files in S3 at s3://${BUCKET}/project_folder/output in per-image folders.
+After the run is done, you should see your CellProfiler output files in S3 at s3://${BUCKET}/project_folder/output in per-image folders.
+
+## Cleanup
+
+The spot fleet, queue, and task definition will be automatically cleaned up after your demo is complete because you are running `monitor`.
+
+To remove everything else:
+
+```bash
+# Remove files added to S3 bucket
+BUCKET=yourbucket
+aws s3 rm --recursive s3://${BUCKET}/demo_project_folder
+
+# Remove Cloudwatch logs
+aws logs delete-log-group --log-group-name FlyExample
+aws logs delete-log-group --log-group-name FlyExample_perInstance
+
+# Delete DeadMessages queue
+aws sqs delete-queue --queue-url ExampleProject_DeadMessages
+```
````
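As a quick check once the demo finishes, a small boto3 sketch (not part of the commit; it assumes boto3 is installed and configured with the same AWS profile, and the bucket name and output prefix below are placeholders matching the README text) can list the per-image output files written to S3:

```python
# Sketch only: list the demo's CellProfiler outputs in S3.
# 'yourbucket' and the output prefix are placeholders, not values fixed by the commit.
import boto3

BUCKET = "yourbucket"
PREFIX = "project_folder/output/"  # or demo_project_folder/output/, matching the folder you synced

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        print(obj["Key"])
```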
