This repository was archived by the owner on Jun 16, 2023. It is now read-only.

Commit 0f7574e

author
Kari Stromsland
committed
Release 3.5 - 02/21/23-07:45am PST
1 parent 6234b4f commit 0f7574e

72 files changed

+1910
-1157
lines changed


doc_source/HeadNode-v3.md

Lines changed: 1 addition & 1 deletion
@@ -448,5 +448,5 @@ For example, if a custom AMI has an encrypted snapshot associated with it, the f
448448
]
449449
}
450450
```
451-
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3.md#troubleshooting-v3-custom-amis)\.
451+
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3-custom-amis.md)\.
452452
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)

doc_source/Image-v3.md

Lines changed: 1 addition & 1 deletion
@@ -40,5 +40,5 @@ For example, if a custom AMI has an encrypted snapshot associated with it, the f
4040
]
4141
}
4242
```
43-
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3.md#troubleshooting-v3-custom-amis)\.
43+
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3-custom-amis.md)\.
4444
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)
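Review note: the same link retargeting recurs across several files in this commit. A minimal sketch of how it could be applied mechanically follows, assuming the old and new targets are exactly the two strings shown in the hunks above. The function name `retarget_links` is hypothetical and is not part of the repository.

```python
import re
from typing import Tuple

# Old anchor-style target and its replacement, both copied from the hunks above.
OLD_TARGET = "troubleshooting-v3.md#troubleshooting-v3-custom-amis"
NEW_TARGET = "troubleshooting-v3-custom-amis.md"

# Match only the "](target)" part of a Markdown link so the link text is untouched.
LINK_RE = re.compile(r"\]\(" + re.escape(OLD_TARGET) + r"\)")


def retarget_links(markdown: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of links that were changed."""
    return LINK_RE.subn("](" + NEW_TARGET + ")", markdown)


if __name__ == "__main__":
    sample = (
        "To troubleshoot custom AMI validation warnings, see "
        "[Troubleshooting custom AMI issues]"
        "(troubleshooting-v3.md#troubleshooting-v3-custom-amis)\\."
    )
    rewritten, count = retarget_links(sample)
    print(count, rewritten)
```

Matching the full target string, rather than any link into `troubleshooting-v3.md`, keeps other anchors in that file untouched.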

doc_source/Monitoring-v3.md

Lines changed: 0 additions & 5 deletions
@@ -4,7 +4,6 @@
44

55
```
66
Monitoring:
7-
DetailedMonitoring: boolean
87
Logs:
98
CloudWatch:
109
Enabled: boolean
@@ -19,10 +18,6 @@ Monitoring:
1918

2019
## `Monitoring` properties<a name="Monitoring-v3.properties"></a>
2120

22-
`DetailedMonitoring` \(**Optional**, `Boolean`\)
23-
If `true`, detailed monitoring is enabled for all cluster nodes\. This enables 1 minute monitoring in the Amazon EC2 console\. The default value is `false`\.
24-
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)
25-
2621
`Logs` \(**Optional**\)
2722
The log settings for the cluster\.
2823
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)

doc_source/Scheduling-v3.md

Lines changed: 8 additions & 3 deletions
@@ -410,12 +410,12 @@ For example, suppose you define subnet\-1 and subnet\-2 for your queue\.
410410
`subnet-1` can be in AZ\-1 and `subnet-2` can be in AZ\-2\.
411411
If you configure only one instance type and want to use multiple subnets, define your instance type in `Instances` rather than `InstanceType`\.
412412
For example, define `ComputeResources` / `Instances` / `InstanceType`=`instance.type` instead of `ComputeResources` / `InstanceType`=`instance.type`\.
413+
Elastic Fabric Adapter \(EFA\) isn't supported over different availability zones\.
413414
The use of multiple Availability Zones might cause increases in storage networking latency and added inter\-AZ data transfer costs\. For example, this could occur when an instance accesses file storage that's located in a different AZ\. For more information, see [Data Transfer within the same AWS Region](https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer_within_the_same_AWS_Region)\.
414415

415416
**Cluster updates to change from the use of a single subnet to multiple subnets:**
416417
+ Suppose the subnet definition of a cluster is defined with a single subnet and an AWS ParallelCluster managed FSx for Lustre file system\. Then, you can't update this cluster with an updated subnet ID definition directly\. To make the cluster update, you must first change the managed file system to an external file system\. For more information, see [Convert AWS ParallelCluster managed storage to external storage](shared-storage-conversion-v3.md)\.
417418
+ Suppose the subnet definition of a cluster is defined with a single subnet and an external Amazon EFS file system if EFS mount targets don't exist for all of the AZs for the multiple subnets defined to be added\. Then, you can't update this cluster with an updated subnet ID definition directly\. To make the cluster update or to create a cluster, you must first create all of the mount targets for all of the AZs for the defined multiple subnets\.
418-
419419

420420
**Availability Zones and cluster capacity reservations defined in [CapacityReservationResourceGroupArn](#yaml-Scheduling-SlurmQueues-CapacityReservationResourceGroupArn):**
421421
+ You can't create a cluster if there is no overlap between the set of instance types and availability zones covered by the defined capacity reservation resource group and the set of instance types and availability zones defined for the queue\.
@@ -528,7 +528,7 @@ For example, if a custom AMI has an encrypted snapshot associated with it, the f
528528
]
529529
}
530530
```
531-
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3.md#troubleshooting-v3-custom-amis)\.
531+
To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3-custom-amis.md)\.
532532
[Update policy: The compute fleet must be stopped or QueueUpdateStrategy must be set for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-queue-update-strategy-v3)
533533

534534
#### `ComputeResources`<a name="Scheduling-v3-SlurmQueues-ComputeResources"></a>
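Review note: the hunks for this file document two constraints. EFA isn't supported across different Availability Zones, and `EnableMemoryBasedScheduling` can't be enabled when multiple instance types are configured in `Instances`. The sketch below shows one way to check a parsed queue definition against both rules. It is illustrative only: the function `check_queue` and the `subnet_az` mapping (subnet ID to Availability Zone, which in practice would have to be looked up from Amazon EC2) are assumptions, and this is not how AWS ParallelCluster itself validates configurations.

```python
from typing import Dict, List


def check_queue(queue: dict, memory_based_scheduling: bool,
                subnet_az: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in one SlurmQueues entry.

    queue                   -- one parsed SlurmQueues item (plain dict)
    memory_based_scheduling -- value of SlurmSettings / EnableMemoryBasedScheduling
    subnet_az               -- hypothetical lookup: subnet ID -> Availability Zone
    """
    problems = []
    subnets = queue.get("Networking", {}).get("SubnetIds", [])
    # Distinct Availability Zones covered by the queue's subnets.
    zones = {subnet_az[s] for s in subnets if s in subnet_az}

    for resource in queue.get("ComputeResources", []):
        name = resource.get("Name", "<unnamed>")
        if resource.get("Efa", {}).get("Enabled", False) and len(zones) > 1:
            problems.append(f"{name}: EFA is enabled but subnets span {len(zones)} AZs")
        if memory_based_scheduling and len(resource.get("Instances", [])) > 1:
            problems.append(
                f"{name}: EnableMemoryBasedScheduling is set with multiple instance types"
            )
    return problems


if __name__ == "__main__":
    queue = {
        "Networking": {"SubnetIds": ["subnet-1", "subnet-2"]},
        "ComputeResources": [
            {
                "Name": "cr1",
                "Efa": {"Enabled": True},
                "Instances": [{"InstanceType": "type.a"}, {"InstanceType": "type.b"}],
            }
        ],
    }
    for problem in check_queue(queue, True, {"subnet-1": "AZ-1", "subnet-2": "AZ-2"}):
        print(problem)
```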
@@ -582,6 +582,7 @@ For more information, see [Multiple instance type allocation with Slurm](slurm-m
582582
`Instances`:
583583
- `InstanceType`: string
584584
```
585+
`EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances)\.
585586
[Update policy: For this list values setting, a new value can be added during an update or the compute fleet must be stopped when removing an existing value.](using-pcluster-update-cluster-v3.md#update-policy-list-values-v3)
586587
`InstanceType` \(**Required**, `String`\)
587588
The instance type to use in this Slurm compute resource\. All of the instance types in a cluster must use the same processor architecture, either `x86_64` or `arm64`\.
@@ -592,7 +593,9 @@ The instance types listed in [`Instances`](#yaml-Scheduling-SlurmQueues-ComputeR
592593
The instance types that are listed in [`Instances`](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances) can have:
593594
+ Different amount of memory\.
594595

595-
In this case, the minimum memory is to be set as a consumable Slurm resource\. [`EnableMemoryBasedScheduling`](#yaml-Scheduling-SlurmSettings-EnableMemoryBasedScheduling) can't be enabled for multiple instance types\.
596+
In this case, the minimum memory is to be set as a consumable Slurm resource\.
597+
598+
If you specify multiple instance types, `EnableMemoryBasedScheduling` can't be enabled\.
596599
+ Different network cards\.
597600

598601
In this case, the number of network interfaces configured for the compute resource is defined by the instance type with the smallest number of network cards\.
@@ -640,6 +643,7 @@ Efa:
640643
`Enabled` \(**Optional**, `Boolean`\)
641644
Specifies that Elastic Fabric Adapter \(EFA\) is enabled\. To view the list of EC2 instances that support EFA, see [Supported instance types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types) in the *Amazon EC2 User Guide for Linux Instances*\. For more information, see [Elastic Fabric Adapter](efa-v3.md)\. We recommend that you use a cluster [`SlurmQueues`](#Scheduling-v3-SlurmQueues) / [`Networking`](#Scheduling-v3-SlurmQueues-Networking) / [`PlacementGroup`](#yaml-Scheduling-SlurmQueues-Networking-PlacementGroup) to minimize latencies between instances\.
642645
The default value is `false`\.
646+
Elastic Fabric Adapter \(EFA\) isn't supported over different availability zones\. For more information, see [SubnetIds](#yaml-Scheduling-SlurmQueues-Networking-SubnetIds)\.
643647
If you're defining a custom security group in [SecurityGroups](#yaml-Scheduling-SlurmQueues-Networking-SecurityGroups), make sure that your EFA\-enabled instances are members of a security group that allows all inbound and outbound traffic to itself\.
644648
[Update policy: The compute fleet must be stopped or QueueUpdateStrategy must be set for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-queue-update-strategy-v3)
645649
`GdrSupport` \(**Optional**, `Boolean`\)
@@ -940,6 +944,7 @@ The default value is `false`\.
940944
Enabling memory\-based scheduling impacts the way that the Slurm scheduler handles jobs and node allocation\.
941945
For more information, see [Slurm memory\-based scheduling](slurm-mem-based-scheduling-v3.md)\.
942946
`EnableMemoryBasedScheduling` is supported starting with AWS ParallelCluster version 3\.2\.0\.
947+
`EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances)\.
943948
[Update policy: The compute fleet must be stopped for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-compute-fleet-v3)
944949

945950
### `Database`<a name="Scheduling-v3-SlurmSettings-Database"></a>

doc_source/api-reference-v3.md

Lines changed: 4 additions & 4 deletions
@@ -30,12 +30,12 @@ The template used to deploy the API is available at the following URL:
3030
https://<REGION>-aws-parallelcluster.s3.<REGION>.amazonaws.com/parallelcluster/<VERSION>/api/parallelcluster-api.yaml
3131
```
3232

33-
where `<REGION>` is the AWS Region where the API needs to be deployed to and `<VERSION>` is the AWS ParallelCluster version \(e\.g\. 3\.4\.1\)\.
33+
where `<REGION>` is the AWS Region where the API needs to be deployed to and `<VERSION>` is the AWS ParallelCluster version \(e\.g\. 3\.5\.0\)\.
3434

3535
The [Docker](https://aws.amazon.com/docker/) image used to deploy the AWS Lambda function implementing AWS ParallelCluster features is available at:  [https://gallery\.ecr\.aws/parallelcluster/pcluster\-api](https://gallery.ecr.aws/parallelcluster/pcluster-api)
3636

3737
**Warning**
38-
Any user in the AWS account, that has privileged access to AWS Lambda or Amazon API Gateway services, will automatically inherit permissions to administer AWS ParallelCluster API resources\.
38+
Any user in the AWS account, that has privileged access to AWS Lambda or Amazon API Gateway services, automatically inherits permissions to administer AWS ParallelCluster API resources\.
3939

4040
## Deploy with AWS CLI<a name="api-reference-deploy-v3"></a>
4141

@@ -50,7 +50,7 @@ Run the following commands to deploy the API
5050
```
5151
$ REGION=<region>
5252
$ API_STACK_NAME=<stack-name>  # This can be any name
53-
$ VERSION=3.4.1
53+
$ VERSION=3.5.0
5454
$ aws cloudformation create-stack \
5555
  --region ${REGION} \
5656
  --stack-name ${API_STACK_NAME} \
@@ -96,7 +96,7 @@ The `ParallelClusterApiUserRole` has permission to invoke all AWS ParallelClus
9696
```
9797
$ REGION=<region>
9898
$ API_STACK_NAME=<stack-name>  # This needs to correspond to the existing API stack name
99-
$ VERSION=3.4.1
99+
$ VERSION=3.5.0
100100
$ aws cloudformation update-stack \
101101
  --region ${REGION} \
102102
  --stack-name ${API_STACK_NAME} \
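Review note: the version bump in this file changes the `<VERSION>` segment of the API template URL quoted in the first hunk. As a small illustration, the URL can be assembled from the two placeholders as below. The helper name `api_template_url` is hypothetical, and the sketch only formats the string. It does not check that the template exists for a given Region or version.

```python
def api_template_url(region: str, version: str) -> str:
    """Fill <REGION> and <VERSION> into the template URL pattern from the doc."""
    return (
        f"https://{region}-aws-parallelcluster.s3.{region}.amazonaws.com"
        f"/parallelcluster/{version}/api/parallelcluster-api.yaml"
    )


if __name__ == "__main__":
    # Example values only; the Region here is an assumption for illustration.
    print(api_template_url("us-east-1", "3.5.0"))
```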

doc_source/autoscaling.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Clusters deployed with AWS ParallelCluster are elastic in several ways\. Setting
1212

1313
## Scaling up<a name="scaling-up"></a>
1414

15-
Every minute, a process called [https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/jobwatcher](https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/jobwatcher) runs on the head node\. It evaluates the current number of instances required by the pending jobs in the queue\. If the total number of busy nodes and requested nodes is greater than the current desired value in the Auto Scaling group, it adds more instances\. If you submit more jobs, the queue is re\-evaluated and the Auto Scaling group is updated, up to the specified [`max_queue_size`](cluster-definition.md#configuration-max-queue-size)\.
15+
Every minute, a process called [https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/jobwatcher](https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/jobwatcher) runs on the head node\. It evaluates the current number of instances required by the pending jobs in the queue\. If the total number of busy nodes and requested nodes is greater than the current desired value in the Auto Scaling group, it adds more instances\. If you submit more jobs, the queue is re\-evaluated and the Auto Scaling group is updated, up to the specified [`max_queue_size`](cluster-definition.md#configuration-max-queue-size)\.
1616

1717
With an SGE scheduler, each job requires a number of slots to run \(one slot corresponds to one processing unit, for example, a vCPU\)\. To evaluate the number of instances that are required to serve the currently pending jobs, the `jobwatcher` divides the total number of requested slots by the capacity of a single compute node\. The capacity of a compute node that corresponds to the number of available vCPUs depends on the Amazon EC2 instance type that's specified in the cluster configuration\.
1818

@@ -27,7 +27,7 @@ In this example, the `jobwatcher` requires three new compute instances in the Au
2727

2828
## Scaling down<a name="scaling-down"></a>
2929

30-
On each compute node, a process called [https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/nodewatcher](https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/nodewatcher) runs and evaluates the idle time of the node\. An instance is terminated when both of the following conditions are met:
30+
On each compute node, a process called [https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/nodewatcher](https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/nodewatcher) runs and evaluates the idle time of the node\. An instance is terminated when both of the following conditions are met:
3131
+ An instance has no jobs for a period of time longer than the [`scaledown_idletime`](scaling-section.md#scaledown-idletime) \(the default setting is 10 minutes\)
3232
+ There are no pending jobs in the cluster
3333
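Review note: the scaling rules quoted in the context lines above can be sketched as simple arithmetic. This is an illustration and not the `jobwatcher` or `nodewatcher` source. Rounding up to a whole instance is an assumption (the text only says the requested slots are divided by the node capacity), and both function names are hypothetical.

```python
import math


def instances_needed(requested_slots: int, slots_per_node: int) -> int:
    """Instances required for the pending jobs, assuming partial nodes round up."""
    return math.ceil(requested_slots / slots_per_node)


def should_terminate(idle_minutes: float, pending_jobs: int,
                     scaledown_idletime: float = 10) -> bool:
    """Both scale-down conditions from the doc must hold:
    idle for longer than scaledown_idletime (default 10 minutes),
    and no pending jobs in the cluster."""
    return idle_minutes > scaledown_idletime and pending_jobs == 0


if __name__ == "__main__":
    print(instances_needed(requested_slots=10, slots_per_node=4))
    print(should_terminate(idle_minutes=12, pending_jobs=0))
```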

doc_source/building-custom-ami-v3.md

Lines changed: 13 additions & 13 deletions
@@ -55,7 +55,7 @@ Steps:
5555
"cloudformationStackStatus": "CREATE_IN_PROGRESS",
5656
"cloudformationStackArn": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
5757
"region": "us-east-1",
58-
"version": "3.4.1"
58+
"version": "3.5.0"
5959
}
6060
}
6161
```
@@ -78,18 +78,18 @@ Steps:
7878
# BEFORE COMPLETE
7979
{
8080
"imageConfiguration": {
81-
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.amazonaws.com/parallelcluster/3.4.1/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?...",
81+
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.amazonaws.com/parallelcluster/3.5.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?...",
8282
},
8383
"imageId": "IMAGE_ID",
8484
"imagebuilderImageStatus": "BUILDING",
8585
"imageBuildStatus": "BUILD_IN_PROGRESS",
8686
"cloudformationStackStatus": "CREATE_IN_PROGRESS",
8787
"cloudformationStackArn": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
8888
"region": "us-east-1",
89-
"version": "3.4.1",
89+
"version": "3.5.0",
9090
"cloudformationStackTags": [
9191
{
92-
"value": "3.4.1",
92+
"value": "3.5.0",
9393
"key": "parallelcluster:version"
9494
},
9595
{
@@ -105,7 +105,7 @@ Steps:
105105
# AFTER COMPLETE
106106
{
107107
"imageConfiguration": {
108-
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.4.1/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
108+
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.5.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
109109
},
110110
"imageId": "IMAGE_ID",
111111
"imageBuildStatus": "BUILD_COMPLETE",
@@ -124,7 +124,7 @@ Steps:
124124
],
125125
"architecture": "x86_64"
126126
},
127-
"version": "3.4.1"
127+
"version": "3.5.0"
128128
}
129129
```
130130

@@ -146,10 +146,10 @@ After running the [`build-image`](pcluster.build-image-v3.md) command, it's poss
146146
$ pcluster get-image-stack-events --image-id IMAGE_ID --region REGION --query "events[0]"
147147
{
148148
"eventId": "ParallelClusterImage-CREATE_IN_PROGRESS-2022-04-05T21:39:24.725Z",
149-
"physicalResourceId": "arn:aws:imagebuilder:us-east-1:123456789012:image/parallelclusterimage-IMAGE_ID/3.4.1/1",
149+
"physicalResourceId": "arn:aws:imagebuilder:us-east-1:123456789012:image/parallelclusterimage-IMAGE_ID/3.5.0/1",
150150
"resourceStatus": "CREATE_IN_PROGRESS",
151151
"resourceStatusReason": "Resource creation Initiated",
152-
"resourceProperties": "{\"InfrastructureConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:infrastructure-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"ImageRecipeArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:image-recipe/parallelclusterimage-IMAGE_ID/3.4.1\",\"DistributionConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:distribution-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"Tags\":{\"parallelcluster:image_name\":\"IMAGE_ID\",\"parallelcluster:image_id\":\"IMAGE_ID\"}}",
152+
"resourceProperties": "{\"InfrastructureConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:infrastructure-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"ImageRecipeArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:image-recipe/parallelclusterimage-IMAGE_ID/3.5.0\",\"DistributionConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:distribution-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"Tags\":{\"parallelcluster:image_name\":\"IMAGE_ID\",\"parallelcluster:image_id\":\"IMAGE_ID\"}}",
153153
"stackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
154154
"stackName": "IMAGE_ID",
155155
"logicalResourceId": "ParallelClusterImage",
@@ -164,11 +164,11 @@ After about 15 minutes, the stack events appear in the log event entry related t
164164
$ pcluster list-image-log-streams --image-id IMAGE_ID --region REGION \
165165
--query 'logStreams[*].logStreamName'
166166
167-
"3.4.1/1"
167+
"3.5.0/1"
168168
]
169169
170170
$ pcluster get-image-log-events --image-id IMAGE_ID --region REGION \
171-
--log-stream-name 3.4.1/1 --limit 3
171+
--log-stream-name 3.5.0/1 --limit 3
172172
{
173173
"nextToken": "f/36295977202298886557255241372854078762600452615936671762",
174174
"prevToken": "b/36295977196879805474012299949460899222346900769983430672",
@@ -178,7 +178,7 @@ $ pcluster get-image-log-events --image-id IMAGE_ID --region REGION \
178178
"timestamp": "2022-04-05T22:13:26.633Z"
179179
},
180180
{
181-
"message": "Document arn:aws:imagebuilder:us-east-1:123456789012:component/parallelclusterimage-test-abcd1234-ef56-gh78-ij90-1234abcd5678/3.4.1/1",
181+
"message": "Document arn:aws:imagebuilder:us-east-1:123456789012:component/parallelclusterimage-test-abcd1234-ef56-gh78-ij90-1234abcd5678/3.5.0/1",
182182
"timestamp": "2022-04-05T22:13:26.741Z"
183183
},
184184
{
@@ -195,7 +195,7 @@ Continue to check with the [`describe-image`](pcluster.describe-image-v3.md) com
195195
$ pcluster describe-image --image-id IMAGE_ID --region REGION
196196
{
197197
"imageConfiguration": {
198-
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.4.1/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
198+
"url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.5.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
199199
},
200200
"imageId": "IMAGE_ID",
201201
"imageBuildStatus": "BUILD_COMPLETE",
@@ -214,7 +214,7 @@ $ pcluster describe-image --image-id IMAGE_ID --region REGION
214214
],
215215
"architecture": "x86_64"
216216
},
217-
"version": "3.4.1"
217+
"version": "3.5.0"
218218
}
219219
```
220220
