If `true`, detailed monitoring is enabled for all cluster nodes\. This enables 1\-minute monitoring in the Amazon EC2 console\. The default value is `false`\.
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)
`Logs`\(**Optional**\)
The log settings for the cluster\.
[Update policy: If this setting is changed, the update is not allowed.](using-pcluster-update-cluster-v3.md#update-policy-fail-v3)
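For illustration, both of these settings live in the cluster configuration's `Monitoring` section\. The following is a minimal sketch; the retention period and other values are placeholders, not recommendations:

```
Monitoring:
  DetailedMonitoring: true     # enables 1-minute EC2 monitoring for cluster nodes
  Logs:
    CloudWatch:
      Enabled: true            # send cluster logs to CloudWatch Logs
      RetentionInDays: 30      # placeholder retention period
      DeletionPolicy: Retain   # keep the log group when the cluster is deleted
```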
**doc_source/Scheduling-v3.md**: 8 additions & 3 deletions
@@ -410,12 +410,12 @@ For example, suppose you define subnet\-1 and subnet\-2 for your queue\.
`subnet-1` can be in AZ\-1 and `subnet-2` can be in AZ\-2\.
If you configure only one instance type and want to use multiple subnets, define your instance type in `Instances` rather than `InstanceType`\.
For example, define `ComputeResources` / `Instances` / `InstanceType`=`instance.type` instead of `ComputeResources` / `InstanceType`=`instance.type`\.
+Elastic Fabric Adapter \(EFA\) isn't supported over different availability zones\.
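For example, a queue that spans two subnets in different AZs while using a single instance type defined under `Instances` might look like the following sketch\. The queue name, compute resource name, subnet IDs, and instance type are placeholders:

```
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: multi-az-queue            # placeholder name
      Networking:
        SubnetIds:
          - subnet-1111111111111111   # for example, in AZ-1
          - subnet-2222222222222222   # for example, in AZ-2
      ComputeResources:
        - Name: cr-1                  # placeholder name
          Instances:
            - InstanceType: c5.xlarge
          MinCount: 0
          MaxCount: 10
```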
The use of multiple Availability Zones might cause increases in storage networking latency and added inter\-AZ data transfer costs\. For example, this could occur when an instance accesses file storage that's located in a different AZ\. For more information, see [Data Transfer within the same AWS Region](https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer_within_the_same_AWS_Region)\.
**Cluster updates to change from the use of a single subnet to multiple subnets:**
+ Suppose that a cluster's subnet definition uses a single subnet and an AWS ParallelCluster managed FSx for Lustre file system\. Then, you can't directly update this cluster with an updated subnet ID definition\. To update the cluster, you must first change the managed file system to an external file system\. For more information, see [Convert AWS ParallelCluster managed storage to external storage](shared-storage-conversion-v3.md)\.
+ Suppose that a cluster's subnet definition uses a single subnet and an external Amazon EFS file system, and EFS mount targets don't exist in all of the AZs of the multiple subnets that you want to add\. Then, you can't directly update this cluster with an updated subnet ID definition\. To update the cluster or to create a cluster, you must first create mount targets in all of the AZs of the defined multiple subnets\.
**Availability Zones and cluster capacity reservations defined in [CapacityReservationResourceGroupArn](#yaml-Scheduling-SlurmQueues-CapacityReservationResourceGroupArn):**
+ You can't create a cluster if there is no overlap between the set of instance types and availability zones covered by the defined capacity reservation resource group and the set of instance types and availability zones defined for the queue\.
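As a sketch, a queue compute resource that targets a capacity reservation resource group might look like the following fragment\. The ARN, names, subnet ID, and instance type are placeholders, and they must overlap with the instance types and AZs that the reservation group covers:

```
SlurmQueues:
  - Name: odcr-queue                  # placeholder name
    Networking:
      SubnetIds:
        - subnet-1111111111111111     # must be in an AZ covered by the reservation group
    ComputeResources:
      - Name: odcr-cr                 # placeholder name
        InstanceType: c5n.18xlarge    # must be an instance type covered by the reservation group
        CapacityReservationTarget:
          CapacityReservationResourceGroupArn: arn:aws:resource-groups:us-east-1:123456789012:group/my-odcr-group
```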
@@ -528,7 +528,7 @@ For example, if a custom AMI has an encrypted snapshot associated with it, the f
]
}
```
-To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3.md#troubleshooting-v3-custom-amis)\.
+To troubleshoot custom AMI validation warnings, see [Troubleshooting custom AMI issues](troubleshooting-v3-custom-amis.md)\.
[Update policy: The compute fleet must be stopped or QueueUpdateStrategy must be set for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-queue-update-strategy-v3)
@@ -582,6 +582,7 @@ For more information, see [Multiple instance type allocation with Slurm](slurm-m
`Instances`:
- `InstanceType`: string
```
+`EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances)\.
[Update policy: For this list values setting, a new value can be added during an update or the compute fleet must be stopped when removing an existing value.](using-pcluster-update-cluster-v3.md#update-policy-list-values-v3)
`InstanceType`\(**Required**, `String`\)
The instance type to use in this Slurm compute resource\. All of the instance types in a cluster must use the same processor architecture, either `x86_64` or `arm64`\.
@@ -592,7 +593,9 @@ The instance types listed in [`Instances`](#yaml-Scheduling-SlurmQueues-ComputeR
The instance types that are listed in [`Instances`](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances) can have:
+ Different amounts of memory\.
-In this case, the minimum memory is to be set as a consumable Slurm resource\.[`EnableMemoryBasedScheduling`](#yaml-Scheduling-SlurmSettings-EnableMemoryBasedScheduling) can't be enabled for multiple instance types\.
+In this case, the minimum memory is to be set as a consumable Slurm resource\.
+
+If you specify multiple instance types, `EnableMemoryBasedScheduling` can't be enabled\.
+ Different network cards\.
In this case, the number of network interfaces configured for the compute resource is defined by the instance type with the smallest number of network cards\.
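Taken together, a compute resource that lists instance types with different memory sizes might look like the following sketch\. The names and instance types are placeholders; in this case the smallest memory among the listed types is used as the consumable memory value, and `EnableMemoryBasedScheduling` must remain disabled:

```
ComputeResources:
  - Name: flexible-cr               # placeholder name
    Instances:
      - InstanceType: c5.2xlarge    # 16 GiB memory
      - InstanceType: c5a.2xlarge   # 16 GiB memory
      - InstanceType: m5.2xlarge    # 32 GiB memory; the minimum (16 GiB) is used
    MinCount: 0
    MaxCount: 20
```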
@@ -640,6 +643,7 @@ Efa:
`Enabled`\(**Optional**, `Boolean`\)
Specifies that Elastic Fabric Adapter \(EFA\) is enabled\. To view the list of EC2 instances that support EFA, see [Supported instance types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types) in the *Amazon EC2 User Guide for Linux Instances*\. For more information, see [Elastic Fabric Adapter](efa-v3.md)\. We recommend that you use a cluster [`SlurmQueues`](#Scheduling-v3-SlurmQueues) / [`Networking`](#Scheduling-v3-SlurmQueues-Networking) / [`PlacementGroup`](#yaml-Scheduling-SlurmQueues-Networking-PlacementGroup) to minimize latencies between instances\.
The default value is `false`\.
+Elastic Fabric Adapter \(EFA\) isn't supported over different availability zones\. For more information, see [SubnetIds](#yaml-Scheduling-SlurmQueues-Networking-SubnetIds)\.
If you're defining a custom security group in [SecurityGroups](#yaml-Scheduling-SlurmQueues-Networking-SecurityGroups), make sure that your EFA\-enabled instances are members of a security group that allows all inbound and outbound traffic to itself\.
[Update policy: The compute fleet must be stopped or QueueUpdateStrategy must be set for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-queue-update-strategy-v3)
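For reference, a queue that enables EFA together with a placement group might look like the following sketch\. The names, subnet ID, and instance type are placeholders; the instance type must support EFA:

```
SlurmQueues:
  - Name: efa-queue                 # placeholder name
    Networking:
      PlacementGroup:
        Enabled: true               # keeps EFA-enabled instances close together
      SubnetIds:
        - subnet-1111111111111111   # a single subnet; EFA isn't supported across AZs
    ComputeResources:
      - Name: efa-cr                # placeholder name
        InstanceType: c5n.18xlarge  # an EFA-capable instance type
        Efa:
          Enabled: true
```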
`GdrSupport`\(**Optional**, `Boolean`\)
@@ -940,6 +944,7 @@ The default value is `false`\.
Enabling memory\-based scheduling impacts the way that the Slurm scheduler handles jobs and node allocation\.
For more information, see [Slurm memory\-based scheduling](slurm-mem-based-scheduling-v3.md)\.
`EnableMemoryBasedScheduling` is supported starting with AWS ParallelCluster version 3\.2\.0\.
+`EnableMemoryBasedScheduling` can't be enabled if you configure multiple instance types in [Instances](#yaml-Scheduling-SlurmQueues-ComputeResources-Instances)\.
[Update policy: The compute fleet must be stopped for this setting to be changed for an update.](using-pcluster-update-cluster-v3.md#update-policy-compute-fleet-v3)
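As a sketch, enabling memory\-based scheduling for a queue with a single instance type might look like the following\. The names, instance type, and `SchedulableMemory` value are placeholders:

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    EnableMemoryBasedScheduling: true
  SlurmQueues:
    - Name: mem-queue               # placeholder name
      ComputeResources:
        - Name: mem-cr              # placeholder name
          InstanceType: r5.xlarge   # a single instance type; multiple types in Instances aren't allowed here
          SchedulableMemory: 30000  # optional, memory in MiB made available to Slurm jobs
```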
-where `<REGION>` is the AWS Region where the API needs to be deployed to and `<VERSION>` is the AWS ParallelCluster version \(e\.g\. 3\.4\.1\)\.
+where `<REGION>` is the AWS Region where the API needs to be deployed to and `<VERSION>` is the AWS ParallelCluster version \(e\.g\. 3\.5\.0\)\.
The [Docker](https://aws.amazon.com/docker/) image used to deploy the AWS Lambda function implementing AWS ParallelCluster features is available at: [https://gallery\.ecr\.aws/parallelcluster/pcluster\-api](https://gallery.ecr.aws/parallelcluster/pcluster-api)
**Warning**
-Any user in the AWS account, that has privileged access to AWS Lambda or Amazon API Gateway services, will automatically inherit permissions to administer AWS ParallelCluster API resources\.
+Any user in the AWS account that has privileged access to AWS Lambda or Amazon API Gateway services automatically inherits permissions to administer AWS ParallelCluster API resources\.
## Deploy with AWS CLI<a name="api-reference-deploy-v3"></a>
@@ -50,7 +50,7 @@ Run the following commands to deploy the API
```
$ REGION=<region>
$ API_STACK_NAME=<stack-name> # This can be any name
-$ VERSION=3.4.1
+$ VERSION=3.5.0
$ aws cloudformation create-stack \
--region ${REGION} \
--stack-name ${API_STACK_NAME} \
@@ -96,7 +96,7 @@ The `ParallelClusterApiUserRole` has permission to invoke all AWS ParallelClus
```
$ REGION=<region>
$ API_STACK_NAME=<stack-name> # This needs to correspond to the existing API stack name
**doc_source/autoscaling.md**: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Clusters deployed with AWS ParallelCluster are elastic in several ways\. Setting
## Scaling up<a name="scaling-up"></a>
-Every minute, a process called [https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/jobwatcher](https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/jobwatcher) runs on the head node\. It evaluates the current number of instances required by the pending jobs in the queue\. If the total number of busy nodes and requested nodes is greater than the current desired value in the Auto Scaling group, it adds more instances\. If you submit more jobs, the queue is re\-evaluated and the Auto Scaling group is updated, up to the specified [`max_queue_size`](cluster-definition.md#configuration-max-queue-size)\.
+Every minute, a process called [https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/jobwatcher](https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/jobwatcher) runs on the head node\. It evaluates the current number of instances required by the pending jobs in the queue\. If the total number of busy nodes and requested nodes is greater than the current desired value in the Auto Scaling group, it adds more instances\. If you submit more jobs, the queue is re\-evaluated and the Auto Scaling group is updated, up to the specified [`max_queue_size`](cluster-definition.md#configuration-max-queue-size)\.
With an SGE scheduler, each job requires a number of slots to run \(one slot corresponds to one processing unit, for example, a vCPU\)\. To evaluate the number of instances that are required to serve the currently pending jobs, the `jobwatcher` divides the total number of requested slots by the capacity of a single compute node\. The capacity of a compute node that corresponds to the number of available vCPUs depends on the Amazon EC2 instance type that's specified in the cluster configuration\.
@@ -27,7 +27,7 @@ In this example, the `jobwatcher` requires three new compute instances in the Au
## Scaling down<a name="scaling-down"></a>
-On each compute node, a process called [https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/nodewatcher](https://github.com/aws/aws-parallelcluster-node/tree/release-2.11/src/nodewatcher) runs and evaluates the idle time of the node\. An instance is terminated when both of the following conditions are met:
+On each compute node, a process called [https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/nodewatcher](https://github.com/aws/aws-parallelcluster-node/tree/v2.11.4/src/nodewatcher) runs and evaluates the idle time of the node\. An instance is terminated when both of the following conditions are met:
+ An instance has no jobs for a period of time longer than the [`scaledown_idletime`](scaling-section.md#scaledown-idletime)\(the default setting is 10 minutes\)