Tutorial for setting up AWS ParallelCluster in a closed network environment. This feature is supported by AWS ParallelCluster 2.7.0 or later.
This tutorial covers these steps:
- Set up a VPC and subnet without internet access (no IGW), either with a CloudFormation template or manually
- Launch ParallelCluster in the closed VPC
- Test the environment with Systems Manager Session Manager
Launch the CloudFormation template below. It creates a VPC, a private subnet, and the private endpoints required for the various services.
If you set UseSSM to true, the template also sets up PrivateLinks for Systems Manager Session Manager so that you can test the cluster.
Info: Some AZs do not offer every required PrivateLink service, and the CloudFormation stack fails to deploy there. In that case, choose a different AZ by setting its letter in SubnetAZLetter (a/b/c/d, etc.).
Or launch it with the CLI:
$ aws cloudformation create-stack --stack-name ClosedEnvironment --template-url https://midaisuk-public-templates.s3.amazonaws.com/parallelcluster-closednetwork/closed-vpc-privatelink.yml
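The UseSSM and SubnetAZLetter settings can also be passed on the command line. The following is a minimal sketch, assuming the template exposes them as stack parameters under exactly those names; the function name `create_closed_stack` is hypothetical and not part of the original tutorial.

```shell
# Hypothetical helper: create the closed-network stack with explicit parameters,
# then block until CloudFormation reports the stack as created.
# Assumption: UseSSM and SubnetAZLetter are the template's parameter names.
create_closed_stack() {
  local stack_name="$1"   # e.g. ClosedEnvironment
  local use_ssm="$2"      # true or false
  local az_letter="$3"    # a, b, c, d, ...

  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url https://midaisuk-public-templates.s3.amazonaws.com/parallelcluster-closednetwork/closed-vpc-privatelink.yml \
    --parameters \
      "ParameterKey=UseSSM,ParameterValue=${use_ssm}" \
      "ParameterKey=SubnetAZLetter,ParameterValue=${az_letter}" \
    && aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}
```

For example, `create_closed_stack ClosedEnvironment true b` would request the Session Manager endpoints and place the subnet in the AZ with letter b. The wait command returns only after the stack is created, so the stack outputs are available once it finishes.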
This step is not required if you use the CloudFormation template.
You need to set up the following components.
- VPC
- Private Subnet for the VPC
- Security Group for PrivateLinks
- Private Endpoints
  - s3
  - dynamodb
  - logs
  - cloudformation
  - monitoring
  - ec2
  - sqs
  - sns
  - autoscaling
If you want to use Systems Manager Session Manager to test the cluster, you also need to set up the following PrivateLinks.
- ssmmessages
- ec2messages
- ssm
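The endpoints listed above can be created with the AWS CLI. The following is a minimal sketch, not the method used by the template: it assumes s3 and dynamodb are created as Gateway endpoints attached to the subnet's route table and every other service as an Interface endpoint. All function names are hypothetical.

```shell
# Hypothetical helpers for creating the endpoints listed above with the AWS CLI.
# Assumption: s3 and dynamodb are Gateway endpoints (attached to a route table)
# and every other service is an Interface endpoint (PrivateLink).

create_gateway_endpoint() {
  local region="$1" vpc_id="$2" route_table_id="$3" service="$4"
  aws ec2 create-vpc-endpoint \
    --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${region}.${service}" \
    --route-table-ids "$route_table_id"
}

create_interface_endpoint() {
  local region="$1" vpc_id="$2" subnet_id="$3" sg_id="$4" service="$5"
  aws ec2 create-vpc-endpoint \
    --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${region}.${service}" \
    --subnet-ids "$subnet_id" \
    --security-group-ids "$sg_id" \
    --private-dns-enabled
}

create_all_endpoints() {
  local region="$1" vpc_id="$2" subnet_id="$3" sg_id="$4" route_table_id="$5"
  local service
  for service in s3 dynamodb; do
    create_gateway_endpoint "$region" "$vpc_id" "$route_table_id" "$service" || return 1
  done
  # Append ssmmessages ec2messages ssm to this list to enable Session Manager.
  for service in logs cloudformation monitoring ec2 sqs sns autoscaling; do
    create_interface_endpoint "$region" "$vpc_id" "$subnet_id" "$sg_id" "$service" || return 1
  done
}
```

Private DNS on interface endpoints requires DNS support and DNS hostnames to be enabled on the VPC, and the security group passed as `sg_id` needs to allow inbound HTTPS (port 443) from the private subnet.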
Launch ParallelCluster with a config file for the closed network environment.
- closed-network.config
[aws]
aws_region_name = <REGION>
[global]
update_check = true
sanity_check = true
cluster_template = closed
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
[cluster closed]
key_name = <KEY_NAME>
additional_iam_policies = arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
base_os = alinux2
scheduler = slurm
master_instance_type = c5.large
compute_instance_type = c5.xlarge
disable_hyperthreading = true
initial_queue_size = 0
max_queue_size = 10
vpc_settings = closed
[vpc closed]
vpc_id = <VPC_ID>
master_subnet_id = <SUBNET_ID>
use_public_ips = false
Replace <REGION>, <KEY_NAME>, <VPC_ID>, and <SUBNET_ID> with your own values.
You can find <VPC_ID> and <SUBNET_ID> in the outputs of the CloudFormation stack.
AmazonSSMManagedInstanceCore is required to connect to the Master node with Session Manager.
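Looking up the stack outputs and filling in the placeholders can be scripted. The following is a minimal sketch; both function names are hypothetical, and the output key names are not assumed because they depend on the template.

```shell
# Hypothetical helper: list the stack outputs so the VPC and subnet IDs can be
# read off. The output key names depend on the template and are not assumed here.
show_stack_outputs() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output table
}

# Hypothetical helper: replace the four placeholders in a config file in place.
# A .bak copy of the original file is kept next to it.
fill_config() {
  local file="$1" region="$2" key_name="$3" vpc_id="$4" subnet_id="$5"
  sed -i.bak \
    -e "s|<REGION>|${region}|g" \
    -e "s|<KEY_NAME>|${key_name}|g" \
    -e "s|<VPC_ID>|${vpc_id}|g" \
    -e "s|<SUBNET_ID>|${subnet_id}|g" \
    "$file"
}
```

For example, run `show_stack_outputs ClosedEnvironment`, then call `fill_config closed-network.config <region> <key name> <vpc id> <subnet id>` with the values you read from the table. Values containing the `|` character would need a different sed delimiter.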
$ pcluster create -c closed-network.config closed-cluster
Go to the Systems Manager page in the Management Console and select Session Manager.
Select Start Session, choose the Master instance, and start the session.
After connecting to the Master instance, you need to switch users to submit jobs. Example input is shown below.
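The same session can also be opened from a terminal instead of the console. The following is a minimal sketch, assuming the AWS CLI and the Session Manager plugin are installed on your local machine; both function names are hypothetical.

```shell
# Hypothetical helpers for using Session Manager from a terminal.
# Assumption: the AWS CLI and the Session Manager plugin are installed locally.

# List instances currently registered with Systems Manager and their status.
list_managed_instances() {
  aws ssm describe-instance-information \
    --query 'InstanceInformationList[].[InstanceId,PingStatus,PlatformName]' \
    --output table
}

# Open an interactive session on the given instance (for example the Master node).
start_master_session() {
  aws ssm start-session --target "$1"
}
```

Run `list_managed_instances` to find the instance ID of the Master node, then pass it to `start_master_session`. An instance that does not appear in the list has usually not been able to reach the ssm, ssmmessages, and ec2messages endpoints.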
$ sudo su ec2-user
$ cd
$ cat > job.sh << 'EOF'
#!/bin/bash
hostname
EOF
$ sbatch job.sh
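To confirm that the job actually ran on a compute node, you can wait for it to finish and read its output file. The following is a minimal sketch to run on the Master node; the function name `submit_and_wait` is hypothetical, and it assumes the job uses Slurm's default output file name.

```shell
# Hypothetical helper: submit a batch script, wait for it to leave the queue,
# then print its output file. Run this on the Master node as the job user.
submit_and_wait() {
  local script="$1" job_id
  # --parsable makes sbatch print the job ID (optionally followed by ";cluster").
  job_id="$(sbatch --parsable "$script")" || return 1
  job_id="${job_id%%;*}"
  echo "Submitted job ${job_id}"

  # squeue -h prints matching jobs without a header; the -t filter limits the
  # listing to states in which the job is still waiting, running, or finishing.
  while [ -n "$(squeue -h -j "$job_id" -t PENDING,CONFIGURING,RUNNING,COMPLETING 2>/dev/null)" ]; do
    sleep 10
  done

  # Assumption: the job writes stdout to slurm-<jobid>.out in the submission
  # directory, which is the Slurm default.
  cat "slurm-${job_id}.out"
}
```

With initial_queue_size = 0, the first job waits until a compute node has been launched, so `submit_and_wait job.sh` may take several minutes before it prints the compute node's hostname.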
Using PrivateLink incurs additional cost. For details, see:
https://aws.amazon.com/privatelink/pricing/
- Currently, Amazon Linux and Amazon Linux 2 can be used in a closed network environment.
- In a closed network environment, scale-out seems to take a few more minutes because the nodes wait for connections to external repositories to time out.
- You can restrict access to S3 buckets by setting a PolicyDocument on the private endpoint for S3, but you need to allow the following buckets.
- for ParallelCluster
arn:aws:s3:::${AWS::Region}-aws-parallelcluster/*
- for Amazon Linux
arn:aws:s3:::packages.${AWS::Region}.amazonaws.com/*
arn:aws:s3:::repo.${AWS::Region}.amazonaws.com/*
- for Amazon Linux 2
arn:aws:s3:::amazonlinux.${AWS::Region}.amazonaws.com/*
- for ParallelCluster