ParaTools Pro for E4S™ Getting Started with Amazon Web Services (AWS)¶
General Background Information¶
In this tutorial we will show you how to launch an HPC cluster on AWS. You will use the command line tools, AWS CLI, and AWS ParallelCluster to create a .yaml file that describes your head-node, and the cluster-nodes. It will then launch a head-node that can spawn EC2 instances that are linked with EFA networking capabilities.
For the purposes of this tutorial, we make the following assumptions: - You have created an AWS account, and an Administrative User
Tutorial¶
Install AWS ParallelCluster¶
To install Pcluster, upgrade pip, and install virtualenv if not installed. Note amazon recommends installing pcluster in a virtual environment. For this section we essentially follow "Setting Up AWS ParallelCluster", if you have any issues look there.
Then create and source the virtual environment:
Then install ParallelCluster. If the version of ParallelCluster does not match the version used to generate the AMI then the cluster creation operation will fail. As of this writing ParaTools Pro for E4S™ AMIs are built with ParallelCluster 3.10.0. Check the version string of your selected ParaTools Pro for E4S™ AMI, visible on the AWS Marketplace listing, for the associated ParallelCluster version.
ParallelCluster needs node.js for CloudFormation, so
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
chmod ug+x ~/.nvm/nvm.sh
source ~/.nvm/nvm.sh
nvm install --lts
node --version
Install AWS Command Line Interface¶
Now we must install AWS CLI, which will handle authenticating your information every time you create a cluster. For this section we follow "Installing AWS CLI", if you have any issues look there.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
-i
and -b
, as shown below
AWS Security Credentials and CLI Configuration¶
For this section we follow Creating Access Keys and Configuring AWS CLI, if you have any issues look there. If you do not already have a secure access key, you must create one. From the IAM page, on the left side of the page select Users, then select the user you would like to grant access credentials to, then select the Security credentials, and scroll down to Create access key. Create a key for CLI activities. Make sure to save these very securely.
Now we can configure AWS with those security credentials.
And then enter the respective information,AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [us-east-1]: us-west-2
Default output format [None]: json
AWS EC2 Key Pair¶
To perform cluster tasks, such as running and monitoring jobs, or managing users, you must be able to access the cluster head node. To verify you can access the head node instance using SSH, you must use an EC2 key pair. If you do not already have a key pair you in the region you would like to use, follow this guide to quickly make a key
AWS user policies¶
To create and manage clusters in an AWS account, AWS ParallelCluster requires permissions at two levels:
* Permissions that the pcluster user requires to invoke the pcluster CLI commands for creating and managing clusters.
* Permissions that the cluster resources require to perform cluster actions.
The policies described here are supersets of the required permissions to create clusters. If you know what you are doing you can remove permissions as you feel fit. To make the policies, open the IAM page, select Policies on the left, and Create Policy, then select the JSON editor. Copy and paste the policy found here. Unless you plan to use AWS secrets, you must remove the final section from the JSON.
{
"Action": "secretsmanager:DescribeSecret",
"Resource": "arn:aws:secretsmanager:<REGION>:<AWS ACCOUNT ID>:secret:<SECRET NAME>",
"Effect": "Allow"
}
Find the AMI¶
You will need to have the AMI (Amazon Machine Image) ready for this next step. Select the ParaTools Pro for E4S™ marketplace listing for the image you want, click subscribe, then continue to configuration, select the correct region, and then copy the AMI Id that is provided.
Cluster configuration and creation¶
When creating a cluster you will be prompted for: - Region: Select whichever region you are planning to launch these in. - EC2 key pair: Select the one you just created, or plan on using to access the nodes. - Scheduler: You must select slurm - OS: Ubuntu 22.04 - Head node instance type: As it only controls the nodes it does not require much compute capabilities. A t3.large will often suffice. Note the head node does not have to be EFA capable. - Structure of your queue should be selected as required by your use case. - Compute instance types: You must select an EFA capable node. You can find these out by:
aws ec2 describe-instance-types --filters "Name=processor-info.supported-architecture,Values=x86_64*" "Name=network-info.efa-supported,Values=true" --query InstanceTypes[].InstanceType
aws ec2 describe-instance-types --filters "Name=processor-info.supported-architecture,Values=x86_64" "Name=network-info.efa-supported,Values=true" --query 'InstanceTypes[?GpuInfo.Gpus!=null].InstanceType'
To create the cluster-config.yaml file,
INFO: Configuration file cluster-config.yaml will be written.
Press CTRL-C to interrupt the procedure.
Allowed values for AWS Region ID:
1. ap-northeast-1
2. ap-northeast-2
...
15. us-west-1
16. us-west-2
AWS Region ID [us-west-2]:
Allowed values for EC2 Key Pair Name:
1. Your-EC2-key
EC2 Key Pair Name [Your-EC2-key]: 1
Allowed values for Scheduler:
1. slurm
2. awsbatch
Scheduler [slurm]: 1
Allowed values for Operating System:
1. alinux2
2. centos7
3. ubuntu2004
4. ubuntu2204
Operating System [ubuntu2204]:
Head node instance type [t3.large]:
Number of queues [1]:
Name of queue 1 [queue1]:
Number of compute resources for queue1 [1]:
Compute instance type for compute resource 1 in queue1 [t3.micro]: t3.micro
Maximum instance count [10]:
Automate VPC creation? (y/n) [n]: y
Allowed values for Availability Zone:
1. us-west-2a
2. us-west-2b
3. us-west-2c
Availability Zone [us-west-2a]: 1
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]:
Beginning VPC creation. Please do not leave the terminal until the creation is finalized
Creating CloudFormation stack...
Do not leave the terminal until the process has finished.
If there is an error regarding a failed authorization, there may have been an issue in setting up your policies, make sure you have created the 3 policies correctly.
Final Cluster Configurations¶
Opening cluster-config.yaml, add the line CustomAmi: <ParaTools-Pro-ami-id>
under the Image section. Replacing
Spinning up the cluster head node¶
Now that all configuration is complete,
This process will return some JSON such as{
"cluster": {
"clusterName": "name_of_cluster",
"cloudformationStackStatus": "CREATE_IN_PROGRESS",
"cloudformationStackArn": "arn:aws:cloudformation:us-west-2:123456789100:stack/name_of_cluster",
"region": "us-west-2",
"version": "3.5.1",
"clusterStatus": "CREATE_IN_PROGRESS",
"scheduler": {
"type": "slurm"
}
},
"validationMessages": [
{
"level": "WARNING",
"type": "CustomAmiTagValidator",
"message": "The custom AMI may not have been created by pcluster. You can ignore this warning if the AMI is shared or copied from another pcluster AMI. If the AMI is indeed not created by pcluster, cluster creation will fail. If the cluster creation fails, please go to https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting.html#troubleshooting-stack-creation-failures for troubleshooting."
},
{
"level": "WARNING",
"type": "AmiOsCompatibleValidator",
"message": "Could not check node AMI ami-12345678910 OS and cluster OS ubuntu2204 compatibility, please make sure they are compatible before cluster creation and update operations."
}
]
}
pcluster list-clusters
. If it says creation has failed, a common issue is your pcluster version mismatching the one that created the AMI. Make sure you installed the correct version.
Accessing your cluster¶
Once your cluster is finished launching, enter the EC2 page, and select Instances. Then select the newly created node, which should be labeled "Head Node". In the upper right select Connect and select your method of connection. Note for ssh, the username is likely to be "ubuntu", if not, then try to ssh using a conventional terminal, and it should respond with what the username is.
Alternatively you can access your cluster from your local console by doing pcluster ssh -i /path/to/key/file -n name_of_cluster
From there you should be able to launch jobs using slurm.
Runing Examples¶
There is an examples
directory with different tests and examples that you can run.
For using NVIDIA NeMo™ please see examples/nemo/ex2/text_classification/ex2.sbatch
.