ParaTools Pro for E4S™ Getting Started with AWS ParallelCluster¶
Looking for AWS Parallel Computing Service (PCS)?
This guide covers AWS ParallelCluster (PC), the open-source self-managed orchestrator. For the managed-service alternative, see Getting Started with AWS Parallel Computing Service (PCS).
General Background Information¶
This tutorial configures AWS ParallelCluster (PC) with the matching ParaTools Pro for E4S™ on ParallelCluster AMI from the AWS Marketplace:
| Architecture | AWS Marketplace product |
|---|---|
| x86_64 | ParaTools Pro for E4S™ on ParallelCluster (x86) |
| arm64 (Graviton) | ParaTools Pro for E4S™ on ParallelCluster (arm64) |
You will use the AWS CLI and AWS ParallelCluster command line tools to create a .yaml file that describes your head node and compute nodes. ParallelCluster will then launch a head node that can spawn EC2 instances with EFA networking enabled.
This tutorial assumes that you have already created an AWS account and an Administrative User.
Tutorial¶
Install AWS ParallelCluster¶
To install ParallelCluster, upgrade pip and install virtualenv if it is not already installed. Amazon recommends installing ParallelCluster in a virtual environment. This section follows "Setting Up AWS ParallelCluster"; refer to it if you run into issues.
Then create and source the virtual environment:
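A minimal sketch, assuming an environment at `~/apc-ve` (any path works):

```shell
# Create and activate an isolated environment for ParallelCluster
python3 -m venv ~/apc-ve
source ~/apc-ve/bin/activate
```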
Install ParallelCluster. If the version of ParallelCluster does not match the version used to generate the AMI, the cluster creation operation will fail. At the time of writing, ParaTools Pro for E4S™ AMIs are built with ParallelCluster 3.10.0. Check the version string of your selected ParaTools Pro for E4S™ AMI, visible on the AWS Marketplace listing, for the associated ParallelCluster version.
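For example, pinning to a 3.10.0-based AMI (run inside the virtual environment, and adjust the version to match your AMI):

```shell
# Pin ParallelCluster to the version the AMI was built with
python3 -m pip install "aws-parallelcluster==3.10.0"
```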
ParallelCluster uses Node.js (via the AWS Cloud Development Kit) to generate CloudFormation templates. Install it with nvm:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
chmod ug+x ~/.nvm/nvm.sh
source ~/.nvm/nvm.sh
nvm install --lts
node --version
Install AWS Command Line Interface¶
Install the AWS CLI, which handles authentication every time you create a cluster. This section follows "Installing AWS CLI"; refer to it if you run into issues.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
If you do not have sudo privileges, specify install and binary locations with the -i and -b flags:
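A sketch of a sudo-less install, assuming `~/.local` as the prefix (any writable paths work):

```shell
# Install the AWS CLI into user-writable locations (paths are illustrative)
./aws/install -i ~/.local/aws-cli -b ~/.local/bin
# Make sure the chosen bin directory is on PATH
export PATH="$HOME/.local/bin:$PATH"
```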
AWS Security Credentials and CLI Configuration¶
This section follows Creating Access Keys and Configuring AWS CLI; refer to them if you run into issues.
If you do not already have an access key, create one. From the IAM page, select Users on the left, choose the user to grant access credentials to, open the Security credentials tab, and scroll down to Create access key. Create a key for CLI activities, and store it securely.
Configure the AWS CLI with those credentials:
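Run the interactive configuration:

```shell
# Prompts for the access key, secret key, default region, and output format
aws configure
```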
Enter the requested information:
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [us-east-1]: us-west-2
Default output format [None]: json
AWS EC2 Key Pair¶
Cluster tasks such as running jobs, monitoring jobs, and managing users require access to the cluster head node. Access over SSH requires an EC2 key pair. If none exists in the target region, follow this guide to create one.
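Alternatively, a key pair can be created from the CLI with `aws ec2 create-key-pair`; the key name and file path below are assumptions:

```shell
# Create a key pair in the target region and store the private key locally
mkdir -p ~/.ssh
aws ec2 create-key-pair --key-name my-cluster-key \
    --query 'KeyMaterial' --output text > ~/.ssh/my-cluster-key.pem
chmod 600 ~/.ssh/my-cluster-key.pem
```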
AWS user policies¶
To create and manage clusters in an AWS account, AWS ParallelCluster requires permissions at two levels:
- Permissions that the `pcluster` user requires to invoke the `pcluster` CLI commands for creating and managing clusters.
- Permissions that the cluster resources require to perform cluster actions.
The policies described here are supersets of the required permissions; trim them down as needed. To create the policies, open the IAM page, select Policies on the left, click Create Policy, and select the JSON editor. Copy and paste the policy found here. Unless AWS Secrets Manager is in use, remove the final section from the JSON:
{
"Action": "secretsmanager:DescribeSecret",
"Resource": "arn:aws:secretsmanager:<REGION>:<AWS ACCOUNT ID>:secret:<SECRET NAME>",
"Effect": "Allow"
}
Replace <REGION> with the region in which the cluster will run, <AWS ACCOUNT ID> with your 12-digit account ID, and <SECRET NAME> with the name of the secret.
Create the policy and name it ClusterPolicy1. Create another policy using this JSON, naming it ClusterPolicy2 and replacing the account ID placeholder as before. From the policies menu, open ClusterPolicy1, click Entities attached, and attach the users who will create clusters. Repeat for ClusterPolicy2. In the policies list, find AmazonVPCFullAccess and attach it to the same users. This allows them to create VPCs when needed.
Find the AMI¶
Prepare the AMI (Amazon Machine Image) for the next step. Open the ParaTools Pro for E4S™ marketplace listing for the image you want, click Subscribe, click Continue to Configuration, select the correct region, and copy the AMI ID that is displayed.
Cluster configuration and creation¶
Cluster creation prompts for the following:
- Region: the region in which to launch the cluster.
- EC2 key pair: the key pair you created earlier, or one you intend to use to access the nodes.
- Scheduler: select slurm.
- OS: Ubuntu 22.04.
- Head node instance type: the head node only manages the compute fleet, so it does not require much compute capacity. A `t3.large` is usually sufficient. The head node does not need to be EFA-capable.
- Queue structure: select as required by your use case.
- Compute instance types: select an EFA-capable instance. To list EFA-capable instance types:
aws ec2 describe-instance-types --filters "Name=processor-info.supported-architecture,Values=x86_64*" "Name=network-info.efa-supported,Values=true" --query InstanceTypes[].InstanceType
To list EFA-capable instances that also have GPU support:
aws ec2 describe-instance-types --filters "Name=processor-info.supported-architecture,Values=x86_64" "Name=network-info.efa-supported,Values=true" --query 'InstanceTypes[?GpuInfo.Gpus!=null].InstanceType'
- Network settings: select as required for your workflow, or accept the defaults.
- Automatic VPC: unless you already have a VPC to reuse, select yes. Note that AWS imposes per-account VPC limits, so unused VPCs should be deleted periodically.
Create the cluster-config.yaml file:
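The file is generated by the interactive wizard:

```shell
# Walks through the prompts above and writes cluster-config.yaml
pcluster configure --config cluster-config.yaml
```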
If the command reports an authorization failure, one of the policies was likely misconfigured. Verify that all three policies were created correctly.
Final Cluster Configurations¶
Open cluster-config.yaml and add CustomAmi: <ParaTools-Pro-ami-id> under the Image section, replacing <ParaTools-Pro-ami-id> with the AMI ID obtained in the prior section:
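A sketch of the resulting section, with `<ParaTools-Pro-ami-id>` left as a placeholder:

```yaml
Image:
  Os: ubuntu2204
  CustomAmi: <ParaTools-Pro-ami-id>  # AMI ID copied from the Marketplace listing
```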
To enable RDP/DCV access to the head node, add the following Dcv block:
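A minimal sketch of the `Dcv` settings under `HeadNode` (the commented keys are optional, with placeholder values):

```yaml
HeadNode:
  Dcv:
    Enabled: true
    # Optional hardening:
    # Port: 8443
    # AllowedIps: 203.0.113.0/24
```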
Spinning up the cluster head node¶
Once configuration is complete, launch the cluster:
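For example (the cluster name is an assumption):

```shell
# Create the cluster described by cluster-config.yaml
pcluster create-cluster --cluster-name my-cluster \
    --cluster-configuration cluster-config.yaml
```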
The command returns JSON similar to:
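A trimmed illustration of the response shape (values are placeholders):

```json
{
  "cluster": {
    "clusterName": "my-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "clusterStatus": "CREATE_IN_PROGRESS",
    "region": "us-west-2",
    "version": "3.10.0"
  }
}
```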
Cluster creation takes a few minutes. Monitor progress with pcluster list-clusters. If creation fails, a common cause is a ParallelCluster version mismatch between the CLI and the AMI. Verify that the installed version matches the AMI.
Accessing your cluster¶
Once the cluster finishes launching, open the EC2 page and select Instances. Select the newly created instance labeled Head Node. Click Connect in the upper right and choose your connection method. For SSH, the default username is typically ubuntu; if it is not, connect with a standard SSH client and the server will report the expected username.
Alternatively, connect from your local terminal with:
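For example, using the `pcluster ssh` helper (the cluster name and key path are assumptions):

```shell
# Resolves the head node address and opens an SSH session
pcluster ssh --cluster-name my-cluster -i ~/.ssh/my-cluster-key.pem
```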
From the head node, you can submit jobs using Slurm.
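As a quick smoke test, a minimal batch script can be submitted like this (the script contents are illustrative):

```shell
# Write a minimal Slurm batch script
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
srun hostname
EOF

sbatch hello.sbatch   # submit the job
squeue -u "$USER"     # check its status
```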
Running Examples¶
The head node contains an examples directory with tests and example workloads. For NVIDIA NeMo™, see examples/nemo/ex2/text_classification/ex2.sbatch.
NVIDIA NeMo™ and BioNeMo™ live in a dedicated Python environment
NeMo and BioNeMo are installed in a separate virtual environment to avoid dependency conflicts with other GPU/ML packages. Activate it before running NeMo or BioNeMo workloads (or source it from your sbatch script):
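A sketch of the activation step; the environment path here is hypothetical, so check your AMI for the actual location:

```shell
# Hypothetical path; locate the actual NeMo environment on your AMI
source /path/to/nemo-venv/bin/activate
```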
Other Python packages (including vLLM) are available in the default system Python and require no activation.