ParaTools Pro for E4S™ Getting Started with Google Cloud Platform (GCP)¶
General Background Information¶
This tutorial roughly follows the same steps as the "Deploy an HPC cluster with Slurm" quickstart from the Cluster Toolkit project. This tutorial assumes the following:
- You have created a Google Cloud account.
- You have created a Google Cloud project appropriate for this tutorial, and it is selected.
- You have set up billing for your Google Cloud project.
- You have enabled the Compute Engine API.
- You have enabled the Filestore API.
- You have enabled the Cloud Storage API.
- You have enabled the Service Usage API.
- You have enabled the Cloud Resource Manager API.
- You have subscribed to ParaTools Pro for E4S™ on the GCP Marketplace.
- You are aware of the costs for running instances on GCP Compute Engine, and of the costs of using the ParaTools Pro for E4S™ GCP Marketplace VM image.
- You are comfortable using the GCP Cloud Shell or a local terminal (this tutorial shows the local workflow), are familiar with SSH, and have installed and initialized the gcloud CLI.
Tutorial¶
Getting Set Up¶
First, capture your PROJECT_ID and PROJECT_NUMBER.
Navigate to the GCP project selector and select the project for this tutorial.
Take note of the PROJECT_ID and PROJECT_NUMBER.
Open your local shell or the GCP Cloud Shell, and run the following commands:
export PROJECT_ID=<enter your project ID here>
export PROJECT_NUMBER=<enter your project number here>
Set the default project you will use for this tutorial. If you have multiple projects, you can switch back to a different one when you are finished.
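For example, with the gcloud CLI:
gcloud config set project "${PROJECT_ID}"   # make this project the default for subsequent gcloud commands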
Next, ensure that the default Compute Engine service account is enabled:
gcloud iam service-accounts enable \
--project="${PROJECT_ID}" \
${PROJECT_NUMBER}-compute@developer.gserviceaccount.com
Then grant the default Compute Engine service account the roles it needs (roles/compute.instanceAdmin.v1 for managing Compute Engine resources, and roles/iam.serviceAccountUser to act as the service account):
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
--role=roles/compute.instanceAdmin.v1
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
--role=roles/iam.serviceAccountUser
Minimum required IAM roles
Older tutorials grant roles/editor to this service account. That role is
project-wide and far broader than required. The two roles above are the
minimum recommended by the upstream
Cluster Toolkit setup guide.
Install the Cluster Toolkit¶
Pre-built binary bundle (alternative)
Since Cluster Toolkit v1.82.0, Google publishes pre-built gcluster bundles on the
Releases page. Download the bundle matching your OS and architecture
(e.g., gcluster_bundle_linux_amd64.zip, gcluster_bundle_mac_arm64.zip), unzip it,
and skip the build-from-source steps below. The bundle includes the gcluster binary
plus the examples/ and community/examples/ directories. The build-from-source
instructions below are still supported and required on Windows hosts.
First, install the dependencies of gcluster; instructions are included below.
If you encounter trouble, please check the latest instructions from Google,
available here. If you are running the GCP Cloud Shell, you do not need to install the dependencies and can skip ahead to cloning the Cluster Toolkit.
Install the Cluster Toolkit Prerequisites
Please download and install any missing software packages from the following list:
- Terraform version 1.12.2 or later (note: Terraform 1.6 and later are licensed under the BUSL)
- Packer version 1.10.0 or later
- Go version 1.23 or later. Ensure that the GOPATH is set up and go is on your PATH. You may need to add lines like those shown after this list to your .profile or .bashrc startup "dot" file.
- Git
- make (see below for instructions specific to your OS)
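A typical snippet for that dot file (a sketch only; the exact paths depend on how Go was installed on your system):
export GOPATH="${HOME}/go"            # default Go workspace location
export PATH="${PATH}:${GOPATH}/bin"   # make go-installed binaries available on your PATH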
Use your OS package manager
Most of the packages above may be installable through your OS's package manager.
For example, if you have Homebrew on macOS you should be able to brew install <package_name>
for most of these items, where <package_name> is, e.g., go.
Once all the software listed above has been verified or installed, clone the Cluster Toolkit and change directories to the cloned repository:
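For example, using the upstream repository (substitute a fork if you use one):
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
cd cluster-toolkit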
Next, build the Cluster Toolkit, verify the version to confirm that it built correctly, and then install the compiled binary on your PATH.
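One way to do this (a sketch; the install command below simply copies the binary and is not the only valid approach):
make                                           # build the gcluster binary in the repository root
./gcluster --version                           # verify that the build succeeded
sudo install -m 0755 gcluster /usr/local/bin   # copy the binary into a system-wide location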
This installs the gcluster binary into /usr/local/bin. If you do not have
root privileges or do not want to install the binary into a system-wide
location, run:
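For example:
mkdir -p "${HOME}/bin"                   # create a per-user bin directory if it does not exist
install -m 0755 gcluster "${HOME}/bin"   # copy the binary into your home directory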
This installs gcluster into ${HOME}/bin; then ensure that directory is
on your PATH:
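For example (add the same line to your shell startup file to make it permanent):
export PATH="${HOME}/bin:${PATH}"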
Grant ADC Access to Terraform¶
Generate cloud credentials associated with your Google Cloud account and grant Terraform access to the Application Default Credential (ADC).
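With the gcloud CLI, this is typically done by running:
gcloud auth application-default login   # opens a browser to authenticate and stores the ADC locally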
Cloud Shell users skip this step
If you are using the Cloud Shell you can skip this step.
OS Login is already enabled by default
The Slurm GCP v6 modules used by the example blueprint
(schedmd-slurm-gcp-v6-controller, schedmd-slurm-gcp-v6-login, and
schedmd-slurm-gcp-v6-nodeset) all enable OS Login at the
instance level by default, so you do not need to enable it at the
project level for the cluster's VMs to accept SSH from your Google
identity. Skip the gcloud compute project-info add-metadata --metadata
enable-oslogin=TRUE step you may have seen in older tutorials.
If you have a non-default scenario where you actively want to disable
OS Login on a specific role (for example, to use legacy project-wide SSH
keys on the login node), set enable_oslogin: false in that module's
settings: block in your blueprint -- do not change project-level
metadata.
Heidi conflict
Do not enable OS Login at the project level on a project that also runs ParaTools Pro Heidi (the Adaptive Computing-orchestrated marketplace product). Heidi relies on instance-metadata-injected SSH keys for cluster-internal authentication, and project-level OS Login breaks that injection. The default behavior of the v6 modules (instance-level OS Login, no project-level change) is safe to mix with Heidi in the same project.
Your user identity still needs the right IAM roles for SSH to succeed. At minimum, grant yourself:
- roles/compute.osLogin (or roles/compute.osAdminLogin if you need sudo) so the VMs accept your Google identity as a Linux user.
- roles/iap.tunnelResourceAccessor so the GCP Console SSH button (which tunnels through Identity-Aware Proxy) can reach the login node.
USER_EMAIL=$(gcloud config get-value account)
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="user:${USER_EMAIL}" --role=roles/compute.osLogin
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="user:${USER_EMAIL}" --role=roles/iap.tunnelResourceAccessor
Deploy the Cluster¶
Copy the ParaTools-Pro-slurm-cluster-blueprint-example from the
ParaTools Pro for E4S™ documentation to your clipboard, then paste it into a file
named e4s-25.11-cluster-slurm-gcp-v6.yaml. After copying the text, in your
terminal do the following:
cat > e4s-25.11-cluster-slurm-gcp-v6.yaml
# paste the copied text # (1)!
# press Ctrl-d to signal end-of-file
cat e4s-25.11-cluster-slurm-gcp-v6.yaml # check that the file copied correctly # (2)!
1. Usually Ctrl-v, or Command-v on macOS, to paste from the clipboard.
2. Optional, but recommended.
Using your favorite editor, select appropriate instance types for the compute
partitions. If you do not have access to H3 instances, remove the h3_nodeset
and h3_partition modules from the blueprint, and remove the - h3_partition
entry from the slurm_controller.use: list (otherwise gcluster create will
fail with an unresolved-reference error). See the expandable annotations and
pay extra attention to the highlighted lines on the
ParaTools-Pro-slurm-cluster-blueprint-example.
Pay Attention
In particular:
- Determine whether to pass the ${PROJECT_ID} on the command line, or set vars.project_id: directly in the blueprint.
- Verify that the image_family key (paratools-gcluster-e4s-2511-nvidia89-x86-64 for the current GCluster x86-64 image) matches the image family of the ParaTools Pro for E4S™ on GCluster GCP Marketplace listing.
- Adjust the region and zone used, if desired.
- Set an appropriate machine_type and node_count_dynamic_max for each *_nodeset (debug_nodeset, compute_nodeset, and h3_nodeset).
- The default network module provides IAP SSH only (which is what the GCP Console SSH button uses). To SSH directly from your workstation, see Allowing direct SSH from your workstation in the blueprint reference.
Once the blueprint is configured to be consistent with your GCP usage quotas and your preferences, set deployment variables and create the deployment folder.
Create deployment folder
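A sketch of the command, assuming you pass the project ID on the command line rather than setting vars.project_id: in the blueprint:
gcluster create e4s-25.11-cluster-slurm-gcp-v6.yaml \
    --vars project_id="${PROJECT_ID}"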
- If you uncommented and updated vars.project_id: in the blueprint, you do not need to pass --vars project_id=... on the command line.
- If you are bringing a cluster back online that was previously deleted, but the blueprint has been modified and the deployment folder is still present, pass the -w flag to gcluster create to overwrite the deployment folder contents with the latest changes.
gcluster create produces a deployment folder named after the blueprint's
vars.deployment_name: field -- in this example,
./ppro-e4s-25-11-cluster/. The next step references that folder.
Provisioning time
It may take a few minutes to finish provisioning your cluster.
Now the cluster can be deployed. Run the following command to deploy your ParaTools Pro for E4S™ cluster:
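For example, assuming the deployment folder name produced above:
gcluster deploy ppro-e4s-25-11-cluster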
Review the proposed changes, then press a to accept.
Connect to the Cluster¶
Once the cluster is deployed, SSH to the login node.
1. Go to the Compute Engine → VM Instances page.
2. Click SSH for the login node of the cluster. You may need to approve Google authentication before the session can connect.
SSH permission errors
If clicking SSH in the Console produces a permission error, confirm
that your user identity holds the IAM roles listed in the OS Login
admonition under Grant ADC Access to Terraform
(roles/compute.osLogin and roles/iap.tunnelResourceAccessor).
Delete the Cluster¶
When you are done using the cluster, you must use gcluster to destroy it.
If your instances were deleted in a different manner, see
Proper Cluster Deletion on GCP. To delete your cluster
correctly, run:
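For example, again assuming the deployment folder name used above:
gcluster destroy ppro-e4s-25-11-cluster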
Review the proposed changes, then press a to accept and proceed with the
deletion.