ParaTools Pro for E4S™ Getting Started with Google Cloud Platform (GCP)

General Background Information

This tutorial roughly follows the same steps as the "Deploy an HPC cluster with Slurm" quickstart from the Cluster Toolkit project. This tutorial assumes the following:

Tutorial

Getting Set Up

First, capture your PROJECT_ID and PROJECT_NUMBER. Navigate to the GCP project selector and select the project for this tutorial. Take note of the PROJECT_ID and PROJECT_NUMBER. Open your local shell or the GCP Cloud Shell, and run the following commands:

export PROJECT_ID=<enter your project ID here>
export PROJECT_NUMBER=<enter your project number here>
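
If you prefer to look the project number up from the command line rather than the console, gcloud can query it from the project ID (an optional alternative to the manual export above):

export PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format='value(projectNumber)')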

Set the project you will use for this tutorial as the default. If you have multiple projects, you can switch back to a different one when you are finished.

gcloud config set project "${PROJECT_ID}"
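
To confirm the change took effect (optional), print the currently active project:

gcloud config get-value project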

Next, ensure that the default Compute Engine service account is enabled:

gcloud iam service-accounts enable \
     --project="${PROJECT_ID}" \
     ${PROJECT_NUMBER}-compute@developer.gserviceaccount.com

Then grant the service account the IAM roles required by Cluster Toolkit (roles/compute.instanceAdmin.v1 for managing Compute Engine resources, and roles/iam.serviceAccountUser to act as the service account):

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
    --role=roles/compute.instanceAdmin.v1
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
    --role=roles/iam.serviceAccountUser
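
To double-check that both bindings took effect (optional), list the roles currently granted to the service account:

gcloud projects get-iam-policy "${PROJECT_ID}" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
    --format="value(bindings.role)"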

Minimum required IAM roles

Older tutorials grant roles/editor to this service account. That role is project-wide and far broader than required. The two roles above are the minimum recommended by the upstream Cluster Toolkit setup guide.

Install the Cluster Toolkit

Pre-built binary bundle (alternative)

Since Cluster Toolkit v1.82.0, Google publishes pre-built gcluster bundles on the Releases page. Download the bundle matching your OS and architecture (e.g., gcluster_bundle_linux_amd64.zip, gcluster_bundle_mac_arm64.zip), unzip it, and skip the build-from-source steps below. The bundle includes the gcluster binary plus the examples/ and community/examples/ directories. The build-from-source instructions below are still supported and required on Windows hosts.
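
For example, on a Linux x86-64 host the download might look like the following sketch; the asset name follows the pattern above, so verify the exact file name on the Releases page before downloading:

curl -LO https://github.com/GoogleCloudPlatform/cluster-toolkit/releases/latest/download/gcluster_bundle_linux_amd64.zip
unzip gcluster_bundle_linux_amd64.zip
./gcluster --version   # adjust the path if the bundle unpacks into a subdirectory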

First, install the dependencies of gcluster. Instructions to do this are included below. If you encounter trouble, please check the latest instructions in Google's Cluster Toolkit documentation. If you are running the GCP Cloud Shell, you do not need to install the dependencies and can skip ahead to cloning the Cluster Toolkit.

Install the Cluster Toolkit Prerequisites

Please download and install any missing software packages from the following list:

  • Terraform version 1.12.2 or later (note: Terraform 1.6 and later are licensed under the BUSL)
  • Packer version 1.10.0 or later
  • Go version 1.23 or later. Ensure that GOPATH is set up and go is on your PATH. You may need to add the following to your .profile or .bashrc startup ("dot") file:
    export PATH=$PATH:$(go env GOPATH)/bin
    
  • Git
  • make (see below for instructions specific to your OS)

On macOS, make is packaged with the Xcode command line developer tools. To install them, run:

xcode-select --install

On Debian-based distributions (e.g., Ubuntu), install make with the OS package manager:

apt-get -y install make

On RHEL-based distributions (e.g., CentOS, Rocky Linux), install make with the OS package manager:

yum install -y make

Use your OS package manager

Most of the packages above can be installed through your OS's package manager. For example, if you have Homebrew on macOS, you should be able to run brew install <package_name> for most of these items, where <package_name> is, e.g., go.
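
For instance, with Homebrew the installation might look like the following sketch; the formula names are assumptions (check brew search <name> if one is not found), and the HashiCorp tools come from HashiCorp's own tap, which carries their current releases:

brew install go git make
brew tap hashicorp/tap
brew install hashicorp/tap/terraform hashicorp/tap/packer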

Once all the software listed above has been verified or installed, clone the Cluster Toolkit and change directories to the cloned repository:

git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
cd cluster-toolkit/

Next, build the Cluster Toolkit, then verify the version and confirm that it built correctly:

make
./gcluster --version

To install the compiled binary on your $PATH, run:

sudo make install

This installs the gcluster binary into /usr/local/bin. If you do not have root privileges or do not want to install the binary into a system-wide location, run:

make install-user

This installs gcluster into ${HOME}/bin; then ensure that directory is on your PATH:

export PATH="${PATH}:${HOME}/bin"

Grant ADC Access to Terraform

Generate cloud credentials associated with your Google Cloud account and grant Terraform access to the Application Default Credential (ADC).

Cloud Shell users skip this step

If you are using the Cloud Shell you can skip this step.

gcloud auth application-default login

OS Login is already enabled by default

The Slurm GCP v6 modules used by the example blueprint (schedmd-slurm-gcp-v6-controller, schedmd-slurm-gcp-v6-login, and schedmd-slurm-gcp-v6-nodeset) all enable OS Login at the instance level by default, so you do not need to enable it at the project level for the cluster's VMs to accept SSH from your Google identity. Skip the gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE step you may have seen in older tutorials.

If you have a non-default scenario where you actively want to disable OS Login on a specific role (for example, to use legacy project-wide SSH keys on the login node), set enable_oslogin: false in that module's settings: block in your blueprint -- do not change project-level metadata.

Heidi conflict

Do not enable OS Login at the project level on a project that also runs ParaTools Pro Heidi (the Adaptive Computing-orchestrated marketplace product). Heidi relies on instance-metadata-injected SSH keys for cluster-internal authentication, and project-level OS Login breaks that injection. The default behavior of the v6 modules (instance-level OS Login, no project-level change) is safe to mix with Heidi in the same project.

Your user identity still needs the right IAM roles for SSH to succeed. At minimum, grant yourself:

  • roles/compute.osLogin (or roles/compute.osAdminLogin if you need sudo) so the VMs accept your Google identity as a Linux user.
  • roles/iap.tunnelResourceAccessor so the GCP Console SSH button (which tunnels through Identity-Aware Proxy) can reach the login node.

USER_EMAIL=$(gcloud config get-value account)
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="user:${USER_EMAIL}" --role=roles/compute.osLogin
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="user:${USER_EMAIL}" --role=roles/iap.tunnelResourceAccessor

Deploy the Cluster

Copy the ParaTools-Pro-slurm-cluster-blueprint-example from the ParaTools Pro for E4S™ documentation to your clipboard and save it into a file named e4s-25.11-cluster-slurm-gcp-v6.yaml. After copying the text, do the following in your terminal:

cat > e4s-25.11-cluster-slurm-gcp-v6.yaml
# paste the copied text # (1)!
# press Ctrl-d to signal end-of-file
cat e4s-25.11-cluster-slurm-gcp-v6.yaml # check that the file copied correctly # (2)!
  1. Usually Ctrl-v, or Command-v on macOS, to paste from the clipboard.
  2. Optional, but recommended.
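
If you prefer not to paste into cat, a shell one-liner can write the clipboard contents straight to the file; this assumes pbpaste (macOS) or xclip (Linux, if installed) is available:

pbpaste > e4s-25.11-cluster-slurm-gcp-v6.yaml                        # macOS
xclip -selection clipboard -o > e4s-25.11-cluster-slurm-gcp-v6.yaml  # Linux with xclip installed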

Using your favorite editor, select appropriate instance types for the compute partitions. If you do not have access to H3 instances, remove the h3_nodeset and h3_partition modules from the blueprint, and remove the - h3_partition entry from the slurm_controller.use: list (otherwise gcluster create will fail with an unresolved-reference error). See the expandable annotations, and pay extra attention to the highlighted lines, in the ParaTools-Pro-slurm-cluster-blueprint-example.
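
If you are unsure whether H3 (or another machine type) is offered in the zone you plan to use, an optional gcloud query can list what is available; substitute your own zone, and note that a machine type appearing here does not guarantee you have quota for it:

gcloud compute machine-types list --zones=us-central1-a --filter="name~'^h3-'"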

Pay Attention

In particular:

  • Determine whether to pass the ${PROJECT_ID} on the command line, or set vars.project_id: directly in the blueprint.
  • Verify that the image_family key (paratools-gcluster-e4s-2511-nvidia89-x86-64 for the current GCluster x86-64 image) matches the image family of the ParaTools Pro for E4S™ on GCluster GCP Marketplace listing.
  • Adjust the region and zone used, if desired.
  • Set an appropriate machine_type and node_count_dynamic_max for each *_nodeset (debug_nodeset, compute_nodeset, and h3_nodeset).
  • The default network module provides IAP SSH only (which is what the GCP Console SSH button uses). To SSH directly from your workstation, see Allowing direct SSH from your workstation in the blueprint reference.

Once the blueprint is configured to be consistent with your GCP usage quotas and your preferences, set deployment variables and create the deployment folder.

Create deployment folder

./gcluster create e4s-25.11-cluster-slurm-gcp-v6.yaml \
  --vars project_id=${PROJECT_ID} # (1)!
  1. If you uncommented and updated vars.project_id: in the blueprint, you do not need to pass --vars project_id=... on the command line. If you are bringing a cluster back online that was previously deleted, but the blueprint has been modified and the deployment folder is still present, pass the -w flag to gcluster create to overwrite the deployment folder contents with the latest changes.

gcluster create produces a deployment folder named after the blueprint's vars.deployment_name: field -- in this example, ./ppro-e4s-25-11-cluster/. The next step references that folder.

Provisioning time

It may take a few minutes to finish provisioning your cluster.

Now the cluster can be deployed. Run the following command to deploy your ParaTools Pro for E4S™ cluster:

Perform the deployment

./gcluster deploy ppro-e4s-25-11-cluster

Review the proposed changes, then press a to accept.

Connect to the Cluster

Once the cluster is deployed, SSH to the login node.

  1. Go to the Compute Engine > VM Instances page.

    GCP VM Instances

  2. Click SSH for the login node of the cluster. You may need to approve Google authentication before the session can connect.
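
If you prefer a terminal over the Console button, gcloud can open an IAP-tunneled SSH session from your workstation; the instance name and zone below are placeholders, so copy the actual values from the VM Instances page:

gcloud compute ssh <login-node-name> --zone=<zone> --tunnel-through-iap --project="${PROJECT_ID}"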

SSH permission errors

If clicking SSH in the Console produces a permission error, confirm that your user identity holds the IAM roles listed in the OS Login admonition under Grant ADC Access to Terraform (roles/compute.osLogin and roles/iap.tunnelResourceAccessor).

Delete the Cluster

When you are done using the cluster, you must use gcluster to destroy it. If your instances were deleted in a different manner, see Proper Cluster Deletion on GCP. To delete your cluster correctly, run:

./gcluster destroy ppro-e4s-25-11-cluster

Review the proposed changes, then press a to accept and proceed with the deletion.
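
After the destroy completes, an optional sanity check is to list the remaining Compute Engine instances in the project and confirm that the cluster's VMs are gone:

gcloud compute instances list --project="${PROJECT_ID}"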