Important: new GPU types naming in Baobab

Dear users,

TL;DR: we have changed the GPU type names. Please update your scripts to use the new GRES values in the table.

Longer explanation:

We have enabled a new GPU-related feature on the Baobab, Yggdrasil and Bamboo clusters.

  • Previously, each GPU card on the GPU servers was defined statically. This was basic: Slurm knew neither the affinity between CPU sockets and GPU cards nor the affinities between GPUs. We also had only a limited set of “type” names to target GPUs.

  • Now we’ve enabled NVML GPU auto-detection for Slurm. Slurm is now aware of the GPU topology and can make better resource allocations.
    The GPU type is also automatically inferred and we have a unique type name for each GPU model. You can look up the new GRES name for the requested GPU model in the table. As before, it’s not possible to request more than one GPU type. However, you can use a constraint to filter the GPUs you want to request.
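
As a rough illustration of the new naming, here is a minimal shell sketch that builds a `--gres` argument from a GPU type name. The helper name `gres_arg` is hypothetical, and the type name used in the example is one that appears in the `sinfo` output quoted later in this thread; check the table for the names valid on your cluster.

```shell
# Hypothetical helper: build a --gres argument from a GPU type name and a count.
# The type name must be one of the auto-detected GRES names listed for the cluster.
gres_arg() {
    local model="$1"
    local count="${2:-1}"   # default to a single GPU
    printf '%s\n' "--gres=gpu:${model}:${count}"
}

# Example (not executed here, needs a Slurm cluster):
#   srun --partition=shared-gpu "$(gres_arg nvidia_geforce_rtx_3090 1)" hostname
```

Keeping the type name in one place like this makes it easier to update job scripts if the names change again.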

This change should improve job performance, as GPUs are now paired with the nearest CPU. It was made because a user asked us how to target two GPUs that are linked together (see “[GPU][SLURM] How to request a pair of GPUs connected with an NVLINK?”).

Best regards


Hi, thanks for the update!

Before the new naming convention, we could request only Ampere GPUs like this:

salloc ... --gres=gpu:ampere:1,VramPerGpu:2G

What is the new convention if I want to restrict the architecture?

Cheers,
Malte

Dear @Malte.Algren

You can request a constraint instead of a GPU type:

(bamboo)-[sagon@login1 ~]$ srun --constraint=COMPUTE_TYPE_AMPERE --partition=debug-gpu --gres=gpu:1,VramPerGpu:2G hostname
gpu001.bamboo
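
If one feature is not selective enough, Slurm lets you combine several features in a single constraint with `&` (logical AND). The sketch below is only an illustration: the helper name `join_constraints` is hypothetical, and the feature names are taken from the node listing in this thread.

```shell
# Hypothetical helper: join feature names with '&' so that Slurm requires all of them.
join_constraints() {
    local IFS='&'          # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}

# Example (not executed here, needs a Slurm cluster). Quote the result,
# because '&' is special to the shell:
#   srun --constraint="$(join_constraints COMPUTE_TYPE_AMPERE DOUBLE_PRECISION_GPU)" \
#        --partition=shared-gpu --gres=gpu:1 hostname
```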

You can see which features are available on the nodes:

(bamboo)-[sagon@login1 ~]$ sinfo -o "Node: %n | Gres: %G | Feature: %f" -p shared-gpu
Node: HOSTNAMES | Gres: GRES | Feature: AVAIL_FEATURES
Node: gpu001 | Gres: gpu:nvidia_geforce_rtx_3090:8(S:0-1),VramPerGpu:no_consume:24G | Feature: EPYC-7742,V8,COMPUTE_CAPABILITY_8_6,COMPUTE_TYPE_AMPERE,SIMPLE_PRECISION_GPU,COMPUTE_MODEL_RTX_3090_25G
Node: gpu002 | Gres: gpu:nvidia_geforce_rtx_3090:8(S:0-1),VramPerGpu:no_consume:24G | Feature: EPYC-7742,V8,COMPUTE_CAPABILITY_8_6,COMPUTE_TYPE_AMPERE,SIMPLE_PRECISION_GPU,COMPUTE_MODEL_RTX_3090_25G
Node: gpu003 | Gres: gpu:nvidia_a100_80gb_pcie:4(S:0),VramPerGpu:no_consume:80G | Feature: EPYC-7302P,V8,DOUBLE_PRECISION_GPU,COMPUTE_CAPABILITY_8_0,COMPUTE_TYPE_AMPERE,COMPUTE_MODEL_A100_80G
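
To find which nodes offer a given feature without reading the whole listing, the output can be filtered. This is a minimal sketch, assuming a pipe-separated `sinfo` format (`%n|%G|%f`, chosen here for easy parsing) and a hypothetical helper name `nodes_with_feature`.

```shell
# Hypothetical helper: print the names of nodes whose feature list contains
# exactly the given feature. Reads "node|gres|features" lines on stdin.
nodes_with_feature() {
    awk -F'|' -v want="$1" '
        {
            n = split($3, feats, ",")        # features are comma-separated
            for (i = 1; i <= n; i++)
                if (feats[i] == want) { print $1; break }
        }'
}

# Example (not executed here, needs a Slurm cluster):
#   sinfo -h -N -o "%n|%G|%f" -p shared-gpu | nodes_with_feature DOUBLE_PRECISION_GPU
```

Matching whole feature names, rather than using a substring search, avoids false hits such as one feature name being a prefix of another.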

Best
