How to ask for a specific GPU?

Hello,
Now that there are RTX 3080 and 3090 cards, I can get one of them when I ask for an Ampere card. This is really annoying: is it possible to ask only for A100 GPUs?
There is probably a solution, but I don’t know SLURM well.
I could try --nodelist=gpu[020,022,027,028], but I don’t want to affect the cluster negatively, so I prefer to ask first.
Thank you,
Ludovic

Hi @Ludovic.Dumoulin, one change we can make is to add a feature, for example COMPUTE_MODEL_A100_40G. I’m including the memory size so that the exact model can be specified, as the new A100 has 80 GB.

So you should be able to request an A100 like that:

sbatch --constraint=COMPUTE_MODEL_A100_40G --gpus=1 xxx

Does that sound good to you?

It would be perfect!
Thank you so much if you can do that!

I don’t have a memory constraint. My constraint is double-precision (FP64) performance in TFLOPS: the RTX 30xx cards are made for machine learning (and gaming), not for hydrodynamics. As a result, my system takes approximately 5 h to simulate on an A100 and more than 16 h on an RTX. I don’t know the exact RTX time because I only request 8 h, which is plenty for an A100 but not enough to complete even half of my simulation on an RTX.

Maybe a constraint on double vs. single precision would do, for example:
sbatch --constraint=DOUBLE_PRECISION --gres=gpu:ampere:1
With this constraint, it would also be possible to get a P100 (or V100), which is about 10 (or 15) times more powerful than an RTX 3090 for FP64, as long as you don’t need FP64 tensor cores.
(sbatch --constraint=DOUBLE_PRECISION --gpus=1)

I think a way of separating ML cards from Hydro cards would be nice.
(By Hydro I mean any computation or data analysis that needs FP64.)

Hi,

we added the constraint DOUBLE_PRECISION_GPU for the following GPUs:

| Model | Architecture | GPU RAM | Compute capability | Slurm resource | Nb GPUs | Nodes |
|---|---|---|---|---|---|---|
| P100 | Pascal | 12GB | 6.0 | pascal | 6 | gpu[004].baobab |
| P100 | Pascal | 12GB | 6.0 | pascal | 5 | gpu[005].baobab |
| P100 | Pascal | 12GB | 6.0 | pascal | 8 | gpu[006].baobab |
| P100 | Pascal | 12GB | 6.0 | pascal | 4 | gpu[007].baobab |
| A100 | Ampere | 40GB | 8.0 | ampere | 2 | gpu[020,027].baobab |
| A100 | Ampere | 40GB | 8.0 | ampere | 6 | gpu[022].baobab |
| A100 | Ampere | 40GB | 8.0 | ampere | 1 | gpu[028].baobab |
| V100 | Volta | 32GB | 7.0 | volta | 1 | gpu[008].yggdrasil |
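
A minimal batch-script sketch, assuming the new feature can be combined with the GPU type from the "Slurm resource" column: among the DOUBLE_PRECISION_GPU nodes only the A100s are Ampere, so this should exclude both the P100/V100 and the RTX 30xx cards. The partition name shared-gpu is borrowed from later in this thread; the job name and the ./my_simulation executable are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=fp64-sim                # hypothetical job name
#SBATCH --partition=shared-gpu             # GPU partition mentioned later in this thread; adjust to yours
#SBATCH --time=08:00:00                    # 8 h, as in the original question
#SBATCH --constraint=DOUBLE_PRECISION_GPU  # only nodes with FP64-capable GPUs (P100, A100, V100)
#SBATCH --gpus=ampere:1                    # among those, the Ampere type leaves only the A100s

# Log which GPU model and how much GPU memory the job actually received
nvidia-smi --query-gpu=name,memory.total --format=csv

# Hypothetical executable: replace with your own simulation
srun ./my_simulation
```

Dropping the --gpus type (plain --gpus=1) would accept any of the GPUs in the table above.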

Please check if it fits your needs.

Best

Hi,

I have a similar issue: I would like to use only RTX 3090 cards and not the 3080 (due to memory constraints). There appears to be no way to request a specific amount of GPU memory in SLURM, so I was wondering whether there is a way to request only RTX 3090s.

Best,

Hugues

It works well, thank you !

Hi, right now the only way seems to be to list explicitly the nodes to be used.

What we can do is add a constraint with the GPU model name, such as:

RTX_3090_25G and RTX_3090
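
Until such a feature exists, one way to find out which GPU type and which features each node advertises (and therefore which node to list) is to query Slurm directly. The sinfo format fields used below are standard (%N node name, %G generic resources, %f available features); the partition shared-gpu and the node gpu020 (taken from the table above) are only examples. Note that, according to the sbatch documentation, --nodelist requests all of the listed nodes, so for a single-GPU job it is safer to list just one node; job.sh is a hypothetical script name.

```shell
# One line per node: name, GPU GRES (type:count) and features usable with --constraint
sinfo -N -p shared-gpu -o "%N %G %f"

# Full description of one node, including its Gres= and AvailableFeatures= fields
scontrol show node gpu020

# Pin a single-GPU job to one chosen node (replace with an RTX 3090 node from the list above)
node=gpu020
sbatch --partition=shared-gpu --nodelist="$node" --gpus=1 job.sh
```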

Hi, has anything changed in the way we can specify the memory of a GPU?
I need more than 12 GB, and in my SLURM script I have

#SBATCH -p private-cui-gpu
#SBATCH --gpus=titan:1

so I guess I’m short on memory. If I try to request an Ampere GPU, I get this message:

sbatch: error: Batch job submission failed: Requested node configuration is not available

and if I try the constraint for the A100 40G, I get the following:

sbatch: error: Batch job submission failed: Invalid feature specification

I think that the RTX 3090 would be fine for my case, as I don’t need double precision. How should I write the request to use it?

Thanks!

Hi, as suggested by @Yann.Sagon at the HPC Lunch today, I was able to get more GPU memory with the following directives in my batch file:

#--- CPU
#SBATCH --mem=20G

#--- GPU
#SBATCH -p shared-gpu
#SBATCH --gpus=1
#SBATCH --gres=gpu:1,VramPerGpu:20G

Indeed, Yann: without also requesting CPU memory (--mem), I get this OOM error message:

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=62400628.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: gpu009: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=62400628.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Thanks for your help, Yann! :slight_smile:

Hi, glad it helped! I would suggest removing VramPerGpu or lowering it to 10G. Maybe the issue was only the lack of CPU RAM and not the GPU model?
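
One way to check this hypothesis is to compare the CPU memory the failed job requested with the peak it actually reached, using Slurm accounting (the job ID comes from the OOM message above). JobID, State, ReqMem, MaxRSS and Elapsed are standard sacct fields; MaxRSS is recorded per step, so look at the .batch and .0 lines. Inside the job, nvidia-smi can log the allocated GPU model and its memory usage; the output file name gpu_mem.csv is just an example.

```shell
# After the job: requested vs. peak CPU memory, per step (job ID from the OOM message)
sacct -j 62400628 --format=JobID,State,ReqMem,MaxRSS,Elapsed

# Inside the batch script, before the main program: log the GPU model and its
# memory usage every 60 s in the background (stops when the job ends)
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 60 > gpu_mem.csv &
```

If memory.used stays well below 10 GB, the VramPerGpu request can probably be lowered or dropped.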

Sure, I will try it! I’m also trying cropped 3D input images, but I’m not sure yet how the model performs in this case. If possible, I’ll use this reduced input instead and won’t need to request extra memory at all! :smiley: