Slurm is reserving a GPU node even if I exclude it

Hello,

I am trying to run my program on a GPU node with an Ampere card, but Slurm is reserving gpu005, which is down (I guess).

sinfo output: shared-gpu up 12:00:00 1 down* gpu005

So, I tried to exclude it but it didn’t work.

#!/bin/sh

#SBATCH --job-name=elem.out
#SBATCH --output=elem-opencl-1.out.o%j
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=shared-gpu
#SBATCH --gpus=ampere:1
#SBATCH --exclude=gpu005 <-- It should exclude gpu005 ?
#SBATCH --time=00:20:00
#SBATCH --mail-type=END

srun /home/users/c/coudrayb/projet-de-bachelor/elementary/cmake-build-release/elementary_opencl_bench 1 900000000

squeue output: (ReqNodeNotAvail, UnavailableNodes:gpu[004-006,020])

gpu[004-006] means gpu004, gpu005 and gpu006, right?

Am I doing something wrong?

Thanks in advance,
Baptiste

Hi,

This message is informative only. It doesn’t mean that gpu005 will be assigned to your job.

There is no need to exclude it explicitly. As you are requesting Ampere cards, this node won’t be picked by Slurm.

You get this output because Slurm lists every node in the requested partition that is currently unavailable. That list is not filtered by the GPU type you asked for; the GPU type is only taken into account later, when Slurm actually selects a node. As I said, this is informative only and everything is correct. And yes, gpu005 is included in the range shown in the message.
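For reference, here is a minimal sketch of how the bracket notation in that message expands into individual node names. The function `expand_hostlist` is a hypothetical helper written for this illustration, not part of Slurm, and it only handles a single bracket group. On the cluster itself, `scontrol show hostnames 'gpu[004-006,020]'` should give the same expansion.

```shell
#!/bin/bash
# Expand a simple Slurm-style hostlist such as gpu[004-006,020].
# Hypothetical helper for illustration: it supports one bracket group only.
expand_hostlist() {
    local expr=$1
    # No bracket group: the expression is already a single hostname.
    if [[ $expr != *"["*"]"* ]]; then
        printf '%s\n' "$expr"
        return
    fi
    local prefix=${expr%%\[*}   # text before the first "[", e.g. "gpu"
    local body=${expr#*\[}      # text after the first "["
    body=${body%%\]*}           # drop the closing "]" and anything after it
    local parts part lo hi width i
    IFS=',' read -r -a parts <<< "$body"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}       # range start, e.g. "004"
            hi=${part#*-}       # range end, e.g. "006"
        else
            lo=$part            # single index, e.g. "020"
            hi=$part
        fi
        width=${#lo}            # keep the zero padding of the original
        # 10# forces base 10 so leading zeros are not parsed as octal.
        for (( i = 10#$lo; i <= 10#$hi; i++ )); do
            printf '%s%0*d\n' "$prefix" "$width" "$i"
        done
    done
}

expand_hostlist 'gpu[004-006,020]'
```

Run as written, this prints gpu004, gpu005, gpu006 and gpu020, one per line, which are the four nodes that the squeue reason reports as unavailable in the partition.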
