Hello,
I am trying to run my program on a GPU node with an Ampere card, but Slurm keeps assigning the job to gpu005, which appears to be down.
sinfo output: shared-gpu up 12:00:00 1 down* gpu005
So I tried to exclude that node in my batch script, but it didn't work:
#!/bin/sh
#SBATCH --job-name=elem.out
#SBATCH --output=elem-opencl-1.out.o%j
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=shared-gpu
#SBATCH --gpus=ampere:1
#SBATCH --exclude=gpu005 <-- Shouldn't this exclude gpu005?
#SBATCH --time=00:20:00
#SBATCH --mail-type=END
srun /home/users/c/coudrayb/projet-de-bachelor/elementary/cmake-build-release/elementary_opencl_bench 1 900000000
squeue output: (ReqNodeNotAvail, UnavailableNodes:gpu[004-006,020])
gpu[004-006] means gpu004, gpu005 and gpu006, right?
Am I doing something wrong?
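
To be explicit about how I am reading the UnavailableNodes list, here is a minimal sketch of the expansion I have in mind. The helper names expand_hostlist and strip_zeros are my own (hypothetical, not part of Slurm), and the function only handles one prefix followed by one bracket group. On the cluster itself, scontrol show hostnames 'gpu[004-006,020]' should print the same kind of expansion.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm): expand a simple node list
# expression such as "gpu[004-006,020]" into one hostname per line.

# Remove leading zeros so the shell does not read "008" as octal.
strip_zeros() {
    n=${1#"${1%%[!0]*}"}
    printf '%s\n' "${n:-0}"
}

expand_hostlist() {
    # A plain hostname without brackets is printed unchanged.
    case $1 in
        *\[*\]) ;;
        *) printf '%s\n' "$1"; return 0 ;;
    esac

    prefix=${1%%\[*}      # text before '[', e.g. "gpu"
    body=${1#*\[}         # text after '['
    body=${body%\]}       # drop the trailing ']'

    # Split the bracket contents on commas.
    old_ifs=$IFS
    IFS=,
    set -- $body
    IFS=$old_ifs

    for item in "$@"; do
        case $item in
            *-*)
                # A range such as 004-006: keep the zero-padded width.
                first=${item%-*}
                last=${item#*-}
                width=${#first}
                i=$(strip_zeros "$first")
                end=$(strip_zeros "$last")
                while [ "$i" -le "$end" ]; do
                    printf "%s%0${width}d\n" "$prefix" "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single entry such as 020.
                printf '%s%s\n' "$prefix" "$item"
                ;;
        esac
    done
}

expand_hostlist 'gpu[004-006,020]'
```

If that reading is correct, gpu005 is only one of four unavailable nodes named in the squeue message.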
Thanks in advance,
Baptiste