Srun fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive

Hi,

I’m trying to submit GPU jobs to the dpnc-gpu-EL7 partition and I’m running into an error in the log files that I’ve never seen before:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

The batch options of my script are:
#!/bin/env bash
#SBATCH --time=11:00:00
#SBATCH --partition=dpnc-gpu-EL7
#SBATCH --gres=gpu:1
#SBATCH --constraint="V3|V4|V5|V6"
#SBATCH --mem=15G
#SBATCH --output=logs/train-%j.out
#SBATCH --job-name='DNN_train'
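
Since the error message refers to the SLURM_MEM_* environment variables, one quick diagnostic (just a sketch, I haven’t confirmed this is related) would be to print, at the top of the batch script, which of those variables an inner srun call inherits. With only --mem=15G set above, normally only SLURM_MEM_PER_NODE should appear:

# Diagnostic sketch: list the memory-related SLURM variables visible inside the job
env | grep -E '^SLURM_MEM_PER_(CPU|GPU|NODE)=' || echo "no SLURM_MEM_PER_* variables set"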

What’s interesting is that a couple of days ago I ran a completely identical script, with the exact same options, on the shared-gpu-EL7 partition, and those jobs didn’t crash.

Is there something off with the dpnc-gpu-EL7 partition?

Relevant paths:
“Faulty” submission script: /home/drozd/analysis/runs/run_07Feb20_addSTK/runTraining_faulty.sh
“Faulty” logs: /home/drozd/analysis/runs/run_07Feb20_addSTK/logs/train-29797626.out
/home/drozd/analysis/runs/run_07Feb20_addSTK/logs/train-29797630.out
“Good” submission script: /home/drozd/analysis/runs/run_06Jan20_multiVars/runTraining.sh
“Good” log: /home/drozd/analysis/runs/run_06Jan20_multiVars/logs/train-29798226.out

The workaround looks to be as simple as using the shared-gpu queue instead of the DPNC one, but I’m curious about this issue…
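
In the meantime the partition can be overridden on the command line without editing the script (command-line options take precedence over the #SBATCH directives), for example:

sbatch --partition=shared-gpu-EL7 runTraining_faulty.sh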

Cheers

Hi there,

Not that we are aware of, especially since the private partitions are simply a subset of the common ones, so the node configuration does not change.

Even when logged in with your account, I was not able to reproduce the error on the dpnc-gpu-EL7 partition with the following command…

srun -p dpnc-gpu-EL7 --nodelist=gpu002 --time=11:00:00 --gres=gpu:1 --constraint="V3|V4|V5|V6" --mem=15G -n 1 -c 1 --pty $SHELL

…nor with a modified version of your sbatch script above (JobId 30059441). Is the error still present?

Two weeks ago we had another report of a similar error, but again I was not able to reproduce it, and that user has not provided feedback yet.
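
If it shows up again, one thing that could help narrow it down (just a suggestion, not a confirmed fix) is to check whether the shell you submit from already exports memory-related SLURM variables, and/or to submit without propagating the submitting shell's environment:

# Check for leftover memory-related SLURM variables in the submitting shell
env | grep SLURM_MEM_PER_
# Submit without exporting the submitting shell's environment to the job
sbatch --export=NONE runTraining_faulty.sh

Note that with --export=NONE the job will not see variables exported from your submitting shell, so this may need adjusting if the script relies on them.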

Thx, bye,
Luca

Hi Luca,

I can’t reproduce it either, using the same script as before. Strange.

Hi there,

OK, we will consider the issue temporary (and resolved) for the moment; feel free to come back if it happens again.

Thx, bye,
Luca