Nodes required for job are DOWN, DRAINED or reserved

We are trying to run a job on the Kalousis GPU partition on baobab. We are Kalousis’ students and normally have access to that partition.

For our job ID, the NODELIST(REASON) column of squeue returns the following message:
(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
The message appears regardless of which file we try to run.
Thanks for your help.
Yoann

Hi @Yoann.Boget

Root Cause:
You are trying to run a job on the private-kalousis-gpu partition.

When you get this kind of message, you should check the availability of the partition used:

(baobab)-[toto@login2 ~]$ sinfo -p private-kalousis-gpu
PARTITION            AVAIL  TIMELIMIT  NODES  STATE NODELIST
private-kalousis-gpu    up 7-00:00:00      1  drain gpu008

You can see that the node is in the DRAIN state. This means that the node has been taken out of production for a specific reason.

Use the -R option to get more information:

(baobab)-[toto@login2 ~]$ sinfo -p private-kalousis-gpu -R
REASON               USER      TIMESTAMP           NODELIST
health_BEEGFS: TCP c root      2022-09-12T10:48:03 gpu008

You might not understand the REASON, but that’s okay. This information is for us.
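If you check this often, the state lookup above can be scripted. The following is only a minimal sketch: `unavailable_nodes` is a hypothetical helper name, not something installed on the cluster, and it assumes the default `sinfo` column layout shown above (STATE in column 5, NODELIST in column 6).

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): reads `sinfo -p <partition>` output
# on stdin and prints the NODELIST of every row whose STATE starts with
# "drain", "drng" (draining) or "down".
# Assumes the default sinfo columns:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
unavailable_nodes() {
    # NR > 1 skips the header line; $5 is STATE, $6 is NODELIST.
    awk 'NR > 1 && $5 ~ /^(drain|drng|down)/ { print $6 }'
}

# Usage on the cluster (requires sinfo in PATH):
#   sinfo -p private-kalousis-gpu | unavailable_nodes
```

If the function prints nothing, no node of the partition is reported as drained or down, and a pending job is probably waiting for another reason.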

Resolution:
The node needs an admin intervention. Just wait until the node is available again.