We are trying to run a job on the Kalousis GPU partition on Baobab. We are Kalousis’ students and should normally have access to that partition.
For our job, squeue shows the following message in the NODELIST(REASON) column:
(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
The message appears regardless of which file we try to run.
Thanks for your help.
Yoann
Hi @Yoann.Boget
Root Cause:
You are trying to run a job on the private-kalousis-gpu partition.
When you get this kind of message, you should check the availability of the partition used:
(baobab)-[toto@login2 ~]$ sinfo -p private-kalousis-gpu
PARTITION            AVAIL  TIMELIMIT  NODES  STATE NODELIST
private-kalousis-gpu    up 7-00:00:00      1  drain gpu008
You can see that the node is in the DRAIN state. This means that the node has been taken out of production for a specific reason.
Use the -R option to get more information:
(baobab)-[toto@login2 ~]$ sinfo -p private-kalousis-gpu -R
REASON               USER      TIMESTAMP           NODELIST
health_BEEGFS: TCP c root      2022-09-12T10:48:03 gpu008
You might not understand the REASON, but that is okay. This information is meant for us.
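If you want to run this check quickly next time, here is a minimal sketch of a shell filter. The function name unavailable_nodes is hypothetical (it is not part of Slurm), and the sketch assumes the default six-column sinfo layout shown above, with the state in column 5 and the node list in column 6. It reads from stdin, so you can also try it on saved output.

```shell
#!/bin/sh
# Hypothetical helper, not a Slurm command.
# Reads default `sinfo` output on stdin and prints "NODELIST STATE" for every
# row whose state starts with down, drain, drng (draining) or fail.
# Assumption: default layout PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
unavailable_nodes() {
    # The header row is skipped implicitly: "STATE" is uppercase and
    # does not match the lowercase state names below.
    awk '$5 ~ /^(down|drain|drng|fail)/ { print $6, $5 }'
}

# Typical use on a login node (requires the Slurm client tools):
#   sinfo -p private-kalousis-gpu | unavailable_nodes
```

If the filter prints nothing, no row of that partition is in one of these states, and the pending job is probably waiting for another reason (for example priority or resources).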
Resolution:
The node needs an admin intervention. Just wait until the node is available again.