Drained nodes in mono-shared-EL7?

Dear all,

For a while now I haven’t managed to get a job to run on the mono-shared-EL7 partition; squeue always says:

[cucci@login2 ~]$ squeue | grep cucci
  32123701_[1-100] mono-shar navigati    cucci PD       0:00      1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)

Will my job eventually start, so that it is just a matter of waiting, or is something wrong? I don’t understand exactly what DRAINED means for a node (some nodes in mono-shared-EL7 are indeed drained when I look at sinfo).

Thanks, everybody

Dear Davide,

Nothing is wrong. Your job will start as soon as resources become free for it. A “drained” node does not accept new jobs. It may have been drained on purpose (for example, to reboot it or to test something), or Slurm may have drained it automatically after detecting an error on it. Once the issue is fixed, we resume the node and it goes back to the idle state.
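To see for yourself which nodes of the partition are drained, and the reason recorded for each, the standard `sinfo` options are enough. Below is a minimal sketch, assuming a bash/POSIX shell on the login node; the function name `show_drained` is hypothetical and not a site-provided command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by the site): summarize node states in a
# partition and list why some nodes are drained or down.
show_drained() {
  part="${1:-mono-shared-EL7}"   # partition name taken from this thread
  # One line per node state (idle, mixed, allocated, drained, ...) with node count
  sinfo -p "$part" -o "%.12T %.6D"
  # Nodes that are down, drained or failing, with the recorded reason
  sinfo -p "$part" -R
}
```

Usage: `show_drained mono-shared-EL7`. A node listed here with a reason set by Slurm (rather than by an administrator) is usually the “error noticed by Slurm” case described above.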

The message only indicates that some nodes are unavailable because they are drained and/or reserved for jobs in higher-priority partitions. Fortunately, not all nodes are in this state.

I see you canceled your job, so I can’t investigate why it was pending.

[root@master ~]# sacct -j 32123701
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
32123701_[1+ navigation mono-shar+   guerries          1 CANCELLED+      0:0
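While a job is still pending, its current pending reason and Slurm’s estimated start time can be checked with `squeue`, which helps tell a temporary resource shortage from a real problem. A minimal sketch, assuming a bash/POSIX shell; the function name `pending_reason` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show state and pending reason of one job, or of all
# of your jobs when no job id is given.
pending_reason() {
  job="${1:-}"
  if [ -n "$job" ]; then
    # %i job id, %P partition, %T state, %r reason the job is pending
    squeue -j "$job" -o "%.18i %.12P %.10T %r"
    # Estimated start time of the pending job, if the scheduler has one
    squeue --start -j "$job"
  else
    squeue -u "$USER" -o "%.18i %.12P %.10T %r"
  fi
}
```

Usage: `pending_reason 32123701`. Running this before cancelling keeps the information an administrator needs to investigate.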

Best