Both of the mentioned nodes show alloc on shared-gpu, but when I try squeue -w gpu032, no jobs appear.
Are they taken by private-partition jobs? If so, would it be possible to know how much time those jobs have left to run? I need a GPU with 80 GB of memory, and if they are unavailable for several days I will need to find another solution.
OK, it seems they are indeed occupied by jobs in private partitions. I didn’t realize that sinfo --all would show even private partitions.
You can find more information by running these commands:
(baobab)-[alberta@admin1 ~]$ sinfo -n gpu[032-033]
PARTITION              AVAIL  TIMELIMIT   NODES  STATE  NODELIST
admin                  up     7-00:00:00  0      n/a
debug-cpu*             up     15:00       0      n/a
public-interactive-cpu up     8:00:00     0      n/a
public-longrun-cpu     up     14-00:00:0  0      n/a
public-cpu             up     4-00:00:00  0      n/a
public-short-cpu       up     1:00:00     0      n/a
public-bigmem          up     4-00:00:00  0      n/a
shared-cpu             up     12:00:00    0      n/a
shared-bigmem          up     12:00:00    0      n/a
shared-gpu             up     12:00:00    2      alloc  gpu[032-033]
(baobab)-[alberta@admin1 ~]$ sacct -N gpu[032-033]
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4342029      LM_TRAINI+ private-r+       toto         64    RUNNING      0:0
4342029.bat+      batch                  toto         64    RUNNING      0:0
4342029.ext+     extern                  toto         64    RUNNING      0:0
4342030      LM_TRAINI+ private-r+       toto         64    RUNNING      0:0
4342030.bat+      batch                  toto         64    RUNNING      0:0
4342030.ext+     extern                  toto         64    RUNNING      0:0
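To estimate how long jobs like these still have to run, you could ask sacct for the Timelimit and Elapsed fields (for example sacct -X -N gpu[032-033] --format=JobID,Partition,Timelimit,Elapsed,State) and subtract one from the other. The bash helpers below are a minimal sketch of that arithmetic. The function names are hypothetical, and they assume the M:SS, H:MM:SS and D-HH:MM:SS forms that Slurm prints, so values such as UNLIMITED are rejected. The result is only an upper bound, since a job can finish before its time limit.

```bash
# Requires bash (uses [[ ]], arrays and 10# arithmetic).
# Hypothetical helper: convert a Slurm display time (M:SS, H:MM:SS or
# D-HH:MM:SS) into a number of seconds. Other forms return an error.
slurm_to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  if [[ $t == *-* ]]; then
    days=${t%%-*}   # part before the dash is the day count
    t=${t#*-}       # keep the H:MM:SS part
  fi
  local IFS=:
  local -a parts
  read -r -a parts <<< "$t"
  case ${#parts[@]} in
    2) m=${parts[0]}; s=${parts[1]} ;;
    3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
    *) echo "unsupported time format: $1" >&2; return 1 ;;
  esac
  # 10# forces base 10 so that fields such as "08" are not read as octal.
  echo "$(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))"
}

# Hypothetical helper: format seconds back into Slurm-style D-HH:MM:SS / H:MM:SS.
seconds_to_slurm() {
  local total=$1
  local d=$(( total / 86400 ))
  local h=$(( total % 86400 / 3600 ))
  local m=$(( total % 3600 / 60 ))
  local s=$(( total % 60 ))
  if (( d > 0 )); then
    printf '%d-%02d:%02d:%02d\n' "$d" "$h" "$m" "$s"
  else
    printf '%d:%02d:%02d\n' "$h" "$m" "$s"
  fi
}

# Hypothetical helper: time left = time limit - elapsed (never below zero).
slurm_time_left() {
  local limit elapsed left
  limit=$(slurm_to_seconds "$1") || return 1
  elapsed=$(slurm_to_seconds "$2") || return 1
  left=$(( limit - elapsed ))
  if (( left < 0 )); then
    left=0
  fi
  seconds_to_slurm "$left"
}
```

For example, slurm_time_left 7-00:00:00 15:22:50 prints 6-08:37:10.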
Be careful: a node’s status can change very quickly. One second it may appear idle, and the next it may be allocated.
Best regards
For others who might hit the same issue, I created an alias for squeue that, by default, uses extended formatting options to help figure out which jobs will be on a node and for how long:
alias sqlong="squeue --all --Format=UserName:12,Partition:16,NumCPUs:6,TimeLimit:11,TimeUsed:10,StateCompact:3,Reason:15,NodeList"
For example, with gpu029 it produces:
gercek@login2:~$ sqlong -w gpu029
USER        PARTITION       CPUS  TIME_LIMIT TIME      ST REASON         NODELIST
kinakh      private-sip-gpu 16    7-00:00:00 15:22:50  R  None           gpu029
drozdova    private-sip-gpu 16    5-20:00:00 15:22:50  R  None           gpu029
drozdova    private-sip-gpu 8     5-20:00:00 15:22:50  R  None           gpu029
Edit: I forgot to mention that you should add this line to your ~/.bashrc if you’d like it to be available at every login.
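If what you mainly want to know is when a node frees up, a variant of the same alias could also print the TimeLeft and EndTime fields of squeue --Format, so the remaining time does not have to be computed by hand. This is only a sketch: the alias name sqleft and the column widths are arbitrary choices, and like sqlong it would go in ~/.bashrc.

```bash
# Hypothetical variant of sqlong: also shows squeue's TimeLeft and EndTime fields.
# The number after each ":" is the column width; adjust to taste.
alias sqleft="squeue --all --Format=UserName:12,Partition:16,NumCPUs:6,TimeLimit:11,TimeUsed:10,TimeLeft:11,EndTime:20,StateCompact:3,NodeList"
```

Usage would then be, for instance, sqleft -w gpu032. As with the earlier output, treat the remaining time as an upper bound.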