Hi there,
we have just added spart to the login node, a new tool to check the overall partition usage and description (cf. https://github.com/mercanca/spart ).
Here is the default output:
capello@login2:~$ spart
            QUEUE STA   FREE  TOTAL RESORC  OTHER   FREE  TOTAL ||   MAX  DEFAULT  MAXIMUM CORES   NODE
        PARTITION TUS  CORES  CORES PENDNG PENDNG  NODES  NODES || NODES JOB-TIME JOB-TIME /NODE MEM-GB
        debug-EL7  *      64     64      0      8      4      4 ||     2  15 mins  15 mins    16     64
         mono-EL7        131    784   3256     24      0     49 ||     -   1 mins   4 days    16     64
     parallel-EL7        131    784    932    114      0     49 ||     -   1 mins   4 days    16     64
       shared-EL7        324   3584   1235      9      4    225 ||     -   1 mins  12 hour    12     40
  mono-shared-EL7        324   3584   1112      0      4    225 ||     -   1 mins  12 hour    12     40
       bigmem-EL7          9     16     60      0      0      1 ||     1   1 mins   4 days    16    256
shared-bigmem-EL7         39    212    271      0      0     10 ||     -   1 mins  12 hour     8    256
   shared-gpu-EL7        182    228     16      0      4     10 ||     -   1 mins  12 hour    12    128
        admin-EL7  g      16     16      0      0      1      1 ||     -   1 mins   7 days    16     64

                         YOUR   YOUR   YOUR   YOUR
                          RUN   PEND   OTHR   TOTL
   COMMON VALUES:           0      0      0      0
capello@login2:~$
The -h option gives you plenty of explanations and other options to tweak the information reported.
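For instance, you can keep an eye on the partitions or narrow the output using standard tools only (a small sketch; watch and grep are stock Linux utilities, no extra spart option is assumed, and the 60-second interval is arbitrary):

capello@login2:~$ # refresh the partition overview every 60 seconds (Ctrl-C to stop)
capello@login2:~$ watch -n 60 spart
capello@login2:~$ # keep the header line and the GPU-related partitions only
capello@login2:~$ spart | grep -E 'PARTITION|gpu'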
Thx, bye,
Luca
Hi there,
and we have just added pestat as well (cf. https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat ), from the upstream "Slurm related software" section (cf. https://slurm.schedmd.com/download.html ).
pestat is another tool to check cluster usage, this time focusing on single nodes. @Yann.Sagon and @Pablo.Strasser have already talked about it on this forum (cf. [Suggestion] Quick overview of gpu usage for nodes and partitions - #2 by Yann.Sagon and [Suggestion] Reserve one core per gpu on gpu nodes , respectively).
The default output includes all ~250 nodes; here are a few lines:
capello@login2:~$ pestat | head -n 20
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                            State Use/Tot              (MB)     (MB)  JobId User ...
gpu002 dpnc-gpu-EL7+ mix 3 12 3.01 257820 232204 32558797 salamda0 32585454 drozd 32585438 drozd
gpu003 dpnc-gpu-EL7+ mix 3 12 3.02 257820 234949 32571724 salamda0 32587282 krivachy 32585455 drozd
gpu004 shared-gpu-EL7 mix 6 20 6.01 128820 121712
gpu005 shared-gpu-EL7 mix 4 20 3.98 128820 102896 32585432 drozd 32585429 drozd 32585428 drozd 32585427 drozd
gpu006 shared-gpu-EL7 mix 8 20 8.02 128820 120871
gpu007 shared-gpu-EL7 alloc 20 20 13.60* 257840 230270 32585443 drozd 32585442 drozd 32585440 drozd 32585439 drozd
gpu008 shared-gpu-EL7 mix 8 20 9.74* 256000 225026 32577812 krivachy 32577822 krivachy 32577832 krivachy 32577842 krivachy 32577852 krivachy 32584204 krivachy 32585453 drozd 32585449 drozd
gpu009 shared-gpu-EL7 mix 8 20 8.00 256000 206983 32585450 drozd 32585436 drozd 32585437 drozd 32585435 drozd 32585433 drozd 32585434 drozd 32585431 drozd 32585430 drozd
gpu010 shared-gpu-EL7 mix 8 20 8.01 256000 210982 32585451 drozd 32585452 drozd 32585448 drozd 32585447 drozd 32585444 drozd 32585445 drozd 32585446 drozd 32585441 drozd
gpu011 shared-gpu-EL7 idle 0 64 0.01 256000 253407
node001 debug-EL7* idle 0 16 0.01 64000 47962
node002 debug-EL7* idle 0 16 0.01 64000 58047
node003 debug-EL7* idle 0 16 0.01 64000 52885
node004 debug-EL7* idle 0 16 0.01 64000 55427
node005 mono-shared-EL7 mix 3 16 3.40 64000 15341 32587732 blanchme 32593378 drozd 32593379 drozd
node007 mono-shared-EL7 mix 13 16 19.80* 64000 40268 32592912 weijiah7 32584737 drozd 32584738 drozd 32584739 drozd 32592853 drozd 32592765 drozd 32593395 blanchme 32593193 blanchme
node008 mono-EL7+ mix 8 16 5.01* 64000 56183 32578161 proix 32589019 blanchme 32589020 blanchme 32589021 blanchme
node009 mono-EL7+ alloc 16 16 16.02 64000 46580 32560799 cantoni 32591518 blanchme 32591519 blanchme 32591520 blanchme 32591521 blanchme 32591522 blanchme 32591523 blanchme 32584795 drozd 32584796 drozd 32584797 drozd 32584798 drozd 32584799 drozd 32584740 drozd 32584741 drozd 32584742 drozd 32584743 drozd
capello@login2:~$
Again, the -h option gives you plenty of explanations and other options to tweak the information reported.
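For example, to spot free nodes or to find your own jobs, you can filter the default output with standard tools (again a small sketch; awk and grep are stock utilities, and the column layout is the one shown above):

capello@login2:~$ # list only idle nodes (the third column is the node state)
capello@login2:~$ pestat | awk '$3 == "idle"'
capello@login2:~$ # show only the lines mentioning your own jobs
capello@login2:~$ pestat | grep "$USER"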
Thx, bye,
Luca
Hi, I like using spart to check usage. I just tried to use it on Yggdrasil and I get Segmentation fault (core dumped). Any idea why?
Dear Genevieve,
This is a known issue with the latest revision of Slurm. We have already sent the information to the provider and we are waiting for a fix in a future version.
Details are logged on this page: https://github.com/mercanca/spart/issues/17
Best regards,
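In the meantime, if you want to check which Slurm revision a cluster is running (sinfo is part of Slurm, and --version is a standard option of the Slurm commands):

capello@login2:~$ # print the Slurm version installed on the login node
capello@login2:~$ sinfo --version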