Hi there,
And we have just added pestat as well (cf. Slurm_tools/pestat at master · OleHolmNielsen/Slurm_tools · GitHub), from the upstream “Slurm related software” section (cf. Slurm Workload Manager - Download Slurm).
pestat is another tool to check cluster usage, this time focusing on individual nodes. @Yann.Sagon and @Pablo.Strasser have already talked about it on this forum (cf. [Suggestion] Quick overview of gpu usage for nodes and partitions - #2 by Yann.Sagon and [Suggestion] Reserve one core per gpu on gpu nodes, respectively).
The default output lists all ~250 nodes; here are the first few lines:
capello@login2:~$ pestat | \
head -n 20
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (MB) (MB) JobId User ...
gpu002 dpnc-gpu-EL7+ mix 3 12 3.01 257820 232204 32558797 salamda0 32585454 drozd 32585438 drozd
gpu003 dpnc-gpu-EL7+ mix 3 12 3.02 257820 234949 32571724 salamda0 32587282 krivachy 32585455 drozd
gpu004 shared-gpu-EL7 mix 6 20 6.01 128820 121712
gpu005 shared-gpu-EL7 mix 4 20 3.98 128820 102896 32585432 drozd 32585429 drozd 32585428 drozd 32585427 drozd
gpu006 shared-gpu-EL7 mix 8 20 8.02 128820 120871
gpu007 shared-gpu-EL7 alloc 20 20 13.60* 257840 230270 32585443 drozd 32585442 drozd 32585440 drozd 32585439 drozd
gpu008 shared-gpu-EL7 mix 8 20 9.74* 256000 225026 32577812 krivachy 32577822 krivachy 32577832 krivachy 32577842 krivachy 32577852 krivachy 32584204 krivachy 32585453 drozd 32585449 drozd
gpu009 shared-gpu-EL7 mix 8 20 8.00 256000 206983 32585450 drozd 32585436 drozd 32585437 drozd 32585435 drozd 32585433 drozd 32585434 drozd 32585431 drozd 32585430 drozd
gpu010 shared-gpu-EL7 mix 8 20 8.01 256000 210982 32585451 drozd 32585452 drozd 32585448 drozd 32585447 drozd 32585444 drozd 32585445 drozd 32585446 drozd 32585441 drozd
gpu011 shared-gpu-EL7 idle 0 64 0.01 256000 253407
node001 debug-EL7* idle 0 16 0.01 64000 47962
node002 debug-EL7* idle 0 16 0.01 64000 58047
node003 debug-EL7* idle 0 16 0.01 64000 52885
node004 debug-EL7* idle 0 16 0.01 64000 55427
node005 mono-shared-EL7 mix 3 16 3.40 64000 15341 32587732 blanchme 32593378 drozd 32593379 drozd
node007 mono-shared-EL7 mix 13 16 19.80* 64000 40268 32592912 weijiah7 32584737 drozd 32584738 drozd 32584739 drozd 32592853 drozd 32592765 drozd 32593395 blanchme 32593193 blanchme
node008 mono-EL7+ mix 8 16 5.01* 64000 56183 32578161 proix 32589019 blanchme 32589020 blanchme 32589021 blanchme
node009 mono-EL7+ alloc 16 16 16.02 64000 46580 32560799 cantoni 32591518 blanchme 32591519 blanchme 32591520 blanchme 32591521 blanchme 32591522 blanchme 32591523 blanchme 32584795 drozd 32584796 drozd 32584797 drozd 32584798 drozd 32584799 drozd 32584740 drozd 32584741 drozd 32584742 drozd 32584743 drozd
capello@login2:~$
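If you want to post-process this output yourself, here is a minimal sketch with two hypothetical shell helpers (flagged_nodes and jobs_per_user are our own names, not part of pestat). It assumes the default column layout shown above (Hostname, Partition, State, Use, Tot, CPUload, Memsize, Freemem, then JobId/User pairs) and a two-line header. It also assumes that a trailing * on CPUload marks a load value pestat considers unusual for the node; please check pestat -h for the exact meaning on our version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pestat) to post-process its default output.
# Assumption: the column order matches the sample above and the first two
# lines are the header:
#   Hostname Partition State Use Tot CPUload Memsize Freemem JobId User ...

# Print hostname, partition and CPU load for rows whose CPUload field
# (field 6) ends with '*', i.e. the load values flagged by pestat.
flagged_nodes() {
  awk 'NR > 2 && $6 ~ /\*$/ { print $1, $2, $6 }'
}

# Count JobId/User entries per user. Pairs start at field 9 (JobId), so the
# user names sit at fields 10, 12, 14, ... A multi-node job is counted once
# per node it runs on.
jobs_per_user() {
  awk 'NR > 2 { for (i = 10; i <= NF; i += 2) count[$i]++ }
       END    { for (u in count) print u, count[u] }' | sort
}
```

Typical usage would be pestat | flagged_nodes or pestat | jobs_per_user. We went with plain awk on the text output because it needs nothing beyond what is already installed on the login nodes.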
Again, the -h option prints plenty of explanation, along with further options to tweak the reported information.
Thx, bye,
Luca