It sometimes happens on GPU nodes that GPU tasks are not scheduled because no CPU cores are available, even though GPUs are free.
For example, using pestat:
gpu009 cui-gpu-EL7 alloc 20 20 20.01 256000 249657 gpu:titan:8 32383496 anonymous
gpu010 cui-gpu-EL7 alloc 20 20 16.53* 256000 250910 gpu:titan:8 32383507 anonymous 32383506 anonymous 32383504 anonymous 32383503 anonymous 32383502 anonymous
Both GPU nodes use all their cores without using all their GPUs. I believe reserving one core for every GPU would avoid this scheduling situation by leaving room to launch GPU-only jobs, like the majority of PyTorch or TensorFlow code, which is single-threaded on the CPU side.
I don’t know if it is possible to configure Slurm this way, or whether it could be detrimental to other use cases.
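For what it’s worth, a hedged sketch of how this might look in slurm.conf: the `cons_tres` select plugin (Slurm >= 19.05) plus the `MaxCPUsPerNode` partition parameter can cap CPU-only jobs so that one core per GPU stays free. The partition and node names below are illustrative assumptions based on the pestat output (20 cores, 8 GPUs), not our actual config:

```
# Illustrative slurm.conf fragment -- names and values are assumptions.
# cons_tres makes the scheduler GPU-aware (Slurm >= 19.05).
SelectType=select/cons_tres

# CPU-only jobs may use at most 12 of the 20 cores on these nodes,
# leaving 8 cores (one per GPU) available for GPU jobs.
PartitionName=cpu Nodes=gpu[009-010] MaxCPUsPerNode=12

# GPU jobs get 1 CPU per requested GPU by default, more on request.
PartitionName=gpu Nodes=gpu[009-010] DefCpuPerGPU=1
```

This assumes the nodes are shared between a CPU partition and a GPU partition; whether that matches the site layout would need checking by the admins.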
edit: I edited the user’s name out of the log. To be clear, I don’t want to attack anybody; I’m mainly discussing a scheduling issue with the scheduler.
I don’t see from your pestat output how you figured out that not all the GPUs are in use?
To answer your question, I don’t know if we can enforce that one core is reserved per GPU. If it were possible, how would you proceed on nodes with more CPU cores than GPUs? Would you end up with unused, unallocatable CPUs?
Anyway, I see your point, and it’s true that since GPUs are more expensive than CPUs they should remain “usable”.
I used scontrol on every job to check whether some jobs were using multiple GPUs; they were not.
One solution I see, based on the assumption that the number of cores is always higher than the number of GPUs, is to reduce the number of allocatable CPUs by the number of GPUs. When a request includes at least one GPU, subtract the number of requested GPUs from the number of requested cores to get the number of additional cores drawn from the shared pool; each GPU always brings one reserved core of its own.
The only problem I see with this scheme is for people asking for fewer cores than the number of GPUs they request.
In conclusion, the idea is that every GPU comes with one core reserved for it, and you can ask for more if you want.
The purpose of this change is to ensure that a GPU always has at least one core, the minimum for a single-GPU task to do anything with it.
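To make the accounting concrete, here is a minimal sketch of the proposed scheme, assuming a node like gpu009 with 20 cores and 8 GPUs (the function name and the error handling are mine, purely for illustration):

```python
def allocatable_cores(total_cores: int, total_gpus: int,
                      req_cores: int, req_gpus: int) -> int:
    """Sketch of the proposed accounting: every GPU carries one
    reserved core; any cores beyond one-per-GPU come from the
    shared pool of (total_cores - total_gpus) cores.

    Returns the number of cores the request would occupy, or
    raises if the shared pool cannot satisfy it.
    """
    shared_pool = total_cores - total_gpus  # cores not pinned to a GPU
    # Only cores beyond one-per-GPU count against the shared pool.
    extra = max(0, req_cores - req_gpus)
    if extra > shared_pool:
        raise ValueError("not enough shared cores available")
    return req_gpus + extra  # reserved cores + shared cores


# A 20-core / 8-GPU node: a job asking 1 GPU and 4 cores uses
# 1 reserved core plus 3 shared cores.
print(allocatable_cores(20, 8, 4, 1))  # -> 4
```

Note how this exposes the edge case mentioned above: a job asking for 4 GPUs but only 2 cores still occupies 4 cores (one per GPU), more than it asked for.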
AFAIK, 1 core per GPU is very restrictive for anything beyond MNIST or in-memory datasets. In my experience you need at least 2-4 feeder threads (note that PyTorch dataloaders are already multi-threaded).
But as Pablo states, it would be good to have some sanity warning if possible, i.e. if a user requests 1 GPU and all the CPUs on a node, they should get a warning message or something of that nature stating that they are blocking usage for everyone else.
Renewing this thread: there seem to be quite a few jobs that request 20+ CPUs with 1 GPU, blocking other users from an entire node’s worth of GPUs. We do need a solution here, be it a warning or something a little more drastic like Pablo mentions.
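The check behind such a warning is simple. In Slurm it would normally live in a job_submit plugin (which is written in Lua or C), so the Python below only illustrates the logic; the function name and the 20-core / 8-GPU defaults are assumptions taken from the pestat output in this thread:

```python
from typing import Optional


def gpu_hogging_warning(req_cpus: int, req_gpus: int,
                        node_cpus: int = 20,
                        node_gpus: int = 8) -> Optional[str]:
    """Return a warning string when a job's CPU request would leave
    fewer free CPUs than free GPUs on the node (thresholds are
    illustrative, not Slurm defaults)."""
    if req_gpus == 0:
        return None  # CPU-only jobs would be handled by partition limits
    cpus_left = node_cpus - req_cpus
    gpus_left = node_gpus - req_gpus
    if gpus_left > 0 and cpus_left < gpus_left:
        return (f"warning: {req_cpus} CPUs with {req_gpus} GPU(s) leaves "
                f"only {cpus_left} CPUs for the {gpus_left} remaining GPUs")
    return None


# The case from this thread: 1 GPU plus all 20 CPUs starves 7 GPUs.
print(gpu_hogging_warning(20, 1))
```

Whether this should merely print a message at submit time or actually reject the job is the policy question still open here.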