"Fair" GPU use in Bamboo

Hi, I have been using Bamboo for a couple of weeks and the short queues have been quite a relief. I understand that it is only a matter of time before more users migrate their research to Bamboo and the queues fill up. Still, I want to raise the issue of fair GPU usage, especially since it is quite easy for a single user (as seems to be mostly the case at this very moment) to use all GPU nodes, effectively blocking other people from doing any work.

My particular use case: I am still developing and debugging my GPU code, but I am unable to run a short (5 min) test because other people are using all GPU nodes and have 40 or so jobs pending (as was the case yesterday). Shouldn’t there be a safeguard against this? Wouldn’t it be possible to at least ensure that one debug GPU is available, so that the maximum wait time for a single test is 15 min?

I am aware there are fair use policies and a limit on queued jobs (10k). A limit of 10k pending jobs may be reasonable for CPUs, but not really for GPUs, since it is far greater than the total number of GPUs available.

Thanks in advance for your help.

Dear @daniel.forerosanchez, thanks for your feedback.

In fact, gpu001 is available from two partitions, debug-gpu and shared-gpu, which is obviously a problem. As GPUs are quite expensive, it is not a good idea to dedicate a whole GPU node to debugging. What we’ll try to do during the next maintenance is to split the node in two, i.e. reserve 4 of the 8 GPUs for debugging.

In the meantime, you can use debug-gpu on Yggdrasil, which is dedicated solely to debugging.
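
For a short test like the 5 min one mentioned above, the submission could look roughly like the sketch below. It assumes the cluster is scheduled with Slurm (as the partition names suggest) and that the installed version accepts the `--gpus` option; the helper `debug_gpu_command` and the script name `test_kernel.sh` are hypothetical and only serve as illustration.

```python
import shlex


def debug_gpu_command(script, minutes=5, gpus=1, partition="debug-gpu"):
    """Build an sbatch command line for a short GPU test job.

    Assumption: the cluster runs Slurm and supports --gpus.
    The partition name comes from the thread; everything else is illustrative.
    """
    if minutes <= 0 or gpus <= 0:
        raise ValueError("minutes and gpus must be positive")
    hours, mins = divmod(minutes, 60)
    return [
        "sbatch",
        f"--partition={partition}",
        f"--gpus={gpus}",
        f"--time={hours:02d}:{mins:02d}:00",  # Slurm HH:MM:SS time limit
        script,
    ]


# Hypothetical job script; replace with your own.
cmd = debug_gpu_command("test_kernel.sh")
print(shlex.join(cmd))
```

Requesting a single GPU and a tight time limit keeps the job small, which should make it easier for the scheduler to place it quickly on the debug partition.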

Best