How to request > 8 GPU tasks?

Currently, as far as I can tell (I’d be happy to be wrong :smile:), it is not possible on Baobab to request a (dependent) job with more than 8 GPUs and an equivalent number of tasks, since 8 GPUs is the maximum available on a single node. (Related discussion: Interconnect between GPU nodes)

A one-GPU-per-process setup is the way recommended by PyTorch for distributed workloads. The only workaround I see is a job array, which requires guesswork and manually checking the scheduler. To be a little more specific: it is currently possible to request, say, 2 nodes with 8 GPUs each, but leveraging shared-gpu to get, say, 30 GPUs spread across nodes is not. And 30 GPUs is only about half the resources needed to train a SOTA machine learning model :cold_face:
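For reference, here is roughly what the working 2-node case looks like as a batch script. This is a minimal sketch, not tested on Baobab: the partition name `shared-gpu`, the GRES name `gpu`, and the module names are assumptions that would need to match the local setup.

```shell
#!/bin/bash
# Sketch: 2 nodes x 8 GPUs, one task (process) per GPU, i.e. 16 tasks total.
# Assumed names: partition "shared-gpu", GRES "gpu" -- adjust to the cluster.
#SBATCH --job-name=ddp-16gpu
#SBATCH --partition=shared-gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8      # one process per GPU, as PyTorch recommends
#SBATCH --gres=gpu:8             # GPUs are requested per node, not per task
#SBATCH --cpus-per-task=4
#SBATCH --time=12:00:00

# srun launches ntasks-per-node * nodes = 16 processes; each reads its
# rank from the Slurm environment (SLURM_PROCID, SLURM_LOCALID, ...).
srun python train.py
```

The limitation discussed above is that `--nodes`/`--gres` forces whole-node granularity: you can scale in steps of 8 GPUs, but cannot ask the scheduler for, say, 30 GPUs wherever they happen to be free.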

Since `--gpus-per-task` is not supported in the currently installed Slurm version, what is the alternative?
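In case it helps the discussion, the usual substitute on older Slurm versions (before `--gpus-per-task` existed) is to combine per-node `--gres` with `--ntasks-per-node` and bind each task to one GPU via `SLURM_LOCALID` inside the job. Again a sketch under assumptions (partition name, script names are placeholders):

```shell
#!/bin/bash
# Sketch: emulate --gpus-per-task=1 on older Slurm by pairing
# --gres=gpu:N with --ntasks-per-node=N and binding via SLURM_LOCALID.
#SBATCH --partition=shared-gpu   # assumed partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:8
#SBATCH --time=12:00:00

# Each srun task sees SLURM_LOCALID in 0..7 on its node; restricting
# CUDA_VISIBLE_DEVICES to that index gives each process exactly one GPU.
srun bash -c 'CUDA_VISIBLE_DEVICES=$SLURM_LOCALID exec python train.py'
```

This keeps the 1 GPU : 1 process mapping, but it does not lift the whole-node constraint: the GPU count still has to be a multiple of what one node offers.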