Are all the GPU nodes wired via a fast interconnect like InfiniBand?
Or are they limited to 10G / 1G? I wasn’t able to find any info about this here. If some are wired via InfiniBand, which gpu* nodes would those be?
I’m running a few PyTorch distributed data parallel (DDP) jobs via TCP and just want to know what to expect.
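For reference, a way to check this directly from a compute node would be something like the following (a sketch; the srun flags and partition name are just placeholders, and it assumes interactive jobs are allowed and the usual iproute2 / infiniband-diags tools are installed on the nodes):
# Get an interactive shell on a GPU node, then list its network interfaces.
srun --partition=shared-gpu-EL7 --gres=gpu:1 --pty bash
ip -br link   # brief list of interfaces (eth*, ib*, ...)
ibstat        # InfiniBand port state and link rate, if an HCA is present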
Thanks!
@Yann.Sagon answered this at the HPC lunch meetup. The nodes are connected by InfiniBand, but Singularity needs extra work to make use of it. Marking this thread as closed; I will follow up with another thread regarding distributed training.
Thanks Yann!
A quick hack may be to “talk” to the nodes through their InfiniBand TCP interface (IP over InfiniBand) instead of the Ethernet network. For example, from node001 you can “talk” to node002 (Ethernet, 1G) or node002i (InfiniBand, 40G). This should probably work out of the box; I think there is only extra work needed in Singularity if you want to support RDMA.
Ah neat! This would be a good workaround for not having the proper driver in Singularity. I do believe PyTorch supports RDMA since it is baked into NCCL 2.5+, but I’m not sure what sort of overhead that would have. I can benchmark and report back, though.
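Concretely, I’d try pointing the rendezvous at the IB hostnames, roughly like this (a rough sketch: the port and the ib0 interface name are guesses on my side; GLOO_SOCKET_IFNAME and NCCL_SOCKET_IFNAME are the variables PyTorch’s gloo and NCCL backends read to pick the TCP interface):
# Use the IPoIB hostname for the rendezvous instead of the 1G ethernet name.
export MASTER_ADDR=node002i   # IB-facing hostname of the rendezvous node, as suggested above
export MASTER_PORT=29500      # placeholder port
# Route gloo/NCCL TCP traffic over the IB interface too (assuming it is called ib0).
export GLOO_SOCKET_IFNAME=ib0
export NCCL_SOCKET_IFNAME=ib0
srun python train.py          # train.py is a placeholder; it calls torch.distributed.init_process_group(init_method="env://")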
Unfortunately I ran into another issue (I sent an email to HPC-support): --gpus-per-task is not currently supported, and the only way to get more than 8 GPUs is with job arrays.
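For completeness, the job-array pattern I mean would look roughly like this (an untested sketch: the rendezvous host, port and script name are placeholders, and since array tasks are not guaranteed to start together, the env:// rendezvous would block until all nine jobs are actually running):
#!/bin/bash -l
#SBATCH --job-name=ddp-array
#SBATCH --array=0-8
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --partition=shared-gpu-EL7
#SBATCH --time=12:00:00
#SBATCH --mem=16000

# torch.distributed's env:// init reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT.
export RANK=${SLURM_ARRAY_TASK_ID}
export WORLD_SIZE=9
export MASTER_ADDR=node001i   # placeholder rendezvous host (IPoIB name)
export MASTER_PORT=29500
srun python train.py          # train.py is a placeholder for the actual training script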
Hello, indeed, and supporting it is a change that would kill all the running and pending jobs on Baobab. So unless we hit a big issue before then, we don’t plan to make the change before launching Yggdrasil. Is there any other way to solve your issue using only gres?
Is there an equivalent --gres / <other_flag> option that can be used to request 1 GPU per task for a setup requiring more than 8 GPUs? I tried the following, but it only requested 1 GPU for all the tasks:
#!/bin/bash -l
#SBATCH --job-name=SOTAVAE
#SBATCH --ntasks=9
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --partition=shared-gpu-EL7
#SBATCH --time=12:00:00
#SBATCH --mem=16000
#SBATCH --constraint="COMPUTE_CAPABILITY_6_0|COMPUTE_CAPABILITY_6_1"
srun --ntasks=9 --exclusive --multi-prog distributed.conf
I also tried setting #SBATCH --gres=gpu:9, but as expected this threw a “Node does not exist” error, since --gres is a per-node request and no single node has 9 GPUs.
Moving this discussion to a new thread, since the original question of this topic has been answered.
Update to my post: it is now possible to use --gpus-per-task on Yggdrasil and Baobab.
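For example, the allocation from the script above could now be expressed directly with per-task GPUs (a short sketch; only the GPU-related lines change, and the exact srun behaviour for per-task GPU binding may depend on the Slurm version):
# Request one GPU bound to each of the nine tasks instead of a per-node --gres.
#SBATCH --ntasks=9
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
srun --ntasks=9 --exclusive --multi-prog distributed.conf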