Issue with gpu020

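Hi, this looks like an InfiniBand + MPI error, possibly originating in your code. Keep in mind that the GPUs on gpu020 are split: they are NVIDIA A100 (Ampere architecture) cards with MIG (Multi-Instance GPU) enabled, so a job only sees one slice of a GPU. Could your job need more memory than its slice provides?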
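If it helps, here is a rough sketch for sanity-checking the memory side. It assumes you know the MIG profile name of your slice (names have the form `<N>g.<M>gb`, e.g. `3g.20gb`) and roughly how much GPU memory your job needs. The function names and the 12 GB figure are made up for illustration; I don't know which profiles are actually configured on gpu020.

```python
import re

# MIG profile names start with "<compute slices>g.<memory>gb", e.g. "3g.20gb".
_PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb")


def mig_slice_memory_gb(profile: str) -> int:
    """Return the nominal memory (in GB) encoded in a MIG profile name."""
    match = _PROFILE_RE.match(profile.strip().lower())
    if match is None:
        raise ValueError(f"not a MIG profile name: {profile!r}")
    return int(match.group(2))


def job_fits(profile: str, required_gb: float) -> bool:
    """True if the job's estimated memory need fits in the slice's nominal memory."""
    return required_gb <= mig_slice_memory_gb(profile)


if __name__ == "__main__":
    # Hypothetical job needing about 12 GB of GPU memory.
    for profile in ("1g.5gb", "3g.20gb"):
        print(profile, "fits" if job_fits(profile, 12.0) else "too small")
```

Note that the usable memory on a slice is a bit lower than the nominal number in the profile name, so leave some headroom.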
