Yggdrasil: PyTorch calculations on GPUs are not working

Dear HPC-Community,

I wanted to try out the PyTorch modules ("PyTorch/1.6.0-Python-3.7.4" and "PyTorch/1.4.0-Python-3.7.4") on Yggdrasil, but unfortunately they did not work as expected, since I got the following output from the debug GPU (NVIDIA Titan RTX):

Output file content (PyTorch160.o) coming from Yggdrasil

Hostname: gpu001.yggdrasil
Python 3.7.4
THCudaCheck FAIL file=…/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
x: cpu
tensor([[0.8530, 0.0887, 0.4857],
[0.4920, 0.5917, 0.1314],
[0.6559, 0.5153, 0.5836]])
y: cpu
tensor([[0.7562, 0.6895, 0.2346],
[0.7555, 0.7269, 0.7955],
[0.1314, 0.5913, 0.6127]])
z=x+y: cpu
tensor([[1.6092, 0.7782, 0.7203],
[1.2475, 1.3187, 0.9269],
[0.7873, 1.1065, 1.1963]])
Is CUDA available?: False
Traceback (most recent call last):
File "PyTorchTest.py", line 18, in <module>
a = torch.rand(3,3, device='cuda:0'); b = torch.rand(3,3, device='cuda:0')
File "/opt/ebsofts/PyTorch/1.6.0-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages/torch/cuda/__init__.py", line 190, in _lazy_init
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at …/aten/src/THC/THCGeneral.cpp:47
srun: error: gpu001: task 0: Exited with exit code 1

My Python script has the following form:

Script (PyTorchTest.py)

import torch

#Testing PyTorch
x = torch.rand(3,3); y = torch.rand(3,3)
print('x:', x.device, sep='\t'); print(x)
print('y:', y.device, sep='\t'); print(y)

z = x+y
print('z=x+y:', z.device, sep='\t'); print(z)

#Testing the presence of CUDA
print('Is CUDA available?: ', torch.cuda.is_available())

a = torch.rand(3,3, device='cuda:0'); b = torch.rand(3,3, device='cuda:0')
print('a:', a.device, sep='\t'); print(a)
print('b:', b.device, sep='\t'); print(b)
d = a+b
print('d=a+b:', d.device, sep='\t'); print(d)

And I am submitting the job with the modules indicated in the documentation:

Job submission (Run160.sh)


#SBATCH --job-name=PyTorchTest
#SBATCH --output=PyTorch160.o
#SBATCH --time=0-00:01:00

#SBATCH --partition=debug-gpu

#SBATCH --ntasks=1

module load GCC/8.3.0 CUDA/10.1.243 OpenMPI/3.1.4
module load PyTorch/1.6.0-Python-3.7.4

echo "Hostname: $(hostname -f)"

srun python --version
srun python PyTorchTest.py

I tried the same kind of script on Baobab and it worked without any problem, even with the NVIDIA RTX 2080 Ti (Architecture: Turing, Compute Capability: 7.5), which has the same architecture and compute capability as the debug GPU on Yggdrasil (NVIDIA Titan RTX, Architecture: Turing, Compute Capability: 7.5) [Source]:

Output file content (PyTorch160.o) coming from Baobab

Hostname: gpu013.cluster
Python 3.7.4
x: cpu
tensor([[0.0963, 0.5734, 0.7547],
[0.6056, 0.9441, 0.3672],
[0.7199, 0.7243, 0.1161]])
y: cpu
tensor([[0.3504, 0.9175, 0.3713],
[0.0679, 0.1985, 0.0360],
[0.4479, 0.4846, 0.9477]])
z=x+y: cpu
tensor([[0.4466, 1.4909, 1.1260],
[0.6735, 1.1426, 0.4031],
[1.1678, 1.2090, 1.0638]])
Is CUDA available?: True
a: cuda:0
tensor([[0.3627, 0.1694, 0.3453],
[0.2654, 0.1715, 0.5954],
[0.9009, 0.9110, 0.6349]], device='cuda:0')
b: cuda:0
tensor([[0.4165, 0.4420, 0.2125],
[0.3598, 0.3475, 0.7647],
[0.4621, 0.9076, 0.7042]], device='cuda:0')
d=a+b: cuda:0
tensor([[0.7793, 0.6114, 0.5578],
[0.6252, 0.5190, 1.3601],
[1.3630, 1.8185, 1.3391]], device='cuda:0')

Knowing that the NVIDIA Titan RTX GPU is CUDA-capable [Source], maybe there is a problem with the drivers, or maybe PyTorch simply needs to be rebuilt on Yggdrasil?

Thank you in advance for your help.

Best regards,

Well, now it is crystal clear that to be able to use a GPU on the "debug-gpu" partition, one has to request at least one GPU by adding to the submission script (Run160.sh):

#SBATCH --gres=gpu:1

or, with the GPU type spelled out:

#SBATCH --gres=gpu:rtx:1

(It is not necessary to specify that we want an RTX GPU on the Yggdrasil "debug-gpu" partition, since at this point in time every GPU in the node "gpu001" is an RTX.)
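For reference, a corrected Run160.sh for Yggdrasil might look like the sketch below. It is my original script with the single `--gres` line added; the shebang line is an assumption (my original post did not show one), and the module versions are the ones I was already loading.

```shell
#!/bin/bash
#SBATCH --job-name=PyTorchTest
#SBATCH --output=PyTorch160.o
#SBATCH --time=0-00:01:00
#SBATCH --partition=debug-gpu
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1            # the missing line: request one GPU

module load GCC/8.3.0 CUDA/10.1.243 OpenMPI/3.1.4
module load PyTorch/1.6.0-Python-3.7.4

echo "Hostname: $(hostname -f)"

srun python --version
srun python PyTorchTest.py
```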

The reason why my submission script was working on Baobab is that there I was specifying that I wanted one RTX GPU, so that I could compare the behaviour of the two clusters on similar hardware.

By writing the submission script in the right way, we finally obtain the result we want:

Hostname: gpu001.yggdrasil
Python 3.7.4
x: cpu
tensor([[0.4305, 0.5328, 0.5498],
[0.3303, 0.8889, 0.5907],
[0.2051, 0.2027, 0.9379]])
y: cpu
tensor([[0.2488, 0.2539, 0.3774],
[0.4092, 0.4819, 0.6806],
[0.1294, 0.4697, 0.0900]])
z=x+y: cpu
tensor([[0.6793, 0.7866, 0.9271],
[0.7395, 1.3708, 1.2712],
[0.3345, 0.6724, 1.0279]])
Is CUDA available?: True
a: cuda:0
tensor([[0.2630, 0.6004, 0.1138],
[0.9031, 0.7148, 0.4127],
[0.6668, 0.1961, 0.8721]], device='cuda:0')
b: cuda:0
tensor([[0.9830, 0.1081, 0.8388],
[0.5590, 0.4483, 0.8096],
[0.5778, 0.6322, 0.7951]], device='cuda:0')
d=a+b: cuda:0
tensor([[1.2460, 0.7085, 0.9526],
[1.4621, 1.1630, 1.2223],
[1.2446, 0.8284, 1.6673]], device='cuda:0')

The "take-home" lesson is that one should always be as precise as possible about the hardware needs of one's tasks in a job submission script, and one should not make arbitrary assumptions about the "inner workings" of a partition.

Hi there,

sorry for the delay; this week's problems and the Easter backlog are taking most of the time.

Indeed, as clearly stated in the UNIGE HPC documentation (cf. hpc:slurm [eResearch Doc] ):

Currently on Baobab there are several nodes equipped with GPUs. To request a GPU, it's not enough to specify a partition with nodes having GPUs; you must as well specify how many GPUs and optionally the needed type.

To specify how many GPUs you request, use the option --gres=gpu:n with n having a value between 1 and the maximum according to the table below.

I guess you had two different sbatch files, one per cluster.

Did you know that you can share a common sbatch file and override individual options at submission time? Thus, in your case:

  • Run160.sh would contain #SBATCH --gres=gpu:1 and #SBATCH --constraint="COMPUTE_TYPE_RTX", but no partition at all
  • on Baobab: sbatch --partition=shared-gpu Run160.sh
  • on Yggdrasil: sbatch --partition=debug-gpu Run160.sh
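Put together, the shared-file approach from the bullets above might look like this sketch. It assumes the module names and versions are identical on both clusters (which was the case in the scripts earlier in this thread) and borrows the constraint name quoted above:

```shell
#!/bin/bash
# Run160.sh, shared between Baobab and Yggdrasil: no partition here
#SBATCH --job-name=PyTorchTest
#SBATCH --output=PyTorch160.o
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --constraint="COMPUTE_TYPE_RTX"

module load GCC/8.3.0 CUDA/10.1.243 OpenMPI/3.1.4
module load PyTorch/1.6.0-Python-3.7.4

srun python PyTorchTest.py

# Submit with the partition chosen per cluster:
#   sbatch --partition=shared-gpu Run160.sh   # Baobab
#   sbatch --partition=debug-gpu  Run160.sh   # Yggdrasil
```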

BTW, we provide basic sbatch for some software, PyTorch included (cf. https://gitlab.unige.ch/hpc/softs/-/blob/master/p/pytorch/cuda_10.1.105_-is_available-singularity-_PyTorch_1.5.sbatch ).

Thx, bye,


Thank you for your reply and the links.

As you guessed right, I had two different sbatch files, one per cluster.

Moreover, I was not aware that one can override sbatch options at job submission time.

Best regards,