GPU010 Cuda+ singularity Cuda Runtime error

Pablo.Strasser · June 27, 2019, 8:38am

Sorry used the wrong terminal command in fact the problem is still here:

srun -p shared-gpu-EL7 --nodelist=gpu010 --gres=gpu:1 singularity exec --nv /home/strassp6/scratch/pytorch.simg python /home/strassp6/pytorchCheck.py
THCudaCheck FAIL file=…/aten/src/THC/THCGeneral.cpp line=51 error=999 : unknown error
Traceback (most recent call last):
File “/home/strassp6/pytorchCheck.py”, line 3, in
a = torch.zeros(10,device=cuda)
File “/opt/conda/lib/python3.6/site-packages/torch/cuda/init.py”, line 163, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (999) : unknown error at …/aten/src/THC/THCGeneral.cpp:51
srun: error: gpu010: task 0: Exited with exit code 1