GPU010 CUDA + Singularity: CUDA runtime error

Hi there,

NB: this seems to be exactly the same problem as "Issue with GPU on CentOS7".

The upstream CUDA deviceQuery sample does not report any error; test case available at https://gitlab.unige.ch/hpc/softs/tree/ff8b7626113206871ad380ad496b327bc8fa7aa8/c/cuda (launched on gpu010/Slurm-18517239, gpu009/Slurm-18517272 and gpu008/Slurm-18517273).

Now back to PyTorch:

  1. A simple torch.cuda.device_count() call works with the module PyTorch/0.3.0-Python-3.6.4; test case available at https://gitlab.unige.ch/hpc/softs/tree/3de4a730f5d8c617e2586fda7058bb7ae0eeb66b/p/pytorch (launched on gpu010/Slurm-18574803, gpu009/Slurm-18574804 and gpu008/Slurm-18574805).
  2. Your PyTorch in Singularity test works as well; test case available at https://gitlab.unige.ch/hpc/softs/commit/b9973e982654776742faefd79f016777e9ad56e6 (launched on gpu010/Slurm-18693217, gpu009/Slurm-18693287 and gpu008/Slurm-18693288, after having built the image as you suggested).
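
For reference, the kind of check used in point 1 can be sketched roughly as below. This is not the exact script from the repository linked above, only a minimal illustration; the function name cuda_report and the layout of the returned dict are my own assumptions. It also tries a tiny GPU allocation, since a CUDA runtime error may only show up once the device is actually used.

```python
import importlib


def cuda_report():
    """Return a small dict describing what PyTorch sees of the GPUs.

    Hypothetical helper for illustration; not part of PyTorch itself.
    """
    report = {
        "torch": None,
        "cuda_available": False,
        "device_count": 0,
        "error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except Exception as exc:
        # PyTorch missing or failing to load (e.g. broken shared libraries).
        report["error"] = "import failed: %s" % exc
        return report

    report["torch"] = str(torch.__version__)
    try:
        report["cuda_available"] = bool(torch.cuda.is_available())
        report["device_count"] = int(torch.cuda.device_count())
        if report["cuda_available"]:
            # Force a real allocation so runtime errors surface here.
            torch.zeros(1).cuda()
    except Exception as exc:
        report["error"] = "%s: %s" % (type(exc).__name__, exc)
    return report


if __name__ == "__main__":
    print(cuda_report())
```

Running it once with the module loaded and once inside the Singularity image on the same node should make any difference between the two environments visible in the "error" field.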

@Pablo.Strasser, can you please test again with a clean build?

Thx, bye,
Luca