Compilation error on gpu011 with CUDA

Hello,
I get a strange error on gpu011 when I run CUDA code (OpenCL works fine):

Compilation log:
nvrtc: error: failed to load builtins for compute_72.
(null): liblbm.c:567: NVRTC call
  res
failed with error code 7 (NVRTC_ERROR unknown)
srun: error: gpu011: task 0: Exited with exit code 255

This does not happen on gpu008, for example. I tried googling it without much success. Do you have any idea what the problem might be?

Best regards,
Orestis

Hello,

What are you trying to compile? I see that a user is currently using the node with the two GPUs.

Do you compile from login2?

Hello,

thank you for your reply.

I compiled a Futhark program with the CUDA backend. The same code runs fine with the OpenCL backend.

It was compiled both on baobab2 and on gpu011. I tried to execute it with srun or directly through ssh on gpu011 with the same outcome. The same code executes on gpu008 for example.

I have used gpu011 without problems before, using Singularity and CUDA with PyTorch.
I have had problems before with the same graphics card (a 2080 Ti) on my personal computer when running code compiled with CUDA 9.0 using Docker. Using a 1080 on the same machine did work.
The problem was solved by switching to CUDA 10, the up-to-date version at the time.
The GPU on node gpu011 is of a different architecture than (to the best of my knowledge) all the other GPU nodes on Baobab. It is possible that some libraries need to be updated to work with gpu011. On my side, using up-to-date CUDA libraries inside Singularity or Docker, I had no problem with this GPU model.

I hope this helps.
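The toolkit/architecture mismatch described in this post can be sketched in a few lines of Python. This is only an illustration, not something from the thread: the table lists the first CUDA toolkit release able to generate code for three GPU generations (compute capability 6.1 for the GTX 1080, 7.0 for Volta, 7.5 for the RTX 2080 Ti), and the names `MIN_TOOLKIT` and `toolkit_can_target` are hypothetical.

```python
# First CUDA toolkit release that can target each compute capability.
# Only the GPU generations relevant to this thread are listed.
MIN_TOOLKIT = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def toolkit_can_target(compute_capability, toolkit_version):
    """Return True if toolkit_version is new enough for compute_capability."""
    return toolkit_version >= MIN_TOOLKIT[compute_capability]


print(toolkit_can_target((7, 5), (9, 0)))   # 2080 Ti with CUDA 9.0  -> False
print(toolkit_can_target((6, 1), (9, 0)))   # 1080 with CUDA 9.0     -> True
print(toolkit_can_target((7, 5), (10, 0)))  # 2080 Ti with CUDA 10.0 -> True
```

The three calls mirror the experience reported above: the 2080 Ti failed with CUDA 9.0, the 1080 worked, and CUDA 10 fixed the 2080 Ti.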

Hello,

I’ve installed CUDA 10.0.130 on Baobab. Please check!

module load CUDA/10.0.130

Thanks. I’ll test it later today.

Regards,
Orestis

Looks like it’s still not working unfortunately. I will also contact the author of futhark to see if he has any clue.

Hi there,

What is compute_72?

Some links:

I could not find any hint about liblbm. Is that specific to your code?

And I guess you have validated that NVRTC works with the upstream samples (cf. cuda-samples/Samples/vectorAdd_nvrtc at 489d9f7b1f812adcabd3cd14988cbfae569b1222 in NVIDIA/cuda-samples on GitHub), haven't you?

Thx, bye,
Luca
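A smaller stand-in for the NVRTC validation suggested above could look like the following sketch. It is not the upstream sample: it only loads libnvrtc through Python's ctypes, reports the NVRTC version, and compiles an empty kernel for a given virtual architecture such as the compute_72 seen in the log (passed through NVRTC's `--gpu-architecture` option). It assumes libnvrtc can be found on the loader path, and the helper names `arch_option` and `nvrtc_smoke_test` are hypothetical.

```python
import ctypes
import ctypes.util


def arch_option(arch):
    """Build the NVRTC option selecting a virtual architecture, e.g. compute_72."""
    return f"--gpu-architecture={arch}".encode()


def nvrtc_smoke_test(arch="compute_72"):
    """Compile an empty kernel for arch.

    Returns ((major, minor), result_code, log), or None if libnvrtc
    cannot be located or loaded. A result_code of 0 means success.
    """
    path = ctypes.util.find_library("nvrtc")
    if path is None:
        return None  # assumption: libnvrtc is expected on the loader path
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None

    # Which NVRTC release did we actually load?
    major, minor = ctypes.c_int(), ctypes.c_int()
    lib.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
    version = (major.value, minor.value)

    prog = ctypes.c_void_p()
    src = b'extern "C" __global__ void noop() {}'
    code = lib.nvrtcCreateProgram(ctypes.byref(prog), src, b"noop.cu", 0, None, None)
    if code != 0:
        return version, code, ""

    # Compile for the requested architecture and collect the compiler log.
    opts = (ctypes.c_char_p * 1)(arch_option(arch))
    code = lib.nvrtcCompileProgram(prog, 1, opts)

    size = ctypes.c_size_t()
    lib.nvrtcGetProgramLogSize(prog, ctypes.byref(size))
    log = ctypes.create_string_buffer(max(size.value, 1))
    lib.nvrtcGetProgramLog(prog, log)
    lib.nvrtcDestroyProgram(ctypes.byref(prog))
    return version, code, log.value.decode(errors="replace")
```

Calling `nvrtc_smoke_test("compute_72")` on gpu011 and on gpu008 after `module load CUDA/10.0.130` would show whether the two nodes load the same NVRTC version and whether the failure is reproducible outside the original program.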

Hello,

I retried after the maintenance you did recently. Now it’s magically working. Did you update anything?

Thank you for the support.

Best regards,
Orestis

Hello Orestis,

No, we didn’t change anything per se.
The node was reinstalled on 19th August 2019 (but the installation is fully automated, and gpu011 was already running CentOS 7, so no surprises on that side).

We still don’t know exactly how you are loading the libraries (which versions, etc.). Note that since Yann asked you to test CUDA/10.0.130, version CUDA/10.1.105 has also become available.
Which one are you using?

In any case, good for you! :)

Cheers,

Massimo
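One way to answer the question of which CUDA libraries are actually being picked up is to look at the library search path after `module load`. The sketch below assumes the module adjusts `LD_LIBRARY_PATH` and that the CUDA directories contain "cuda" in their name. The function name `cuda_paths` and the example path are hypothetical.

```python
import os


def cuda_paths(search_path=None):
    """Return the entries of a colon-separated search path that mention CUDA."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Keep non-empty entries whose name contains "cuda" in any letter case.
    return [p for p in search_path.split(":") if p and "cuda" in p.lower()]


# Hypothetical search path, for illustration only.
example = "/hypothetical/CUDA/10.0.130/lib64:/usr/lib64"
print(cuda_paths(example))
```

Running `cuda_paths()` without arguments inside the job (for example under srun on gpu011) inspects the environment the program really sees, which may differ from the login node.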