CUDA problem when running a Python script

Primary information

Username: $bugatti
Cluster: $Yggdrasil

Description

I would like to run a Python script with CUDA in order to reduce its running time. The script uses a package called pyechelle, which produces image files in FITS format.

Steps to Reproduce

I log into Yggdrasil. Then I load the modules:
$ ml load GCCcore CUDA Python
I launch my file:
srun python base.py

The content of base.py is shown below:

from pyechelle.simulator import Simulator
from pyechelle.sources import Constant
from pyechelle.spectrograph import ZEMAX
sim = Simulator(ZEMAX("MaroonX"))
sim.set_ccd(1)
sim.set_fibers(1)
sim.set_sources(Constant())
sim.set_exposure_time(1.)
# Enable CUDA (no random seed is set in this script).
sim.set_cuda(True)
sim.set_output('02_cuda.fits', overwrite=True)
sim.run()

If I run it without the line sim.set_cuda(True) (which calls CUDA to run the file on the GPU), it works, but if I run it with that line I encounter this error:

CUDA driver library cannot be found.
If you are sure that a CUDA driver is installed,
try setting environment variable NUMBA_CUDA_DRIVER
with the file path of the CUDA driver shared library.
:
srun: error: cpu001: task 0: Exited with exit code 1

I tried to solve it by doing:
$ locate cuda
/opt/eos-folly/include/boost/fiber/cuda
/opt/eos-folly/include/boost/fiber/cuda/waitfor.hpp
/opt/eos-folly/include/boost/predef/language/cuda.h
/usr/include/linux/cuda.h
/usr/lib64/libicudata.so.60
/usr/lib64/libicudata.so.60.3
/usr/share/gtksourceview-3.0/language-specs/cuda.lang
/usr/share/vim/vim80/indent/cuda.vim
/usr/share/vim/vim80/syntax/cuda.vim
/usr/src/kernels/4.18.0-425.3.1.el8.x86_64/include/linux/cuda.h
/usr/src/kernels/4.18.0-425.3.1.el8.x86_64/include/uapi/linux/cuda.h
/usr/src/kernels/4.18.0-477.10.1.el8_8.x86_64/include/linux/cuda.h
/usr/src/kernels/4.18.0-477.10.1.el8_8.x86_64/include/uapi/linux/cuda.h
/usr/src/kernels/4.18.0-477.27.1.el8_8.x86_64/include/linux/cuda.h
/usr/src/kernels/4.18.0-477.27.1.el8_8.x86_64/include/uapi/linux/cuda.h

$ export NUMBA_CUDA_DRIVER=/usr/lib64/libicudata.so.60.3

Then I ran the script again:
srun python base.py
But a different error appeared:

File "/home/users/b/bugatti/.local/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 381, in absent_function
    raise CudaDriverError(f'Driver missing function: {fname}')
numba.cuda.cudadrv.error.CudaDriverError: Driver missing function: cuInit

Expected Result

The script should run without errors and produce a .fits file when it finishes. It works if I do not enable CUDA.

Dear @Maddalena.Bugatti

You are missing several srun options: partition, time limit, number of CPUs, and number of GPUs.

Since you want to use a GPU, you need to submit to a GPU partition, for example shared-gpu, and request one GPU.

Example: srun --partition shared-gpu --gpus=1 python base.py

Check here for more details: hpc:slurm [eResearch Doc]
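
Once the job runs on a GPU node, a quick check like the sketch below can confirm that Numba actually sees a device before base.py calls sim.set_cuda(True). It assumes Numba is the CUDA backend, as the traceback in the question suggests, and relies on numba.cuda.is_available(). The helper name cuda_usable is made up for this example.

```python
def cuda_usable():
    """Return True if Numba reports a usable CUDA GPU, False otherwise.

    Hypothetical helper: it is not part of pyechelle or Numba.
    """
    try:
        from numba import cuda  # Numba may be missing in this environment
    except ImportError:
        return False
    try:
        # is_available() initializes the driver and reports whether a GPU exists
        return bool(cuda.is_available())
    except Exception:
        # Any driver-level failure is treated as "no usable GPU"
        return False
```

For example, sim.set_cuda(cuda_usable()) would fall back to the CPU when no GPU was allocated to the job.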

Two suggestions:

  • write an sbatch script instead of using srun
  • always specify a version when you load a module, e.g. ml GCCcore/12.3.0, instead of relying on the latest one.
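
About the second error (Driver missing function: cuInit): libicudata.so is the ICU data library, not the CUDA driver, so pointing NUMBA_CUDA_DRIVER at it cannot work. On Linux the driver library is normally named libcuda.so.1 and is only present where the NVIDIA driver is installed; cpu001, judging by its name, is a CPU-only node. A minimal sketch, using only the standard library, to test whether a candidate file looks like the CUDA driver (the helper name is hypothetical):

```python
import ctypes


def looks_like_cuda_driver(path):
    """Return True if the shared library at `path` loads and exports cuInit."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        # File is missing or is not a loadable shared library
        return False
    # ctypes resolves symbols on attribute access; a missing symbol
    # raises AttributeError, which hasattr() turns into False
    return hasattr(lib, "cuInit")
```

Run on a GPU node, looks_like_cuda_driver("libcuda.so.1") should be a reasonable sanity check, while the libicudata path from the question would be rejected.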

Thank you very much @Yann.Sagon, now it works properly.
Cordially,
Maddalena