Error installing thundersvm (or other packages using CUDA)

Hello, I am trying to use thundersvm in a Python script that I will later submit as sbatch jobs on Baobab, but I get an error (logs below).

This is the thundersvm package: https://github.com/Xtra-Computing/

I will also try to compile from source as an alternative, but I would like help diagnosing why I can't use the package as it is.

Steps to reproduce:

  1. I load Python via the TensorFlow module:
module load GCC/6.4.0-2.28  OpenMPI/2.1.2
module load TensorFlow/1.7.0-Python-3.6.4

(NOTE: the library lists the following requirements:

  • cmake 2.8 or above
  • gcc 4.8 or above for Linux and macOS)

  2. I try to install it via pip, in a virtualenv and without CUDA (according to the GitHub instructions of the library):
pip install thundersvm

Error logs:


Exception:
Traceback (most recent call last):
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/wheel.py", line 316, in clobber
    ensure_dir(destdir)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/utils/__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/thundersvm-0.3.3.dist-info'
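The traceback shows pip trying to create a directory inside the module's own site-packages under /opt/ebsofts, which a regular user cannot write to. A minimal sketch (standard library only; the helper names are hypothetical) to check from the interpreter you are about to use whether a virtualenv is really active and whether its install target is writable. Older virtualenv releases set `sys.real_prefix`, while venv and newer virtualenv make `sys.prefix` differ from `sys.base_prefix`, so both are checked.

```python
import os
import sys
import sysconfig


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtualenv/venv."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def site_packages_dir() -> str:
    """Directory where pip would install pure-Python packages for this interpreter."""
    return sysconfig.get_paths()["purelib"]


def report() -> dict:
    target = site_packages_dir()
    return {
        "python": sys.executable,
        "in_virtualenv": in_virtualenv(),
        "site_packages": target,
        "writable": os.access(target, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

If `in_virtualenv` or `writable` prints False, pip will try to write into a directory you cannot modify, as in the traceback above.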

Any ideas or recommendations?

Kind regards

Dear Dimitrios,

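Maybe you missed something when you created the Python virtualenv? The traceback shows pip writing into the system site-packages under /opt/ebsofts, which suggests the virtualenv was not active. I tried the installation with a test user: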

[testsagon1@login2 ~]$ module load GCC/6.4.0-2.28  OpenMPI/2.1.2
[testsagon1@login2 ~]$ module load TensorFlow/1.7.0-Python-3.6.4
[testsagon1@login2 ~]$ virtualenv --no-site-packages ~/baobab_python_env
Using base prefix '/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4'
New python executable in /home/testsagon1/baobab_python_env/bin/python
Installing setuptools, pip, wheel...done.
[testsagon1@login2 ~]$ . ~/baobab_python_env/bin/activate
(baobab_python_env) [testsagon1@login2 ~]$ pip install thundersvm
Collecting thundersvm
  Downloading https://files.pythonhosted.org/packages/21/05/559e34744a8939c2dc70ddf05df426f79ca9bceae7404e6c677db7b9b982/thundersvm-0.3.3-py3-none-any.whl (500kB)
     |████████████████████████████████| 501kB 4.1MB/s 
Collecting scipy
  Downloading https://files.pythonhosted.org/packages/dc/29/162476fd44203116e7980cfbd9352eef9db37c49445d1fec35509022f6aa/scipy-1.4.1-cp36-cp36m-manylinux1_x86_64.whl (26.1MB)
     |████████████████████████████████| 26.1MB 8.4MB/s 
Requirement already satisfied: numpy in /opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages (from thundersvm) (1.14.3)
Collecting scikit-learn
  Downloading https://files.pythonhosted.org/packages/d1/48/e9fa9e252abcd1447eff6f9257636af31758a6e46fd5ce5d3c879f6907cb/scikit_learn-0.22.1-cp36-cp36m-manylinux1_x86_64.whl (7.0MB)
     |████████████████████████████████| 7.1MB 5.8MB/s 
Collecting joblib>=0.11
  Downloading https://files.pythonhosted.org/packages/28/5c/cf6a2b65a321c4a209efcdf64c2689efae2cb62661f8f6f4bb28547cf1bf/joblib-0.14.1-py2.py3-none-any.whl (294kB)
     |████████████████████████████████| 296kB 11.5MB/s 
Installing collected packages: scipy, joblib, scikit-learn, thundersvm
Successfully installed joblib-0.14.1 scikit-learn-0.22.1 scipy-1.4.1 thundersvm-0.3.3
(baobab_python_env) [testsagon1@login2 ~]$

As you can see, the installation was successful. By the way, why do you want the version without GPU? We do have GPUs on Baobab.

Hi Dimitrios, Yann

I also tried to set this up and found that I can install it in my virtualenv; however, when importing it at runtime, it cannot find a CUDA shared library.

[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/raine/temp/testenv/lib/python3.6/site-packages/thundersvm/__init__.py", line 12, in <module>
    from .thundersvm import *
  File "/home/raine/temp/testenv/lib/python3.6/site-packages/thundersvm/thundersvm.py", line 41, in <module>
    thundersvm = CDLL(lib_path)
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.9.0: cannot open shared object file: No such file or directory
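`thundersvm.py` loads its library with `ctypes.CDLL`, so the dynamic loader must be able to resolve every CUDA library it depends on. A small sketch (hypothetical helper name) that asks the loader the same question for individual sonames and returns the loader's error message instead of raising. The candidate list is an assumption: 9.0 is what the traceback asks for, the others are only examples and depend on which CUDA module is loaded. A failure can also mean one of the library's own dependencies is missing; the message says which.

```python
import ctypes
from typing import Optional


def load_error(soname: str) -> Optional[str]:
    """Return None if the dynamic loader can open `soname`, else its error message."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # 9.0 is the version from the traceback; the others are example sonames.
    for name in ("libcusparse.so.9.0", "libcusparse.so.10.0", "libcusparse.so.10"):
        error = load_error(name)
        print(f"{name}: {'OK' if error is None else error}")
```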

There is also a version on pip compiled with CUDA 10.0 (to match the CUDA version installed on Baobab, which I loaded); however, it is still unable to find the shared library (this time the 10.0 version).

Any advice would be more than welcome.

Cheers,
Johnny

Actually, I want a version with GPU support. My impression is that running:
pip install thundersvm
installs a CUDA-enabled build, but please correct me if I am wrong.

Hi there,

Indeed, there is no libcusparse.so.9.0 on Baobab, but there are:

  • libcusparse.so.9.1 (CUDA/9.1.85)
  • libcusparse.so.9.2 (CUDA/9.2.88)
  • libcusparse.so.10 (either CUDA/10.1.105 or CUDA/10.1.243)
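To see which libcusparse versions the currently loaded modules actually expose, a minimal sketch that globs every directory listed in LD_LIBRARY_PATH. It assumes the CUDA modules add their library directory to LD_LIBRARY_PATH (common for module-based installs, but not verified here) and it ignores the system ldconfig cache; the function name is hypothetical.

```python
import glob
import os
from typing import List


def find_on_library_path(prefix: str, env_var: str = "LD_LIBRARY_PATH") -> List[str]:
    """List files whose names start with `prefix` in each directory of `env_var`."""
    matches = []
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if directory:  # skip empty entries, e.g. from a trailing ':'
            matches.extend(sorted(glob.glob(os.path.join(directory, prefix + "*"))))
    return matches


if __name__ == "__main__":
    for path in find_on_library_path("libcusparse.so"):
        print(path)
```

Running it after `module load CUDA/...` for different versions shows which sonames each module provides.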

According to the upstream documentation, installing via pip without specifying anything gives the “CUDA 9.0 - linux_x86_64” build (cf. thundersvm/README.md at 943a0989dc2f17092fa294b1abf354575397949e · Xtra-Computing/thundersvm · GitHub).

Only three wheel files are available (cf. Search results · PyPI); trying the CUDA 10 one generates another error:

capello@login2:~$ module list
No modules loaded
capello@login2:~$ module load GCC/8.2.0-2.31.1  OpenMPI/3.1.3
capello@login2:~$ module load Python/3.7.2
capello@login2:~$ module load CUDA/10.0.130
capello@login2:~$ virtualenv --no-site-packages test-thundersvm-cuda10
[...]
capello@login2:~$ . test-thundersvm-cuda10/bin/activate
(test-thundersvm-cuda10) capello@login2:~$ pip install thundersvm-cuda10
[...]
Successfully installed joblib-0.14.1 numpy-1.18.1 scikit-learn-0.22.1 scipy-1.4.1 thundersvm-cuda10-0.3.5
(test-thundersvm-cuda10) capello@login2:~$ python
Python 3.7.2 (default, Nov 18 2019, 10:47:22) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/__init__.py", line 10, in <module>
    from .thundersvm import *
  File "/home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/thundersvm.py", line 39, in <module>
    thundersvm = CDLL(lib_path)
  File "/opt/ebsofts/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/libthundersvm.so)
>>> 
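The prebuilt `libthundersvm.so` in this wheel needs glibc 2.27 or newer. A minimal sketch for comparing the glibc of the running system against that requirement, using `os.confstr("CS_GNU_LIBC_VERSION")` from the standard library (available on glibc-based Linux; the helper names are hypothetical and the required version is taken from the error message above).

```python
import os
from typing import Optional, Tuple


def parse_glibc(value: str) -> Optional[Tuple[int, int]]:
    """Parse a string of the form 'glibc <major>.<minor>' into (major, minor)."""
    name, _, version = value.partition(" ")
    if name != "glibc":
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None
    return parse_glibc(value) if value else None


if __name__ == "__main__":
    required = (2, 27)  # from the error message: GLIBC_2.27
    found = glibc_version()
    print(f"running glibc: {found}, required by the wheel: {required}")
    if found is not None and found < required:
        print("this prebuilt wheel needs a newer glibc than this system provides")
```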

Time to investigate…

Thx, bye,
Luca

Hello,

There are different issues here. @John.Raine, as @Luca.Capello said, we don’t have CUDA 9.0 on Baobab.
@Luca.Capello tried a more recent .whl file, but that file needs a more recent version of GLIBC than the one we have on Baobab.

Anyway, I rebuilt the .whl and it seems to work, at least the import.

Create a Python virtualenv if you have not already done so. To use it later, remember to load the modules before activating the virtualenv.

[testsagon1@login2 ~]$ module load GCC/6.4.0-2.28  OpenMPI/2.1.2 TensorFlow/1.7.0-Python-3.6.4 CMake
[testsagon1@login2 ~]$ virtualenv --no-site-packages ~/baobab_python_env
[testsagon1@login2 ~]$ . ~/baobab_python_env/bin/activate

Clone the GitHub repo and build thundersvm.

[testsagon1@login2 ~]$ git clone https://github.com/Xtra-Computing/thundersvm.git
[testsagon1@login2 ~]$ cd thundersvm
[testsagon1@login2 ~]$ mkdir build && cd build && cmake .. && make -j
[testsagon1@login2 ~]$ cd ../python

Edit setup.py if needed, for example to change the version number. This is not needed here.

Build the wheel.

[testsagon1@login2 ~]$ python3 setup.py bdist_wheel

Install the .whl you generated:

[testsagon1@login2 ~]$ pip install ./dist/thundersvm-0.3.4-cp36-cp36m-linux_x86_64.whl
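The wheel filename embeds the version (0.3.4 here) and the Python tag, so it changes whenever setup.py or the Python module changes. A small sketch (hypothetical helper names) that picks the most recently built thundersvm wheel in `dist/` and installs it with `sys.executable -m pip`, so the package goes into the interpreter that runs the script, i.e. the activated virtualenv.

```python
import glob
import os
import subprocess
import sys


def newest_wheel(dist_dir: str, project: str = "thundersvm") -> str:
    """Return the most recently modified <project>-*.whl in dist_dir."""
    wheels = glob.glob(os.path.join(dist_dir, project + "-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no {project} wheel in {dist_dir}")
    return max(wheels, key=os.path.getmtime)


def install_wheel(path: str) -> None:
    """Install a wheel into the interpreter running this code (the active virtualenv)."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", path])
```

From `thundersvm/python`, with the virtualenv active, `install_wheel(newest_wheel("dist"))` installs the freshly built wheel without hard-coding its filename.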

Test

[testsagon1@login2 ~]$ python
Python 3.6.4 (default, Apr 25 2018, 10:28:12) 
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
>>>

Best

@Yann.Sagon I replicated your steps and you are right: import thundersvm works.

But when I actually try to run a Python script using the SVM provided by ThunderSVM (the following script):

from thundersvm import OneClassSVM
import numpy as np

# 100 random 2-D points around the origin
X = 0.3 * np.random.randn(100, 2)
clf = OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X)

I get the following error, which seems to indicate a mismatch between the CUDA driver version and the CUDA runtime version (maybe related to what @Luca.Capello mentioned?):

2020-01-22 23:06:22,610 FATAL [default] Check failed: [error == cudaSuccess]  CUDA driver version is insufficient for CUDA runtime version

2020-01-22 23:06:22,610 WARNING [default] Aborting application. Reason: Fatal log at [/home/users/p/proios0/src/geneva-jet-anomaly/thundersvm/thundersvm/src/thundersvm/thundersvm-scikit.cpp:154]
Aborted
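This message means the CUDA driver visible on the node where the script ran supports an older CUDA version than the runtime library in use, or that no driver is visible at all (the CUDA documentation states that `cudaDriverGetVersion` reports 0 when no driver is installed, e.g. on a node without an NVIDIA GPU). A minimal sketch that queries both versions through the CUDA runtime API via `ctypes`; it assumes a CUDA module is loaded so that `libcudart.so` is on the loader path, and it reports that library's version, which matches the one thundersvm was built against only if the same CUDA module is loaded. Helper names are hypothetical.

```python
import ctypes
from typing import Tuple


def decode_cuda_version(value: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000 * major + 10 * minor (10020 -> 10.2)."""
    return value // 1000, (value % 1000) // 10


def cuda_versions(libcudart: str = "libcudart.so") -> Tuple[int, int]:
    """Return (driver, runtime) versions as raw integers from the CUDA runtime API."""
    cudart = ctypes.CDLL(libcudart)
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both functions return cudaSuccess (0) on success.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        raise RuntimeError("cudaDriverGetVersion failed")
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed")
    return driver.value, runtime.value


if __name__ == "__main__":
    try:
        driver, runtime = cuda_versions()
    except (OSError, RuntimeError) as exc:
        print(f"could not query CUDA: {exc}")
    else:
        print("driver supports CUDA %d.%d" % decode_cuda_version(driver))
        print("runtime is CUDA %d.%d" % decode_cuda_version(runtime))
        if driver == 0:
            print("no CUDA driver visible on this node")
        elif driver < runtime:
            print("driver is older than the runtime")
```

Comparing the output on the login node with the output inside a GPU job would show whether the failure depends on where the script runs.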

@Yann.Sagon did you manage to run on your side?