Hello, I am trying to use thundersvm in a Python script that I will later submit as sbatch jobs on Baobab, but the installation fails with an error (logs below).
This is the thundersvm package: https://github.com/Xtra-Computing/
I will also try compiling from source as an alternative, but I would like help diagnosing why I can’t use it as it is.
Steps to reproduce:
- I load python via the tensorflow module
module load GCC/6.4.0-2.28 OpenMPI/2.1.2
module load TensorFlow/1.7.0-Python-3.6.4
(NOTE: the library lists the following requirements:
- cmake 2.8 or above
- gcc 4.8 or above for Linux and MacOS
)
- I try to install it via pip in a virtualenv, without CUDA (following the library's GitHub instructions):
pip install thundersvm
Error logs:
Exception:
Traceback (most recent call last):
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/wheel.py", line 345, in move_wheel_files
clobber(source, lib_dir, True)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/wheel.py", line 316, in clobber
ensure_dir(destdir)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/pip/utils/__init__.py", line 83, in ensure_dir
os.makedirs(path)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/site-packages/thundersvm-0.3.3.dist-info'
Any ideas or recommendations?
Kind regards
Dear Dimitrios,
Maybe you missed something when creating the Python virtualenv? I tried the installation with a test user:
[testsagon1@login2 ~]$ module load GCC/6.4.0-2.28 OpenMPI/2.1.2
[testsagon1@login2 ~]$ module load TensorFlow/1.7.0-Python-3.6.4
[testsagon1@login2 ~]$ virtualenv --no-site-packages ~/baobab_python_env
Using base prefix '/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4'
New python executable in /home/testsagon1/baobab_python_env/bin/python
Installing setuptools, pip, wheel...done.
[testsagon1@login2 ~]$ . ~/baobab_python_env/bin/activate
(baobab_python_env) [testsagon1@login2 ~]$ pip install thundersvm
Collecting thundersvm
Downloading https://files.pythonhosted.org/packages/21/05/559e34744a8939c2dc70ddf05df426f79ca9bceae7404e6c677db7b9b982/thundersvm-0.3.3-py3-none-any.whl (500kB)
|████████████████████████████████| 501kB 4.1MB/s
Collecting scipy
Downloading https://files.pythonhosted.org/packages/dc/29/162476fd44203116e7980cfbd9352eef9db37c49445d1fec35509022f6aa/scipy-1.4.1-cp36-cp36m-manylinux1_x86_64.whl (26.1MB)
|████████████████████████████████| 26.1MB 8.4MB/s
Requirement already satisfied: numpy in /opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/TensorFlow/1.7.0-Python-3.6.4/lib/python3.6/site-packages (from thundersvm) (1.14.3)
Collecting scikit-learn
Downloading https://files.pythonhosted.org/packages/d1/48/e9fa9e252abcd1447eff6f9257636af31758a6e46fd5ce5d3c879f6907cb/scikit_learn-0.22.1-cp36-cp36m-manylinux1_x86_64.whl (7.0MB)
|████████████████████████████████| 7.1MB 5.8MB/s
Collecting joblib>=0.11
Downloading https://files.pythonhosted.org/packages/28/5c/cf6a2b65a321c4a209efcdf64c2689efae2cb62661f8f6f4bb28547cf1bf/joblib-0.14.1-py2.py3-none-any.whl (294kB)
|████████████████████████████████| 296kB 11.5MB/s
Installing collected packages: scipy, joblib, scikit-learn, thundersvm
Successfully installed joblib-0.14.1 scikit-learn-0.22.1 scipy-1.4.1 thundersvm-0.3.3
(baobab_python_env) [testsagon1@login2 ~]$
As you can see, the installation was successful. By the way, why do you want the version without GPU? We do have GPUs on Baobab.
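The PermissionError in the first post points at the system site-packages under /opt/ebsofts, which suggests pip was not running from inside the virtualenv at that moment. A minimal sketch for checking this from Python follows; the helper name `in_virtualenv` is hypothetical, and the check relies only on the standard `sys.prefix`, `sys.base_prefix` and (for older virtualenv releases) `sys.real_prefix` attributes.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv/venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("interpreter :", sys.executable)
    print("sys.prefix  :", sys.prefix)
    print("virtualenv? :", in_virtualenv())
```

If this prints False, or if `sys.prefix` still points under /opt/ebsofts, the virtualenv has probably not been activated in the current shell.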
Hi Dimitrios, Yann
I also tried to set this up and found that I can install it in my virtualenv; however, when importing it at runtime, it cannot find a CUDA shared library.
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/raine/temp/testenv/lib/python3.6/site-packages/thundersvm/__init__.py", line 12, in <module>
from .thundersvm import *
File "/home/raine/temp/testenv/lib/python3.6/site-packages/thundersvm/thundersvm.py", line 41, in <module>
thundersvm = CDLL(lib_path)
File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.9.0: cannot open shared object file: No such file or directory
There is also a version on pip compiled with CUDA 10.0 (to match the version installed on Baobab, which I loaded); however, it is still unable to find the shared library (now version 10.0).
Any advice would be more than welcome.
Cheers,
Johnny
I actually want a version with GPU support. My impression is that by running:
pip install thundersvm
it is using CUDA, but please correct me if I am wrong.
Hi there,
Indeed, there is no libcusparse.so.9.0 on Baobab, but there are:
- libcusparse.so.9.1 (CUDA/9.1.85)
- libcusparse.so.9.2 (CUDA/9.2.88)
- libcusparse.so.10 (either CUDA/10.1.105 or CUDA/10.1.243)
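To see which of these libraries the dynamic loader can actually find in the current environment (that is, with the currently loaded modules), something like the following sketch may help. It uses `ctypes.CDLL`, the same call that fails in the traceback above; the helper names `can_load` and `report` are hypothetical, and the sonames are simply the ones mentioned in this thread.

```python
import ctypes

# Sonames taken from the messages in this thread.
CANDIDATES = [
    "libcusparse.so.9.0",
    "libcusparse.so.9.1",
    "libcusparse.so.9.2",
    "libcusparse.so.10",
]


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def report(sonames):
    """Map each soname to whether it could be loaded."""
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    for name, ok in report(CANDIDATES).items():
        print(f"{name:24s} {'found' if ok else 'NOT found'}")
```

Running it once before and once after `module load CUDA/...` shows whether the module makes the expected soname visible.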
According to the upstream documentation, installing via pip without specifying anything means “CUDA 9.0 - linux_x86_64” (cf. thundersvm/README.md at 943a0989dc2f17092fa294b1abf354575397949e · Xtra-Computing/thundersvm · GitHub).
Only three wheel files are available (cf. Search results · PyPI); trying the cuda10 one generates another error:
capello@login2:~$ module list
No modules loaded
capello@login2:~$ module load GCC/8.2.0-2.31.1 OpenMPI/3.1.3
capello@login2:~$ module load Python/3.7.2
capello@login2:~$ module load CUDA/10.0.130
capello@login2:~$ virtualenv --no-site-packages test-thundersvm-cuda10
[...]
capello@login2:~$ . test-thundersvm-cuda10/bin/activate
(test-thundersvm-cuda10) capello@login2:~$ pip install thundersvm-cuda10
[...]
Successfully installed joblib-0.14.1 numpy-1.18.1 scikit-learn-0.22.1 scipy-1.4.1 thundersvm-cuda10-0.3.5
(test-thundersvm-cuda10) capello@login2:~$ python
Python 3.7.2 (default, Nov 18 2019, 10:47:22)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/__init__.py", line 10, in <module>
from .thundersvm import *
File "/home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/thundersvm.py", line 39, in <module>
thundersvm = CDLL(lib_path)
File "/opt/ebsofts/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/ctypes/__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/users/c/capello/test-thundersvm-cuda10/lib/python3.7/site-packages/thundersvm/libthundersvm.so)
>>>
Time to investigate…
Thx, bye,
Luca
Hello,
We have two different issues here. @John.Raine, as @Luca.Capello said, we don’t have CUDA 9.0 on Baobab.
@Luca.Capello tried with a more recent .whl file, but that file needs a more recent version of GLIBC than we have on Baobab.
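As a rough way to check this kind of mismatch before installing a prebuilt wheel, the glibc version of the current node can be compared with the one named in the error (GLIBC_2.27). This is only a sketch: `platform.libc_ver()` is standard library, while `parse_version` and `glibc_at_least` are hypothetical helper names.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.27' into a tuple (2, 27)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(required, found=None):
    """Return True if the glibc version is at least `required`.

    `found` defaults to the version reported for the running interpreter.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc":
            return False  # not glibc, or the version could not be determined
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("libc reported by Python:", platform.libc_ver())
    print("satisfies GLIBC_2.27?  :", glibc_at_least("2.27"))
```

If the check fails, a wheel built against the newer glibc cannot be loaded on that node, and building from source on the cluster itself is the usual way around it.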
Anyway, I rebuilt the .whl and it seems to work, at least the import.
Create a Python virtualenv if not already done. To use it in the future, please remember to load the modules before activating the virtualenv.
[testsagon1@login2 ~]$ module load GCC/6.4.0-2.28 OpenMPI/2.1.2 TensorFlow/1.7.0-Python-3.6.4 CMake
[testsagon1@login2 ~]$ virtualenv --no-site-packages ~/baobab_python_env
[testsagon1@login2 ~]$ . ~/baobab_python_env/bin/activate
Clone the github repo and build thundersvm.
[testsagon1@login2 ~]$ git clone https://github.com/Xtra-Computing/thundersvm.git
[testsagon1@login2 ~]$ cd thundersvm
[testsagon1@login2 ~]$ mkdir build && cd build && cmake .. && make -j
[testsagon1@login2 ~]$ cd ../python
Edit the setup.py if needed, for example to change the version number. It’s not needed here.
Build the wheel.
[testsagon1@login2 ~]$ python3 setup.py bdist_wheel
Install the whl you generated:
[testsagon1@login2 ~]$ pip install ./dist/thundersvm-0.3.4-cp36-cp36m-linux_x86_64.whl
Test:
[testsagon1@login2 ~]$ python
Python 3.6.4 (default, Apr 25 2018, 10:28:12)
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundersvm
>>>
Best
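To double-check which compiled library the installed package would load (the earlier traceback shows a libthundersvm.so inside the package directory), the following sketch lists the shared objects shipped with an installed package without importing it. The function name `bundled_shared_libs` is hypothetical; `importlib.util.find_spec` and `pathlib` are standard library.

```python
import importlib.util
from pathlib import Path


def bundled_shared_libs(package):
    """List .so files shipped inside an installed package, without importing it."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return []
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a plain module rather than a package
    libs = []
    for location in spec.submodule_search_locations:
        libs.extend(sorted(Path(location).rglob("*.so*")))
    return libs


if __name__ == "__main__":
    for lib in bundled_shared_libs("thundersvm"):
        print(lib)
```

Running `ldd` on each printed path then shows which CUDA and system libraries it expects and whether they resolve with the currently loaded modules.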
@Yann.Sagon I replicated your steps and you are right: import thundersvm is working.
But when I try to actually run a Python script that uses an SVM provided by thundersvm (script below):
from thundersvm import OneClassSVM
import numpy as np

# 100 random 2-D points as toy training data
X = 0.3 * np.random.randn(100, 2)
clf = OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X)
I get the following error, which seems to indicate a mismatch between the CUDA driver on the node and the CUDA runtime the library was built against (maybe related to what @Luca.Capello mentioned?):
2020-01-22 23:06:22,610 FATAL [default] Check failed: [error == cudaSuccess] CUDA driver version is insufficient for CUDA runtime version
2020-01-22 23:06:22,610 WARNING [default] Aborting application. Reason: Fatal log at [/home/users/p/proios0/src/geneva-jet-anomaly/thundersvm/thundersvm/src/thundersvm/thundersvm-scikit.cpp:154]
Aborted
@Yann.Sagon did you manage to run it on your side?
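One way to narrow this down is to ask the NVIDIA driver on the node which CUDA version it supports and compare that with the CUDA module used for the build. The sketch below calls `cuDriverGetVersion` from the driver library through `ctypes`; the helper names are hypothetical, and the assumption that the driver library is named `libcuda.so.1` may not hold everywhere. This error is also commonly reported when no NVIDIA driver is visible at all, for example on a node without a GPU; whether that applies here is an assumption.

```python
import ctypes


def decode_cuda_version(value):
    """CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10010 -> (10, 1)."""
    return value // 1000, (value % 1000) // 10


def driver_supported_cuda(libname="libcuda.so.1"):
    """Return the newest CUDA version the installed driver supports, or None."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None  # no NVIDIA driver library visible on this node
    version = ctypes.c_int(0)
    try:
        status = lib.cuDriverGetVersion(ctypes.byref(version))
    except AttributeError:
        return None  # library loaded but does not export the symbol
    if status != 0:  # 0 is CUDA_SUCCESS
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    supported = driver_supported_cuda()
    if supported is None:
        print("No usable NVIDIA driver found on this node.")
    else:
        print("Driver supports CUDA up to %d.%d" % supported)
```

If the reported version is lower than the CUDA module loaded when building thundersvm, rebuilding against an older CUDA module (or running on a node with a newer driver) would be the next thing to try.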