Issue loading torch-sparse

Dimitrios.Proios · December 14, 2022, 12:09am

Hello, I am having an issue loading torch-sparse.

I submit the following script with sbatch:

#!/bin/bash 
#SBATCH --job-name=ad_svm_QCD_WPRIME
#SBATCH --ntasks=10
#SBATCH --cpus-per-task=1
#SBATCH --time=0-00:01:00
#SBATCH --mail-user=dimitrios.proios@unige.ch
#SBATCH --mail-type=FAIL
#SBATCH --partition=private-teodoro-gpu
#SBATCH --output=end_to_end_%j.out
# # SBATCH --workdir="/Users/dproios/work/create_EHR_gra"
#SBATCH --mem=20000
module load CUDA/11.3.1
module load GCC/7.3.0-2.30 OpenMPI/3.1.1
module load PyTables/3.3.0-Python-2.7.11
module load CUDA/11.3.1
module load GCC/10.3.0 
module load OpenMPI/4.1.1
module load PyTorch/1.11.0-CUDA-11.3.1
module load scikit-image/0.18.1-Python-3.9.5
pip install torch -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch_geometric
pip install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
python -c "import torch_sparse"
python -c "import torch_geometric"

And I get this error in the job output:

  File "/opt/ebsofts/PyTorch/1.11.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/opt/ebsofts/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/users/p/proios/.local/lib/python3.9/site-packages/torch_sparse/_version_cuda.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs

Adrien.Albert · December 14, 2022, 11:07am

Hi @Dimitrios.Proios,

Did you try this?

Maybe it’s missing -f for:

pip install torch_geometric
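
A side note on the error itself, offered as a reading of the symbol rather than a confirmed diagnosis: in _ZN5torch3jit17parseSchemaOrNameERKSs the trailing "Ss" is the mangled form of the old (pre-C++11 ABI) std::string, whereas symbols built against the newer libstdc++ ABI carry "__cxx11" instead. That suggests the torch_sparse wheel in ~/.local expects a PyTorch built with the old ABI, while the PyTorch module on the cluster may have been built with the new one. The sketch below is a rough heuristic only; the function name string_abi_of is made up for illustration, and the last step assumes torch is importable in the current environment.

```shell
#!/bin/bash
# Rough heuristic (an assumption, not a full demangler): guess which
# std::string ABI a mangled C++ symbol refers to.
string_abi_of() {
  case "$1" in
    *__cxx11*) echo "cxx11" ;;      # new libstdc++ ABI (std::__cxx11::basic_string)
    *Ss*)      echo "pre-cxx11" ;;  # "Ss" abbreviates the old std::string
    *)         echo "unknown" ;;
  esac
}

# The symbol from the traceback above.
string_abi_of _ZN5torch3jit17parseSchemaOrNameERKSs

# Ask the loaded PyTorch which ABI it was built with
# (skipped when torch is not importable).
if python -c "import torch" >/dev/null 2>&1; then
  python -c "import torch; print(torch.__version__, torch.compiled_with_cxx11_abi())"
fi
```

If the symbol is classified as pre-cxx11 while PyTorch reports True, the two builds do not match. In that case, building torch-sparse from source against the module's PyTorch, or using a matching PyTorch wheel inside a virtual environment, would be the usual ways out.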

Dimitrios.Proios · December 16, 2022, 4:53pm

Hello, I checked the Stack Overflow post. It shows the exact same error message, but none of the answers there apply: the first says to check the docs (which I did) and the second says to reinstall Python. torch-sparse is also still not found; as far as I understand, the -f option is for a file. I managed to get the packages found inside Singularity, but now I get a "CUDA not found" error.


module load CUDA/11.3.1
srun singularity exec torch_geometric.simg python3 my_file.py --small_part --cuda --epochs 3 --partition=private-teodoro-gpu

But I get this error

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver
 from http://www.nvidia.com/Download/index.aspx
srun: error: gpu034: task 0: Exited with exit code 1

I also tried installing with pip, as shown above, and with conda:


conda create -n py38

conda install pyg -c pyg

# pip3 install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu116.html

# pip conda install pytorch pyg -c pytorch -c pyg -c conda-forge

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

# conda install pytorch torchvision torchaudio cpuonly -c pytorch

conda install pyg -c pyg -c conda-forge

Adrien.Albert · December 19, 2022, 10:21am

Hi,

Dimitrios.Proios:

But I get this error

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver
 from http://www.nvidia.com/Download/index.aspx
srun: error: gpu034: task 0: Exited with exit code 1

I didn’t notice this earlier, but you didn’t allocate a GPU in your sbatch script.
You must specify the number of GPUs you want to use; with this sbatch you allocate zero GPUs.

(yggdrasil)-[alberta@login1 ~]$ srun --partition debug-gpu nvidia-smi
No devices were found
(yggdrasil)-[alberta@login1 ~]$ srun --partition debug-gpu --gpus=1  nvidia-smi
Mon Dec 19 11:15:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    On   | 00000000:1A:00.0 Off |                  N/A |
| 40%   29C    P8     3W / 280W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Could you try again using at least --gpus=1?
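
A minimal sketch of a job script with a GPU requested, reusing the partition and file names from this thread. The check_gpu helper is hypothetical and only illustrates a sanity check before the real work starts. The --nv flag on singularity exec, which makes the host NVIDIA driver available inside the container, is an assumption about what the container needs, since the original command did not use it.

```shell
#!/bin/bash
#SBATCH --partition=private-teodoro-gpu
#SBATCH --gpus=1
#SBATCH --time=0-00:01:00

# Hypothetical helper: report whether this job can see at least one GPU.
check_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1 \
     && nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
    echo "gpu visible"
  else
    echo "no gpu visible"
  fi
}

check_gpu

# Once a GPU is visible, the container could be started like this.
# Note that --partition belongs to sbatch/srun, not to the Python script:
# srun singularity exec --nv torch_geometric.simg python3 my_file.py --small_part --cuda --epochs 3
```

If check_gpu prints "no gpu visible" inside the job, the allocation is the problem rather than the container or the Python packages.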