Problem with Torchvision module

Hello,

I’m trying to use torchvision for a project, but I run into a problem when importing it in my Python script.

I load the following modules:
module load GCC/6.4.0-2.28 OpenMPI/2.1.2 Python/3.8.2 PyTorch/0.3.0-Python-3.6.4 GCCcore/9.3.0 torchvision/0.2.1-Python-3.6.4

But my job always ends with an error when I try to import torchvision in my script:

Inactive Modules:

  1) hwloc/1.11.8   2) numactl/2.0.11

Due to MODULEPATH changes, the following have been reloaded:

  1) binutils/2.28

The following have been reloaded with a version change:

  1) CUDA/10.1.243 => CUDA/9.1.85   2) GCCcore/6.4.0 => GCCcore/9.3.0

Traceback (most recent call last):
  File "NN.py", line 3, in <module>
    import torchvision.models as models
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/__init__.py", line 2, in <module>
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/__init__.py", line 1, in <module>
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/lsun.py", line 2, in <module>
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/Pillow-5.1.0-py3.6-linux-x86_64.egg/PIL/Image.py", line 60, in <module>
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/Pillow-5.1.0-py3.6-linux-x86_64.egg/PIL/_imaging.py", line 7, in <module>
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/torchvision/0.2.1-Python-3.6.4/lib/python3.6/site-packages/Pillow-5.1.0-py3.6-linux-x86_64.egg/PIL/_imaging.py", line 6, in __bootstrap__
  File "/opt/ebsofts/MPI/GCC/6.4.0-2.28/OpenMPI/2.1.2/Python/3.6.4/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libtiff.so.3: cannot open shared object file: No such file or directory
srun: error: node001: task 0: Exited with exit code 1

I tried changing the version of Python 3 used, but it didn’t change the result.

Bests,

Guy-Raphaël Stauffer

Hi there,

Have you loaded one of the libtiff modules?

capello@login2:~$ module spider libtiff

----------------------------------------------------------------------------------
  LibTIFF:
----------------------------------------------------------------------------------
    Description:
      tiff: Library and tools for reading and writing TIFF data files

     Versions:
        LibTIFF/4.0.4
        LibTIFF/4.0.6
        LibTIFF/4.0.7
        LibTIFF/4.0.8
        LibTIFF/4.0.9
        LibTIFF/4.0.10
        LibTIFF/4.1.0
[...]
capello@login2:~$ 

Given your toolchain (foss/2018a), you should load LibTIFF/4.0.9:

capello@login2:~$ module load foss/2018a
capello@login2:~$ module load LibTIFF/4.0.9
capello@login2:~$ module list

Currently Loaded Modules:
 1) GCCcore/6.4.0    6) OpenMPI/2.1.2                    11) zlib/1.2.8
 2) binutils/2.28    7) OpenBLAS/0.2.20                  12) XZ/5.2.2
 3) GCC/6.4.0-2.28   8) FFTW/3.3.7                       13) libxml2/2.9.8
 4) numactl/2.0.11   9) ScaLAPACK/2.0.2-OpenBLAS-0.2.20  14) LibTIFF/4.0.9
 5) hwloc/1.11.8    10) foss/2018a



capello@login2:~$ 

However, this module does not provide libtiff.so.3, but a more recent one:

capello@login2:~$ module show LibTIFF/4.0.9 2>&1 | \
 grep LIBRARY
prepend_path("LD_LIBRARY_PATH","/opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib")
prepend_path("LIBRARY_PATH","/opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib")
capello@login2:~$ find /opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib -type f -name libtiff.so.3
capello@login2:~$ find /opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib -type f -name libtiff.so\*
/opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib/libtiff.so.5.3.0
capello@login2:~$ 
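
For reference, the same check can be sketched in Python. This is only a minimal sketch, not part of the cluster tooling: the helper name `list_shared_objects` is hypothetical, and the directory is the one reported by `module show` above. Unlike `find -type f`, it also lists symlinks that match the pattern, if any exist.

```python
from pathlib import Path
from typing import List


def list_shared_objects(lib_dir: str, stem: str) -> List[str]:
    """Return the names of all entries in lib_dir matching '<stem>.so*'.

    Hypothetical helper: it inspects a single directory only and does not
    consult LD_LIBRARY_PATH or the dynamic loader cache.
    """
    directory = Path(lib_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(stem + ".so*"))


if __name__ == "__main__":
    # Path taken from the `module show LibTIFF/4.0.9` output above.
    lib_dir = "/opt/ebsofts/Compiler/GCCcore/6.4.0/LibTIFF/4.0.9/lib"
    found = list_shared_objects(lib_dir, "libtiff")
    print("found:", found)
    print("libtiff.so.3 present:", "libtiff.so.3" in found)
```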

We need to recompile some modules; I will get back to you ASAP.

Thx, bye,
Luca

Thank you for the quick answer.

I tried to load one libtiff module, but, as you said, it didn’t provide libtiff.so.3, so the error message didn’t change.
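
To make the mismatch explicit: the dynamic loader looks for a file named exactly libtiff.so.3, while the LibTIFF/4.0.9 module ships libtiff.so.5.3.0. The following is a minimal sketch that extracts the missing library name from the error message and compares major versions. The helper names `missing_soname` and `soname_major` are hypothetical, and the regular expression only covers messages of the form seen in this thread.

```python
import re
from typing import Optional

# Matches e.g. "libtiff.so.3: cannot open shared object file"
_MISSING_LIB = re.compile(r"(\S+\.so(?:\.\d+)*): cannot open shared object file")


def missing_soname(error_message: str) -> Optional[str]:
    """Extract the library name from a 'cannot open shared object file' error."""
    match = _MISSING_LIB.search(error_message)
    return match.group(1) if match else None


def soname_major(name: str) -> Optional[int]:
    """Return the first version number after '.so.', or None if there is none."""
    match = re.search(r"\.so\.(\d+)", name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    message = ("ImportError: libtiff.so.3: cannot open shared object file: "
               "No such file or directory")
    needed = missing_soname(message)
    shipped = "libtiff.so.5.3.0"  # file found in the LibTIFF/4.0.9 module
    print("needed: ", needed, "major", soname_major(needed))
    print("shipped:", shipped, "major", soname_major(shipped))
```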

bests,

Guy-Raphaël Stauffer

Hello,

Is there any news on the module recompilation?

bests,

Guy-Raphaël Stauffer

Hi there,

Sorry for the delay. The error you get can be “solved” simply by loading Pillow/5.0.0-Python-3.6.4 (which has a correct module dependency on LibTIFF/4.0.9) after torchvision/0.2.1-Python-3.6.4, but then there is another, deeper error:

capello@login2:~$ module purge
capello@login2:~$ module load GCC/6.4.0-2.28  OpenMPI/2.1.2
capello@login2:~$ module load Python/3.6.4

The following have been reloaded with a version change:
  1) zlib/1.2.8 => zlib/1.2.11

capello@login2:~$ module load PyTorch/0.3.0-Python-3.6.4
capello@login2:~$ module load torchvision/0.2.1-Python-3.6.4
capello@login2:~$ module load Pillow/5.0.0-Python-3.6.4
capello@login2:~$ python
Python 3.6.4 (default, Apr 25 2018, 10:28:12) 
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
[...]
ImportError: liblzma.so.0: cannot open shared object file: No such file or directory
>>> 
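
A quick way to test whether a given shared library can be resolved in the current environment, without importing torchvision, is to ask the dynamic loader directly through `ctypes`. This is a minimal sketch, assuming a Linux node with the modules already loaded. `can_load` is a hypothetical helper name, and the two library names are the ones from the error messages in this thread.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the library `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    for soname in ("libtiff.so.3", "liblzma.so.0"):
        status = "ok" if can_load(soname) else "NOT found"
        print(f"{soname}: {status}")
```

Running it after each `module load` step shows which step, if any, makes a missing library available.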

I am recompiling Python/3.6.4.

Thx, bye,
Luca

Hi there,

For whatever reason, I am having trouble recompiling Python/3.6.4, so in the meantime I tried installing the latest torchvision on Python/3.6.6 via pip (cf. https://baobab.unige.ch/enduser/src/enduser/applications.html#custom-python-lib):

capello@login2:~$ module load GCC/7.3.0-2.30  CUDA/9.2.88  OpenMPI/3.1.1
capello@login2:~$ module load Python/3.6.6
capello@login2:~$ virtualenv --no-site-packages ~/test-torchvision-Python-3.6.6
Using base prefix '/opt/ebsofts/Python/3.6.6-fosscuda-2018b'
New python executable in /home/users/c/capello/test-torchvision-Python-3.6.6/bin/python
Installing setuptools, pip, wheel...done.
capello@login2:~$ . ~/test-torchvision-Python-3.6.6/bin/activate
(test-torchvision-Python-3.6.6) capello@login2:~$ pip install torchvision==0.2.2.post3 
Collecting torchvision==0.2.2.post3
[...]
Installing collected packages: pillow, numpy, six, future, torch, torchvision
Successfully installed future-0.18.2 numpy-1.19.2 pillow-7.2.0 six-1.15.0 torch-1.6.0 torchvision-0.2.2.post3
(test-torchvision-Python-3.6.6) capello@login2:~$ python
Python 3.6.6 (default, Apr 15 2020, 16:42:51) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
>>> quit()
(test-torchvision-Python-3.6.6) capello@login2:~$ pip install torchvision==0.7.0
Collecting torchvision==0.7.0
[...]
Successfully installed torchvision-0.7.0
(test-torchvision-Python-3.6.6) capello@login2:~$ python
Python 3.6.6 (default, Apr 15 2020, 16:42:51) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
>>> quit()
capello@login2:~$ 
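
The manual `import torchvision` test above can also be scripted, for example at the start of a batch job. This is a minimal sketch: `check_imports` is a hypothetical helper, and the version is read from `__version__` only if the package defines it.

```python
import importlib
from typing import Dict, Iterable, Optional, Tuple


def check_imports(names: Iterable[str]) -> Dict[str, Tuple[bool, Optional[str]]]:
    """Try to import each module; map name -> (ok, version or error text)."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # Also covers missing shared libraries, as in the errors above.
            report[name] = (False, str(exc))
        else:
            report[name] = (True, getattr(module, "__version__", None))
    return report


# Example use inside the activated virtualenv:
#   for name, (ok, detail) in check_imports(["torch", "torchvision", "PIL"]).items():
#       print(name, "ok" if ok else "FAILED", detail)
```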

@Guy-Raphael.Stauffer, could this be a solution, or do you strictly need Python/3.6.4?

Thx, bye,
Luca

I don’t need a specific version of Python, so Python 3.6.6 could be a solution for me.

Hello,

I tried with Python 3.6.6, and everything works fine.

Thank you very much for your help.

best regards,

Guy-Raphaël