In the module info page it tells me to load GCC/6.4.0-2.28 and OpenMPI/2.1.2.
So I tried:
module load GCC/6.4.0-2.28 OpenMPI/2.1.2 TensorFlow/1.7.0-Python-3.6.4
srun python helloworld.py
which failed with:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
So I thought it was a CUDA issue. I looked at the examples on GitLab and tried running both testTensorFlow_1.7.0.sh and testTensorFlow_1.10.1.sh,
but they both gave me the same error.
Could someone help me out with just doing the TensorFlow ‘Hello World!’? Thanks!
Note also that the TensorFlow installed on Baobab requires CUDA and, to avoid some library-mixing problems, I think CUDA is only installed on GPU nodes. This means that it is not possible to run TensorFlow in CPU-only mode on non-GPU nodes.
FWIW, to ease this workflow, next time please fork the hpc/softs project on GitLab and send a proper merge request.
You are right about the CUDA system libraries, which are available on GPU nodes only. Here is the reason: while moving to CentOS 7 (cf. Baobab migration from CentOS6 to CentOS7) we decided to install as little extra software as possible or, in other words, to have a basic installation shared between all the nodes and the servers as well. CUDA is obviously not part of a basic installation…
From a quick look, it seems that the CUDA application libraries loaded via module do not include libcuda.so*, but only a stub, which should refer to the corresponding system library (my guess is that the latter communicates with the NVIDIA kernel driver).
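To check whether a given node can actually provide the library named in the ImportError reported earlier in the thread, one option is to ask the dynamic loader directly before importing TensorFlow. The following is a minimal sketch, not a Baobab-provided tool: the helper name can_load_shared_library is hypothetical, and it only tests whether the loader can open the file, not whether a GPU is usable.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open `name`, else False.

    Hypothetical helper for illustration; it relies only on ctypes.CDLL,
    which raises OSError when the library cannot be found or opened.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError from the thread.
    if can_load_shared_library("libcuda.so.1"):
        print("libcuda.so.1 can be loaded on this node")
    else:
        print("libcuda.so.1 cannot be loaded; this is probably not a GPU node")
```

Running this through srun on the same node as the failing job should show whether the missing library is a property of the node rather than of the Python script.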
Moreover, according to the upstream documentation (cf. Build from source | TensorFlow), TensorFlow creates symbolic links to the CUDA system libraries while compiling, which de facto makes the compiled TensorFlow non-portable.
In their releases, they normally have two versions: tensorflow-gpu and tensorflow. I believe that the CPU-only tensorflow version does not link to CUDA and other GPU-only libraries.
I don’t know if there is demand for having a CPU-only TensorFlow installed on the cluster. In any case, I think the documentation (https://baobab.unige.ch/enduser/src/enduser/applications.html#tensorflow) should explicitly state that this module only works on GPU nodes.