Virtualenv with TensorFlow 2.10 GPU

Hi everyone,

Just for information, in case someone new to Baobab/Yggdrasil needs to run TF2.10 in a virtualenv, here is how to do it:

  1. Load Python 3.8.6 with GCC 10.2.0:
module load GCCcore/10.2.0 Python/3.8.6
  2. Create and activate your new env “env_name”:
virtualenv ~/env_name
. ~/env_name/bin/activate
  3. Load the CUDA and cuDNN libraries:
module load fosscuda/2020b
module load cuDNN/8.2.1.32-CUDA-11.3.1

The official TF documentation lists a different CUDA version for TF2.10, but this set of libraries works just fine.
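
If you want to compare the loaded modules with what your TensorFlow wheel was actually built against, a small check like the one below may help once TensorFlow is installed in the next step. This is only a sketch: `tf.sysconfig.get_build_info()` is a TensorFlow call, but the helper name `tf_build_versions` is made up for this example, and the CUDA keys can be missing on CPU-only builds (hence the `.get`).

```python
def tf_build_versions():
    """Return the CUDA/cuDNN versions TensorFlow was built with, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    info = tf.sysconfig.get_build_info()
    # Keys may be absent on CPU-only builds, so use .get() instead of indexing
    return {
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


print(tf_build_versions())
```

Run it with the virtualenv activated. A mismatch between these versions and the loaded modules does not necessarily mean the GPU will not work, as the note above shows.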

  4. Install TF2.10 with GPU support:
pip install tensorflow==2.10
  5. Test your installation:

5a. Create this Python file ‘test_tfgpu.py’ (e.g. vim test_tfgpu.py, press i to enter insert mode, then save and exit with ESC followed by :x!)

import tensorflow as tf

print("operated on {}, result: {}".format(tf.config.get_visible_devices(), tf.reduce_sum(tf.ones([2,3]))))
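
The one-liner above prints the devices but always exits successfully, even when no GPU is found. A possible variant that reports failure is sketched below. The function names `visible_gpus` and `main` are just illustrative, and the import is guarded so the script also runs (and reports failure) where TensorFlow is not installed.

```python
import sys


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TF is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]


def main():
    """Print the visible GPUs and return 0, or return 1 if there are none."""
    gpus = visible_gpus()
    if not gpus:
        # Covers both "TensorFlow missing" and "no GPU visible"
        print("no GPU visible to TensorFlow", file=sys.stderr)
        return 1
    print("GPUs:", gpus)
    return 0


# In a job script, end the file with: raise SystemExit(main())
```

With the exit code set this way, Slurm should mark the job as failed when no GPU is visible, which is easier to spot than reading the output file.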

5b. Create this sbatch script ‘test_tfgpu.sh’:

#!/usr/bin/env bash

#SBATCH --partition=shared-gpu
#SBATCH --time=00:01:00
#SBATCH --gpus=turing:1

export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib

module load GCCcore/10.2.0 Python/3.8.6
module load fosscuda/2020b
module load cuDNN/8.2.1.32-CUDA-11.3.1
. ~/env_name/bin/activate

# if you need to know the allocated CUDA device, you can obtain it here:
echo "$CUDA_VISIBLE_DEVICES"

# print the interpreter version to confirm the virtualenv's Python is usable
~/env_name/bin/python --version

echo "python loaded"

srun python test_tfgpu.py
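
The script above echoes CUDA_VISIBLE_DEVICES from the shell. If you would rather read it from inside your Python code, a sketch could look like this (the helper name `allocated_cuda_devices` is made up for the example):

```python
import os


def allocated_cuda_devices(environ=os.environ):
    """Parse CUDA_VISIBLE_DEVICES into a list of device identifiers (strings).

    An empty list means the variable is unset or empty.
    """
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    # The variable is a comma-separated list such as "0" or "0,1"
    return [item.strip() for item in raw.split(",") if item.strip()]


print(allocated_cuda_devices())
```

The `environ` parameter is only there so the function can be tried with a plain dict, e.g. `allocated_cuda_devices({"CUDA_VISIBLE_DEVICES": "0,1"})`.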

5c. Launch the job

sbatch --ntasks=1 --cpus-per-task=1 --partition=shared-gpu test_tfgpu.sh
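
sbatch answers with a line of the form "Submitted batch job <id>", and since the script sets no --output option, the results go to slurm-<id>.out in the directory you submitted from. If you script your submissions, the file name can be derived like this (a sketch; `slurm_output_file` is a hypothetical helper and 12345 is a made-up job ID):

```python
import re


def slurm_output_file(sbatch_stdout):
    """Return the default output file name for the job reported by sbatch."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError("no job ID found in: {!r}".format(sbatch_stdout))
    # Without --output, Slurm writes to slurm-<jobid>.out
    return "slurm-{}.out".format(match.group(1))


# 12345 is a made-up job ID for illustration
print(slurm_output_file("Submitted batch job 12345"))  # slurm-12345.out
```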

5d. Check the output

vim slurm-number_of_your_job.out

It should contain:

operated on [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], result: 6.0
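
If you run this test often, you may prefer to check the output automatically rather than opening the file. A minimal sketch, assuming the output has the same format as the line above (`gpu_detected` is a made-up helper name):

```python
import re


def gpu_detected(job_output):
    """Return True if the job output lists at least one GPU PhysicalDevice."""
    return re.search(r"device_type='GPU'", job_output) is not None


# Sample taken from the expected output shown above
sample = (
    "operated on [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), "
    "PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], result: 6.0"
)
print(gpu_detected(sample))  # True
```

In practice you would pass it the contents of your slurm-<id>.out file instead of the sample string.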