Hi everyone,
Just for information, in case someone new to Baobab/Yggdrasil needs to run TF 2.10 in a virtualenv, here is the way to do it:
- Load Python 3.8.6 with GCC 10.2.0:
module load GCCcore/10.2.0 Python/3.8.6
- Create and activate your new env "env_name"
virtualenv ~/env_name
. ~/env_name/bin/activate
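If you want to double-check from Python that the virtualenv is really active (and not the system interpreter), a small sketch: inside an activated virtualenv, `sys.prefix` differs from `sys.base_prefix`.

```python
import sys

# Inside an activated virtualenv, sys.prefix points at the env directory
# while sys.base_prefix still points at the base Python installation.
def in_virtualenv():
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print(in_virtualenv())
```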
- Load the CUDA and cuDNN libraries
module load fosscuda/2020b
module load cuDNN/8.2.1.32-CUDA-11.3.1
The official TF documentation lists a different CUDA version for TF 2.10, but this set of libraries works just fine.
- Install TF2.10 with GPU
pip install tensorflow==2.10
- Try your installation
5a. Create this Python file 'test_tfgpu.py' (e.g. vim test_tfgpu.py, press i to insert text, then save and exit with ESC followed by :x!)
import tensorflow as tf
print("operated on {}, result: {}".format(tf.config.get_visible_devices(), tf.reduce_sum(tf.ones([2,3]))))
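The expected numeric result can be sanity-checked without TensorFlow: `tf.reduce_sum(tf.ones([2,3]))` sums a 2x3 tensor of ones, which is 2 * 3 = 6.0. A plain-Python equivalent:

```python
# Plain-Python equivalent of tf.reduce_sum(tf.ones([2, 3])):
# sum all entries of a 2x3 matrix of ones -> 6.0.
rows, cols = 2, 3
ones = [[1.0] * cols for _ in range(rows)]
total = sum(sum(row) for row in ones)
print(total)  # 6.0
```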
5b. Create this sbatch script 'test_tfgpu.sh'
#!/usr/bin/env bash
#SBATCH --partition=shared-gpu
#SBATCH --time=00:01:00
#SBATCH --gpus=turing:1
export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib
module load GCCcore/10.2.0 Python/3.8.6
module load fosscuda/2020b
module load cuDNN/8.2.1.32-CUDA-11.3.1
. ~/env_name/bin/activate
# if you need to know the allocated CUDA device, you can obtain it here:
echo $CUDA_VISIBLE_DEVICES
# note: the virtualenv's python is already first in PATH after activation
echo "python loaded"
srun python test_tfgpu.py
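Slurm exposes the allocated GPU(s) through CUDA_VISIBLE_DEVICES (echoed in the script above). If you want to use that value from Python, a hypothetical helper (the function name is mine, not part of any library) that parses it into a list of device indices:

```python
import os

# Hypothetical helper: parse CUDA_VISIBLE_DEVICES (e.g. "0" or "0,2")
# into a list of integer device indices. Slurm sets this variable
# per job according to the --gpus allocation.
def visible_gpu_indices(env=os.environ):
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip().isdigit()]

print(visible_gpu_indices({"CUDA_VISIBLE_DEVICES": "0,2"}))  # [0, 2]
```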
5c. Launch the job
sbatch --ntasks=1 --cpus-per-task=1 --partition=shared-gpu test_tfgpu.sh
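If you end up submitting this from a Python wrapper (a sketch, not required for the steps above), the same command can be built as an argument list suitable for `subprocess.run`:

```python
# Sketch: the sbatch command from step 5c as an argument list,
# ready to pass to subprocess.run(cmd) from a Python wrapper.
cmd = ["sbatch", "--ntasks=1", "--cpus-per-task=1",
       "--partition=shared-gpu", "test_tfgpu.sh"]
print(" ".join(cmd))
```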
5d. Check the output
vim slurm-<number_of_your_job>.out
It should contain:
operated on [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], result: 6.0
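To verify the run programmatically rather than by eye, a small sketch that scans the output line for a GPU entry and the expected result (the parsing logic is mine, not a TensorFlow API):

```python
# Check the Slurm output line: did TensorFlow see a GPU,
# and did the reduction produce the expected 6.0?
line = ("operated on [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), "
        "PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], result: 6.0")
gpu_seen = "device_type='GPU'" in line
result_ok = line.rstrip().endswith("result: 6.0")
print(gpu_seen and result_ok)  # True
```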