Hi,
I am trying to run TensorFlow on a GPU node, but TensorFlow does not seem to recognise the GPU. Could anyone help me with this?
Here is the .sh script I use to request a GPU node on Yggdrasil:
#!/bin/sh
#SBATCH --job-name=cosmopower # Job name
#SBATCH --partition=shared-gpu # Partition (queue) name
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks (processes)
#SBATCH --gres=gpu:1 # Request 1 GPU
#SBATCH --time=00:15:00 # Time limit (hh:mm:ss)
#SBATCH --cpus-per-task=1
conda activate cp_env
module load GCC/10.3.0 OpenMPI/4.1.1 TensorFlow/2.6.0
module load cuDNN/8.6 CUDA/11.8.0
nvidia-smi
python gpu_test.py
where gpu_test.py just prints the number of GPUs available:
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
print(f"Num GPUs Available: {len(gpus)}")
From the nvidia-smi output, I can see that a GPU has been allocated to the job, but the Python script prints “Num GPUs Available: 0”. Does anyone know what the issue is? The nvidia-smi output is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA TITAN RTX               On  |   00000000:3D:00.0 Off |                  N/A |
| 41%   31C    P8             13W /  280W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
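
The gpu_test.py check only counts devices, so it cannot say why the count is 0. A more detailed diagnostic might look like the sketch below. It is only an illustration: the function name gpu_report and the report keys are made up here, and it assumes TensorFlow 2.3 or newer for tf.sysconfig.get_build_info(). It records which Python interpreter and which TensorFlow installation are actually used (the conda environment's or the one from the TensorFlow module), whether that TensorFlow build has CUDA support, and which CUDA/cuDNN versions it was built against.

```python
import os
import sys


def gpu_report():
    """Collect facts that may help explain why TensorFlow sees no GPU.

    Hypothetical helper for illustration; not part of the original test.
    """
    report = {
        # Which interpreter runs the script: the conda env's or the module's?
        "python_executable": sys.executable,
        # Slurm usually sets this for jobs that request --gres=gpu:N.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # The CUDA/cuDNN shared libraries are searched for on this path.
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
    try:
        import tensorflow as tf
    except ImportError as exc:
        # No TensorFlow at all in this interpreter.
        report["tensorflow_importable"] = False
        report["import_error"] = str(exc)
        return report

    report["tensorflow_importable"] = True
    report["tf_version"] = tf.__version__
    # Shows whether TensorFlow comes from the conda env or the module tree.
    report["tf_location"] = tf.__file__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # Build info keys may be absent on CPU-only builds, hence .get().
    build = tf.sysconfig.get_build_info()
    report["build_cuda_version"] = build.get("cuda_version")
    report["build_cudnn_version"] = build.get("cudnn_version")
    report["visible_gpus"] = [
        device.name for device in tf.config.list_physical_devices("GPU")
    ]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If tf_location points into the conda environment rather than the module installation, or built_with_cuda is False, the TensorFlow being imported may not be the GPU build loaded with module load. Comparing build_cuda_version and build_cudnn_version with the loaded CUDA and cuDNN modules may also reveal a version mismatch.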