Update cuDNN/CUDA software stack

Hello,

Recent versions of TensorFlow (>=2.5.0) on PyPI were built using cuDNN 8.1 and CUDA 11.2 (see "Build from source" in the TensorFlow documentation).

I have tried to use the latest software stack available on baobab (cuDNN/8.0.4.30-CUDA-11.1.1) with TensorFlow 2.5.0, but it does not seem to work properly on Ampere cards.

Would it be possible to update the cuDNN/CUDA stack on baobab?

Thank you!

Here it is: New software installed cuDNN/8.2.1.32 and CUDA-11.3.1

Wow, thanks!

I’m testing TensorFlow 2.6.0 with the new stack and everything seems to be working on Ampere nodes…

Great! Out of curiosity, how do you install and launch TensorFlow? With pip, or in a container?

I use venv; here is my setup script:

module load GCCcore/10.3.0 Python/3.9.5 cuDNN/8.2.1.32-CUDA-11.3.1
python -m venv sbdenv
source sbdenv/bin/activate
pip install --upgrade pip
pip install jupyterlab scipy tensorflow matplotlib numpy scikit-learn ipympl keras-tuner tensorflow-addons umap-learn tensorboard-plugin-profile tqdm

I have used conda in the past, but I find this approach easier to manage…
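
To check from inside such a venv which CUDA/cuDNN versions the installed TensorFlow wheel was built against, and whether it sees any GPU, something like the following could be used. It is only a sketch: `describe_tensorflow` is a name made up for this example, and it assumes a TensorFlow version recent enough to provide `tf.sysconfig.get_build_info()`. The exact contents of the build info can differ between releases, so the keys are read with `.get()`.

```python
def describe_tensorflow():
    """Summarise the TensorFlow install, or return None if it is not importable.

    Hypothetical helper for illustration; not part of TensorFlow itself.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Build-time information recorded in the wheel (may lack CUDA keys on CPU builds).
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda_build": build.get("cuda_version"),
        "cudnn_build": build.get("cudnn_version"),
        # Names of the GPUs visible to TensorFlow at runtime (empty list if none).
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(describe_tensorflow())
```

Running it on a GPU node after the `module load` line should show whether the list of GPUs is empty, which is a quick first test before launching a real job.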

Hey Yann,

TensorFlow now requires cuDNN v8.6.0:

2023-04-11 10:39:53.416414: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: RET_CHECK failure (tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:618) dnn != nullptr 
2023-04-11 10:39:53.423614: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:417] Loaded runtime CuDNN library: 8.2.1 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-04-11 10:39:53.442479: E tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:618) dnn != nullptr 
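
The rule stated in the second log line (matching major version, equal or higher minor version) can be written down as a small check. This is just an illustration of the rule as the message words it: `parse_version` and `cudnn_runtime_ok` are hypothetical helpers, not TensorFlow functions, and they only look at the major and minor components.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '8.2.1' or '8.6'."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cudnn_runtime_ok(loaded: str, compiled: str) -> bool:
    """Apply the rule from the log message: same major, equal or higher minor.

    Hypothetical helper for illustration only.
    """
    loaded_major, loaded_minor = parse_version(loaded)
    compiled_major, compiled_minor = parse_version(compiled)
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


if __name__ == "__main__":
    # Versions from the log above: runtime library 8.2.1, compiled against 8.6.0.
    print(cudnn_runtime_ok("8.2.1", "8.6.0"))
    # A runtime library at 8.6.0 would satisfy the rule.
    print(cudnn_runtime_ok("8.6.0", "8.6.0"))
```

With the versions from the log, the first call reports a mismatch, which is why the cuDNN/8.2.1.32 module is no longer enough.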

Could you please update the stack once again?

For compatibility, here is the list of modules I am currently loading:

module load GCCcore/10.2.0 Tkinter/3.8.6 Python/3.8.6 cuDNN/8.2.1.32-CUDA-11.3.1 git-lfs/3.1.2 FFmpeg/4.3.1 nodejs/12.19.0

Thank you very much!

Cheers

Hello,

cuDNN 8.6 is now available on both clusters.

Best regards,

Great, thank you very much!