Hi all,
I am having trouble loading some modules on Baobab, and I think it boils down to OpenMPI/4.0.3. According to module spider, OpenMPI/4.0.3 should only require GCC/9.3.0, but loading it fails unless CUDA/11.0.2 is loaded as well.
Here are the details:
weninger@login2 ~ $ module list
Currently Loaded Modules:
1) GCCcore/9.3.0 2) zlib/1.2.11 3) binutils/2.34 4) GCC/9.3.0
weninger@login2 ~ $ module spider OpenMPI/4.0.3
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OpenMPI: OpenMPI/4.0.3
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:
The Open MPI Project is an open source MPI-3 implementation.
You will need to load all module(s) on any one of the lines below before the "OpenMPI/4.0.3" module is available to load.
GCC/9.3.0
GCC/9.3.0 CUDA/11.0.2
Help:
Description
===========
The Open MPI Project is an open source MPI-3 implementation.
More information
================
- Homepage: https://www.open-mpi.org/
weninger@login2 ~ $ module load OpenMPI/4.0.3
Lmod has detected the following error: The following module(s) are unknown: "UCX/1.8.0"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "UCX/1.8.0"
Also make sure that all modulefiles written in TCL start with the string #%Module
Executing this command requires loading "UCX/1.8.0" which failed while processing the following module(s):
Module fullname Module Filename
--------------- ---------------
OpenMPI/4.0.3 /opt/ebmodules/all/Compiler/GCC/9.3.0/OpenMPI/4.0.3.lua
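To rule out a stale cache, the checks that the Lmod message itself suggests can be repeated from a clean environment. The following is only a minimal sketch, assuming a bash shell on an Lmod system where the `module` function is defined; the function name `reproduce_openmpi_load` is hypothetical and exists only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the GCC-only OpenMPI load from a clean environment.
# Assumption: `module` is the Lmod shell function provided by the cluster.
# `reproduce_openmpi_load` is a hypothetical helper name, not a cluster tool.

reproduce_openmpi_load() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 2
    fi
    # Start from an empty environment so earlier loads cannot interfere.
    module purge
    module load GCC/9.3.0
    # Show which UCX versions Lmod knows about, and what each one requires.
    module spider UCX
    # Retry the missing dependency without the cache, as the error suggests.
    module --ignore-cache load "UCX/1.8.0"
    # Finally retry the module that originally failed.
    module load OpenMPI/4.0.3
}
```

Calling `reproduce_openmpi_load` on the login node runs the steps in order. If `module spider UCX` only lists a CUDA-suffixed UCX version, that would suggest the plain UCX/1.8.0 module is simply missing rather than hidden by the cache.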
If I instead load OpenMPI/4.0.3 on top of both GCC/9.3.0 and CUDA/11.0.2, OpenMPI loads, but then I can’t load HDF5/1.10.6 or Armadillo/9.900.1.
Here are the logs for loading HDF5/1.10.6 (Armadillo fails with the same error):
[weninger@login2.baobab ~]$ module list
Currently Loaded Modules:
1) GCC/9.3.0 4) binutils/2.34 7) cURL/7.69.1 10) CUDA/11.0.2 13) libxml2/2.9.10 16) libevent/2.1.11 19) UCX/1.8.0-CUDA-11.0.2 22) OpenMPI/4.0.3
2) GCCcore/9.3.0 5) ncurses/6.2 8) CMake/3.16.4 11) numactl/2.0.13 14) libpciaccess/0.16 17) Check/0.15.2 20) libfabric/1.11.0 23) pkg-config/0.29.2
3) zlib/1.2.11 6) bzip2/1.0.8 9) CUDAcore/11.0.2 12) XZ/5.2.5 15) hwloc/2.2.0 18) GDRCopy/2.1-CUDA-11.0.2 21) PMIx/3.1.5
[weninger@login2.baobab ~]$ module spider HDF5/1.10.6
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HDF5: HDF5/1.10.6
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and
complex data.
You will need to load all module(s) on any one of the lines below before the "HDF5/1.10.6" module is available to load.
GCC/9.3.0 OpenMPI/4.0.3
iccifort/2020.1.217 impi/2019.7.217
Help:
Description
===========
HDF5 is a data model, library, and file format for storing and managing data.
It supports an unlimited variety of datatypes, and is designed for flexible
and efficient I/O and for high volume and complex data.
More information
================
- Homepage: https://portal.hdfgroup.org/display/support
[weninger@login2.baobab ~]$ module load HDF5/1.10.6
Lmod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "HDF5/1.10.6"
Try: "module spider HDF5/1.10.6" to see how to load the module(s).