Dear all
Since yesterday I have had an issue on the yggdrasil cluster (the same
code produced no errors before). I submit a Slurm job array with 20
tasks on public-cpu. The job script loads the following modules:
module load GCC/11.3.0 OpenMPI/4.1.4 R/4.2.1
module load rgdal/1.6-6
While some jobs run perfectly fine, for some others I get the
following error message:
Lmod has detected the following error: The following module(s) are unknown:
"GCC/11.3.0"
It seems that on some compute nodes the GCC module cannot be loaded. Any idea
how to address this issue?
Thanks for your help and best wishes, simon
Dear @Simon.Hug
When you create a topic in a category under HPC issues - HPC Community, you automatically get a template to fill in. To ensure a good understanding for all and a quick answer, please fill it in as completely as possible.
Could you give me more information about the node/jobID where you have this problem?
Oops, this actually happened on baobab and not yggdrasil. Best, simon
The job ID was 7644778. Of the twenty array tasks (1-20) that were started, more than half ended with this error message, while a few went through. For instance, 7644778_18 ended with the error message, while 7644778_13 worked perfectly fine.
best wishes
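A minimal sketch of how the array tasks could be mapped to the nodes they ran on, assuming Slurm accounting is enabled on the cluster and that the failing tasks are recorded with a state other than COMPLETED (which depends on how the job script exits). The helper name `non_completed_nodes` is hypothetical and not part of Slurm; it only filters pipe-separated `JobID|NodeList|State` lines such as those printed by `sacct --parsable2`.

```shell
# Hypothetical helper (not part of Slurm): read "JobID|NodeList|State" lines
# on stdin and print the distinct nodes of every task that did not complete.
non_completed_nodes() {
    awk -F'|' '$3 != "COMPLETED" { print $2 }' | sort -u
}
```

Once all tasks have finished, this could be used on a login node as `sacct -j 7644778 -X --noheader --parsable2 --format=JobID,NodeList,State | non_completed_nodes`. If the same few node names come up for all failing tasks, the problem is probably tied to those nodes rather than to the job script.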
Dear Simon,
Thank you for this information. Indeed, 2 nodes did not have the module available; this has been fixed.
We apologize for the inconvenience.
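For anyone hitting a similar per-node failure, a small guard in the job script can make the affected node visible in the job log. This is only a sketch: the function name `load_or_report` is hypothetical, and it assumes that the `module` command returns a non-zero status when a module is unknown.

```shell
# Hypothetical helper: load modules and, on failure, log where it happened.
# Assumes `module load` returns a non-zero status when a module is unknown.
load_or_report() {
    if ! module load "$@"; then
        # SLURMD_NODENAME and SLURM_ARRAY_TASK_ID are set by Slurm inside a job;
        # the fallbacks only apply when running outside of one.
        echo "module load failed on node ${SLURMD_NODENAME:-$(uname -n)}" \
             "(array task ${SLURM_ARRAY_TASK_ID:-n/a}): $*" >&2
        return 1
    fi
}
```

In the job script from the first post, the two `module load` lines could then become `load_or_report GCC/11.3.0 OpenMPI/4.1.4 R/4.2.1 || exit 1` and `load_or_report rgdal/1.6-6 || exit 1`, so that a task stops early and names the node instead of failing later inside R.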
Thanks for looking into this and fixing the problem. Best wishes, simon