Module fails to load on some arrays on Baobab

Dear all

Since yesterday I have had an issue on the Yggdrasil cluster (the same
code produced no errors before). I submit a Slurm job array with 20
tasks on public-cpu. The job script loads the following modules:

module load GCC/11.3.0 OpenMPI/4.1.4 R/4.2.1
module load rgdal/1.6-6

While some jobs run perfectly fine, for some others I get the
following error message:

Lmod has detected the following error: The following module(s) are unknown:

It seems that on some CPUs the GCC module cannot be loaded. Any idea
how to address this issue?
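To narrow down which machines are affected, the job script could check whether each `module load` succeeds and, if not, log the node it ran on. The following is only a minimal sketch: `load_modules_or_report` is a hypothetical helper name, and it assumes that Lmod's `module` command returns a non-zero status when a module is unknown, which is worth verifying on the cluster in question.

```shell
# Hypothetical helper (not part of Lmod or Slurm): try to load the given
# modules and, on failure, report the node and array task on stderr.
load_modules_or_report() {
    if ! module load "$@"; then
        # SLURMD_NODENAME and SLURM_ARRAY_TASK_ID are set by Slurm inside a job;
        # fall back to hostname / "n/a" when run outside of one.
        echo "module load $* failed on node ${SLURMD_NODENAME:-$(hostname)}," \
             "array task ${SLURM_ARRAY_TASK_ID:-n/a}" >&2
        return 1
    fi
}
```

In the batch script, the two load lines would then become `load_modules_or_report GCC/11.3.0 OpenMPI/4.1.4 R/4.2.1 || exit 1` and `load_modules_or_report rgdal/1.6-6 || exit 1`, so that a failing task stops early and names its node in the error log.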

Thanks for your help and best wishes, Simon

Dear @Simon.Hug

When you create a topic in the HPC issues category of the HPC Community forum, you automatically get a template to fill in. To ensure a good understanding for all and a quick answer, please fill it in as completely as possible 🙏

Could you give me more information about the node/jobID where you have this problem?

Oops, this actually happened on Baobab and not Yggdrasil. Best, Simon

The job ID was 7644778. Of the twenty array tasks (1-20) that were started, more than half ended with this error message, while a few went through. For instance, 7644778_18 ended with the error message, while 7644778_13 worked perfectly fine.
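For reference, the node each array task ran on can usually be looked up with `sacct`, which makes it easier to see whether the failing tasks share the same nodes. This is a sketch only: `task_nodes` is a hypothetical helper name, and it assumes that job accounting is enabled on the cluster.

```shell
# Hypothetical helper: print "JobID State NodeList" for each array task of a job.
# Job steps such as "7644778_18.batch" are skipped (their IDs contain a dot).
task_nodes() {
    sacct -j "$1" --noheader --parsable2 --format=JobID,State,NodeList |
        awk -F'|' '$1 !~ /\./ { print $1, $2, $3 }'
}
```

Calling `task_nodes 7644778` would list one line per array task; comparing the node names of the failed and the successful tasks shows whether the error is tied to particular nodes.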

best wishes

Dear Simon,

Thank you for this information. Indeed, two nodes did not have the module available. This has now been fixed.

We apologize for the inconvenience.

Thanks for looking into this and fixing the problem. Best wishes, Simon
