Yggdrasil scheduled maintenance: 11-12 October 2023

Dear users,

As just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance on the Yggdrasil HPC cluster on 11 and 12 October 2023.

The maintenance will start at 08:00 +0100 on 11 October, and you will receive an email when it is over.

The cluster will be totally unavailable during this period, with no access at all (not even to retrieve files).

If you submit a job in the meantime, make sure that its requested wall time (duration) does not overlap with the start of the maintenance; otherwise your job will only be scheduled after the maintenance.
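To make the wall time constraint concrete, here is a minimal Python sketch that computes the longest wall time a job could request and still finish before the maintenance begins. The function name `max_walltime` and the 10-minute safety margin are assumptions made for illustration and are not part of the announcement; the result only applies if the job starts right away, since time spent waiting in the queue reduces the available window.

```python
from datetime import datetime, timedelta, timezone

# Maintenance start as stated in the announcement: 11 October 2023, 08:00 +0100.
MAINTENANCE_START = datetime(2023, 10, 11, 8, 0, tzinfo=timezone(timedelta(hours=1)))


def max_walltime(now, start=MAINTENANCE_START, margin=timedelta(minutes=10)):
    """Return the longest wall time ending before `start`, formatted as D-HH:MM:SS.

    `margin` is a hypothetical safety buffer (not from the announcement).
    Returns None when no time is left before the maintenance.
    """
    remaining = start - now - margin
    if remaining <= timedelta(0):
        return None
    total = int(remaining.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Uses the current time; pass the printed value to sbatch via --time.
    print(max_walltime(datetime.now(timezone.utc)))
```

The returned string uses the days-hours:minutes:seconds form accepted by the `--time` option of `sbatch`, for example `sbatch --time=0-23:50:00 job.sh` (where `job.sh` is a placeholder for your own script).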

What we plan to do during this maintenance:

  1. Reinstall admin1 with Rocky8
  2. Reinstall all the nodes with the latest Rocky8 (8.8)
  3. Upgrade BeeGFS to version 7.4.x
  4. Upgrade the servers with the latest security and bug fixes
  5. Various hardware fixes: replace disks, memory, etc.

Thanks for your understanding.

Best regards,
the HPC team

Dear users,

The maintenance is now over and we were able to do what was planned, without TOO much stress! :sweat_smile:

Important: as we have updated the IP address of admin1, it will take a little while until “/acanas”, “/unige” and “/dpnc” are reachable again. Thanks for your understanding.

We are now using Slurm job_container to provide a private “/tmp”, “/scratch” and “/dev/shm” for every job on the compute nodes. See the Slurm Workload Manager documentation for job_container.conf.

We upgraded BeeGFS to 7.2.11 and not to 7.4.x as planned, due to a compatibility issue with an external system.

Feel free to let us know if you encounter any issues.

Best regards

HPC team
