Yggdrasil scheduled maintenance: 22-23 February 2023

Dear users,

as just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance on the Yggdrasil HPC cluster on 22.02.2023 and 23.02.2023.

The maintenance will start at 08:00 +0100 on 22.02.2023, and you will receive an email when it is over.

The cluster will be totally unavailable during this period, with no access at all (not even to retrieve files).

If you submit a job in the meantime, make sure that its requested wall time (duration) ends before the start of the maintenance; otherwise your job will only be scheduled after the maintenance.
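
To illustrate the wall-time constraint above, here is a minimal Python sketch that computes the longest time limit a job could request and still finish before the maintenance. The function name `max_walltime` and the 30-minute safety margin are assumptions made for this example, not values from the announcement. The result is formatted as `D-HH:MM:SS`, one of the formats Slurm accepts for its `--time` option. Keep in mind that a job may wait in the queue, so the relevant moment is when the job starts, not when it is submitted.

```python
from datetime import datetime, timedelta, timezone

# Maintenance start as stated in the announcement: 22.02.2023 at 08:00 +0100.
MAINTENANCE_START = datetime(2023, 2, 22, 8, 0, tzinfo=timezone(timedelta(hours=1)))


def max_walltime(start_time, margin=timedelta(minutes=30)):
    """Return the longest wall time (Slurm-style D-HH:MM:SS string) that still
    ends before the maintenance starts, or None if no time is left.

    `start_time` must be a timezone-aware datetime for the expected job start.
    `margin` is a hypothetical safety buffer chosen for this sketch.
    """
    remaining = MAINTENANCE_START - start_time - margin
    if remaining <= timedelta(0):
        return None
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    limit = max_walltime(now)
    if limit is None:
        print("No time left before the maintenance window.")
    else:
        print(f"Longest safe time limit: --time={limit}")
```

The resulting string could then be passed to `sbatch` via `--time`, assuming the job is expected to start right away.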

Work planned during this maintenance:

  1. update Slurm to version 22.05.8
  2. update BeeGFS to version 7.2.8
  3. add two new metadata servers to the scratch filesystem to improve performance
  4. enable a hard quota on home directories. The limit will be the same as on Baobab: 1 TB per user.
  5. replace faulty HDDs and the batteries of RAID controllers.
  6. update the servers with the latest security patches and bug fixes, update the CUDA driver to version 525.60.13
  7. reinstall all the nodes with the latest security patches and bug fixes.

Thanks for your understanding.

Best regards,
the HPC team

Dear users,

the Yggdrasil maintenance is now over.

What was done:

  • updated Slurm to version 22.05.8

  • updated BeeGFS to version 7.2.8

  • replaced batteries on BeeGFS scratch servers

  • doubled the number of BeeGFS meta services to improve performance on the home and scratch filesystems

  • tweaked the BeeGFS config to improve performance in some cases

  • updated all the servers with the latest security patches and bug fixes

  • reinstalled all the nodes

  • various other fixes (disk replacement, etc.).

Important news about Baobab: please check the current status in the thread "[2023] Current issues on HPC Cluster".

Best regards,

HPC team