Baobab scheduled maintenance: 13-14 June 2024

Dear users,

As just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance on the Baobab HPC cluster on June 13th and 14th.

The maintenance will start at 08:00 +0100 and you will receive an email when the maintenance is finished.

The cluster will be completely unavailable during this time and you will not be able to access it at all (not even to retrieve files).

If you submit a job in the meantime, make sure that its requested wall time (duration) does not overlap with the start of the maintenance; otherwise your job will only be scheduled after the maintenance.
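
As a rough illustration of the wall-time check above, here is a minimal Python sketch that computes the longest duration a job started right now could request and still end before the maintenance. The start time is taken from this announcement (13 June 2024, 08:00 +0100). The function names `max_walltime` and `slurm_time` and the 10-minute safety margin are assumptions made for the example, not part of any cluster tooling.

```python
from datetime import datetime, timedelta, timezone

# Maintenance start as stated in the announcement: 13 June 2024, 08:00 +0100.
MAINTENANCE_START = datetime(2024, 6, 13, 8, 0, tzinfo=timezone(timedelta(hours=1)))


def max_walltime(now, margin=timedelta(minutes=10)):
    """Return the longest wall time that ends before the maintenance, or None.

    `margin` is an arbitrary safety buffer chosen for this example.
    """
    remaining = MAINTENANCE_START - now - margin
    return remaining if remaining > timedelta(0) else None


def slurm_time(delta):
    """Format a timedelta as D-HH:MM:SS (days-hours:minutes:seconds)."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_walltime(datetime.now(timezone.utc))
    if limit is None:
        print("The maintenance has started or is too close to fit a job.")
    else:
        print(f"Longest wall time that still fits: {slurm_time(limit)}")
```

The printed value could then be used as an upper bound for the `--time` option of `sbatch`. Keep in mind that this assumes the job starts immediately: any time spent waiting in the queue reduces the window further.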

What will be done during this maintenance

  1. Bring a brand new 128-core login node into production.
  2. Upgrade Open OnDemand to version 3.1.0.
  3. Replace the 13 old InfiniBand switches with a new EDR model: this will increase the bandwidth between nodes and storage to 100 Gb/s!
  4. Migrate our slurmdbd server (used for job accounting, job querying with sacct, etc.) to a dedicated VM. This may cause some disruption to Slurm on Yggdrasil as well, but running jobs shouldn’t be affected.
  5. Upgrade Slurm to version 23.11.6-1.
  6. Update all servers to the latest security and bugfix releases.
  7. Re-install all nodes with the latest security and bugfix updates.
  8. Various fixes (replace a battery, correct scripts, etc.).

As you may have noticed, we’ll be quite busy during this maintenance. We won’t be providing user support this week unless there is an emergency.

Thanks for your understanding.

Best regards,
The HPC Team