Baobab scheduled maintenance: 22-23 November 2023

Dear users,

As just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance of the Baobab HPC cluster on 22 and 23 November 2023.

The maintenance will start at 08:00 (UTC+1), and you will receive an email when it is over.

The cluster will be totally unavailable during this period, with no access at all (not even to retrieve files).

If you submit a job in the meantime, make sure its requested wall time (duration) ends before the start of the maintenance; otherwise the job will only be scheduled after the maintenance.
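
For example, a job submitted on 21 November that requests a two-hour wall time can still start before the maintenance, while one requesting several days will be held until afterwards. A minimal batch script sketch (the partition name and executable below are placeholders, not Baobab-specific values):

    #!/bin/bash
    #SBATCH --job-name=short_job     # placeholder job name
    #SBATCH --ntasks=1
    #SBATCH --time=02:00:00          # requested wall time: must end before 08:00 on 22 November
    #SBATCH --partition=shared-cpu   # placeholder partition name

    srun ./my_program                # placeholder executable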

What will be done during this maintenance:

  • Increase the disk space used for TMPDIR on login2.baobab
  • Various hardware work (replace batteries, fans, disks)
  • Spread the storage servers more evenly across our InfiniBand switches to improve load balancing and reduce network congestion
  • Update BeeGFS to 7.4.1
  • Reinstall the Slurm server on Rocky 8 and update Slurm to version 23.02.6
  • Reinstall all the nodes with the latest Rocky 8 (8.8)
  • Upgrade the servers with the latest security and bug fixes

Thanks for your understanding.

Best regards,
the HPC team

Hi,

When will the maintenance end? I tried to log into Baobab, but it is still not possible.

Best,
Zhongwei

Dear users, the maintenance is now over; thanks for your patience.

What was done:

  • Update BeeGFS to version 7.4.2
  • Increase the temporary space on login2
  • Update Slurm to version 23.02.6 and migrate the Slurm server to Rocky 8
  • Use job containers to give every job on the compute nodes a private “/tmp”, “/scratch” and “/dev/shm” (see Slurm Workload Manager - job_container.conf); a configuration sketch follows after this list
  • Better distribution of our storage servers (each server on a different InfiniBand switch)
  • Reinstall all the nodes with the latest Rocky 8 (8.8)
  • Upgrade the servers with the latest security and bug fixes
  • Flash all the InfiniBand switches with the latest firmware
  • Replace batteries, fans, cables, etc.
  • In progress: migrate the remaining application servers still on CentOS 7 to Rocky 8
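
Regarding the job-container item above: this refers to Slurm's job_container/tmpfs plugin, which gives each job a private mount namespace. A minimal configuration sketch, assuming a tmpfs-based setup; the base path and the exact options shown here are assumptions, not Baobab's actual configuration:

    # slurm.conf (excerpt): enable the tmpfs job container plugin
    JobContainerType=job_container/tmpfs
    PrologFlags=Contain

    # job_container.conf: back each job's private /tmp and /dev/shm
    # (and, depending on site policy, /scratch) with a per-job directory
    AutoBasePath=true
    BasePath=/var/tmp/slurm_containers   # hypothetical path, not Baobab's actual setting

With this plugin, anything a job writes to its private /tmp is removed automatically when the job finishes, so leftover temporary files no longer accumulate on the compute nodes.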

Best regards, and as usual, thanks for your feedback (bad or good)!

HPC team