Dear users,
as just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance on the Baobab HPC cluster from Tuesday 30 November until Friday 3 December at 10:00. This is a 3-day maintenance.
The maintenance will start at 08:00 +0100, and you will receive an email when it is over.
The cluster will be totally unavailable during this period, with no access at all (not even to retrieve files).
If you submit a job in the meantime, make sure that its requested wall time (duration) does not overlap with the start of the maintenance; otherwise your job will only be scheduled after the maintenance.
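To illustrate the wall time rule, here is a minimal sketch that checks whether a job would finish before the maintenance starts. It is not an official tool. The function names are hypothetical, the year 2021 is an assumption (the announcement does not state it), and only the `[days-]hours:minutes:seconds` wall time form is handled. It also assumes the best case where the job starts immediately at the given time; a job that waits in the queue ends later.

```python
from datetime import datetime, timedelta, timezone

# Maintenance start taken from the announcement: 30 November, 08:00 +0100.
# ASSUMPTION: the year (2021) is not stated in the announcement.
MAINTENANCE_START = datetime(2021, 11, 30, 8, 0, tzinfo=timezone(timedelta(hours=1)))


def parse_walltime(text):
    """Parse a wall time written as [days-]hours:minutes:seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(start_time, walltime):
    """Return True if a job starting at start_time ends before the maintenance."""
    return start_time + walltime <= MAINTENANCE_START


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for requested in ("12:00:00", "4-00:00:00"):
        ok = fits_before_maintenance(now, parse_walltime(requested))
        print(requested, "fits" if ok else "overlaps the maintenance")
```

If the check fails, requesting a shorter wall time is the simplest way to let the job run before the maintenance.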
What will be done during this maintenance:
- replace the old master server, still on CentOS6, with an admin server on CentOS7. This means we’ll get rid of CentOS6 completely after that, yeah!
- upgrade of the cluster job scheduler (Slurm 21.08 with REST API)
- flash storage RAID cards with latest firmware
- replace the batteries of the storage RAID cards
- security and bugfix upgrade, re-installation of all the nodes
- security: on login2, we’ll close all the ports from outside, except ssh.
Thanks for your understanding.
Best regards,
the HPC team
Dear users,
the maintenance is now over! It took slightly longer than expected due to issues with node reinstallation. We still have a couple of CPUs and GPUs down that we need to fix manually.
What has changed:
- our old main server “master”, still on CentOS6, is now ready to be decommissioned and has been replaced by “admin1”, which runs CentOS7. This was a major change, with months of hard work by @Luca.Capello, many thanks to him!
- we replaced batteries on RAID controllers
- Slurm upgraded to the 21.08 major release (version 21.08.2), with the REST API enabled.
- incoming connections to login2.baobab are now restricted: only ssh is allowed.
- hostnames are now in the format nodeXXX.baobab
- reinstallation of every node with the latest CentOS7 version.
- Do not rely on the host IPs: they are not fixed anymore and may change.
- iface is broken until further notice, as it was hosted on our old master server and is not compatible with the latest Slurm version.
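Since host IPs may now change, scripts that talk to a node should look up its address by name each time rather than storing an IP. A minimal sketch of that idea follows. The helper names are hypothetical, and it assumes that XXX in nodeXXX.baobab is a zero-padded three-digit number and that these names resolve from wherever the script runs.

```python
import socket


def node_hostname(number):
    """Build a hostname in the nodeXXX.baobab format.

    ASSUMPTION: XXX is a zero-padded three-digit node number.
    """
    return f"node{number:03d}.baobab"


def current_ip(hostname, resolver=socket.gethostbyname):
    """Resolve the hostname at call time instead of keeping a stored IP."""
    return resolver(hostname)
```

Calling `current_ip(node_hostname(42))` right before each connection means a changed IP is picked up automatically; the `resolver` parameter only exists so the lookup can be replaced in tests.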
Time to get some rest, please be kind and don’t break anything, at least for a couple of days!
As usual, thanks for your feedback if you notice something not working as expected. Keep in mind that if your messy script isn’t working as expected, it may be worth checking your script first before blaming us.
Best
HPC team