As just announced on the baobab-announce@ mailing list, we will perform software and hardware maintenance on the Baobab HPC cluster on 26-27 April 2023.
The maintenance will start at 08:00 +0100, and you will receive an email when it is over.
The cluster will be totally unavailable during this period, with no access at all (not even to retrieve files).
If you submit a job in the meantime, make sure that its requested wall time (duration) does not overlap with the start of the maintenance; otherwise your job will only be scheduled after the maintenance.
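To check whether a job can still fit, here is a minimal Python sketch that computes the longest wall time ending before the maintenance starts. The function names `max_wall_time` and `to_slurm_time` are hypothetical helpers for illustration, not cluster tools, and the calculation assumes the job starts running the moment it is submitted, so the result is an upper bound.

```python
from datetime import datetime, timedelta, timezone

# Maintenance start as stated in the announcement: 26 April 2023, 08:00 +0100.
MAINTENANCE_START = datetime(2023, 4, 26, 8, 0, tzinfo=timezone(timedelta(hours=1)))


def max_wall_time(submit_time: datetime, start: datetime = MAINTENANCE_START) -> timedelta:
    """Return the longest wall time that still ends before the maintenance starts.

    Assumption: the job starts at submit_time (no queue wait), so this is an
    upper bound. Returns a zero duration if the maintenance has already begun.
    """
    return max(start - submit_time, timedelta(0))


def to_slurm_time(delta: timedelta) -> str:
    """Format a duration as D-HH:MM:SS, a form accepted by sbatch --time."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("Longest wall time that fits:", to_slurm_time(max_wall_time(now)))
```

The printed value can be compared with the `--time` you intend to request; any time the job spends waiting in the queue reduces what actually fits.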
Work planned during this maintenance:
- Upgrade Slurm to the new major version 23.
- Upgrade BeeGFS to version 7.2.9.
- Reinstall all the compute nodes with Rocky Linux 8. This is a major upgrade. If you see something unexpected, please let us know.
- Upgrade the Ethernet uplink on login2 from 1Gb to 10Gb for the internet connection.
- Upgrade the firmware of the Ethernet switches and monitor the Ethernet infrastructure to increase reliability.
- Replace batteries, memory, disks, etc. on servers/nodes.
- Move a couple of compute nodes to make enough room to install a GPU server.
Thanks for your understanding.
The HPC team
Is the maintenance still ongoing? It was supposed to end yesterday (27 April), but I cannot access the cluster. Is this expected?
Hi @Helena.BachCalsamiglia,
We apologize for any inconvenience caused, but please be advised that the maintenance is still underway. This is a major update involving the migration from CentOS to Rocky Linux, and we are taking every precaution to ensure a seamless transition. As a result, the process may take some time.
Rest assured that, as mentioned in the announcement, we will notify you via email once the maintenance is complete and everything is back to normal. In the meantime, we would like to remind you that the second cluster, Yggdrasil, is available should you need to launch any calculations.
Thank you for your patience and understanding during this time.
The Baobab maintenance is now over. Thanks for your patience. It took a little longer than expected because we had several issues with the upgrade from CentOS 7 to Rocky Linux 8.
Do not hesitate to let us know about any strange behavior you notice with this new OS.
We are unable to access the cluster through either PuTTY or FileZilla. Is this expected?
Apr 28 16:37:39 xxx sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=xxx.xxx.xxx.xxxx user=$YOURUSERNAME
Apr 28 16:37:41 xxx sshd: error: PAM: Authentication failure for $YOURUSERNAME from 10.20.XXX.XXX
Could you test your password here: Password Self Service
We tested and the password works.