Baobab: Login node down
Dear users,
The login node on Baobab has crashed. The server has been rebooted and is available again.
We apologize for any inconvenience caused.
Thank you for your understanding.
Status : Solved 
start: 2024-07-21T18:42:00Z
end: Invalid date
Bamboo Scratch Storage Unavailable
Dear HPC Users,
The scratch storage on Bamboo is currently unavailable due to an ongoing issue. Our team has already contacted the provider and we are actively working with them to resolve the situation as quickly as possible.
Please note that the scratch storage has been unmounted on the compute and login nodes and will remain unavailable until further notice. We will update you as soon as we have more information on the situation.
Thank you for your understanding,
Best Regards,
Status : Solved 
start: 2024-09-10T22:33:00Z
end: 2024-09-26T07:33:00Z
Update: the vendor will carry out an intervention on 25 September to fix the issue.
The service is back in production without data loss!
Yggdrasil nodes unavailable
Dear HPC Users,
Yggdrasil is currently experiencing issues with its electrical power supply, which has resulted in a reduced number of available nodes on the cluster.
Electricians are working to resolve the issue.
Thank you for your understanding.
Best Regards,
Status : Solved 
start: 2024-09-13T21:30:00Z
end: 2024-09-17T12:24:00Z
Dear HPC Users,
Yggdrasil is currently experiencing issues with its electrical power supply, which has resulted in a reduced number of available nodes on the cluster.
This is the same issue as in mid-September. We’ll check with the datacenter manager to find out what is going on.
Thank you for your understanding.
Best Regards,
Status : Partially solved 
start: 2024-09-27T22:02:00Z
stop: 2024-09-30T09:45:00Z
Edit: the electrical cabling on Yggdrasil was incorrectly modified by someone at Astro without notifying us. The Astro IT team is reverting the change. This is only a partial workaround, as it appears we still have an overload issue that has to be solved.
Dear HPC Users,
We’ve set all nodes on every cluster to drain. Because of an issue with scratch storage, we need to upgrade scripts on every node. Don’t worry: as soon as a node is upgraded, we’ll resume it.
Thank you for your understanding.
Best Regards,
Status : Solved 
start: 2024-09-29T22:02:00Z
stop: 2024-10-02T22:02:00Z
Dear HPC Users,
The Bamboo cluster is currently experiencing issues with quotas on the home filesystem. The symptom is that the reported disk usage may be incorrect. We are investigating.
Thank you for your understanding.
Best Regards,
Status : Solved 
start: 2024-10-16T22:02:00Z
stop: 2024-11-07T23:12:00Z
Dear HPC Users,
The Baobab cluster is currently experiencing issues with home storage. We restarted the servers this morning and will now investigate the cause of the crash.
Thank you for your understanding.
Best Regards,
Status : Solved 
start: 2024-10-20T22:02:00Z
stop: 2024-11-08T23:16:00Z
Bamboo Scratch Storage Unavailable
Dear users,
We are having problems with scratch storage and are investigating to find a solution.
For the time being, scratch storage is not available on Bamboo.
We’ll keep you informed. Thank you for your understanding.
Update:
Disk enclosures have been flashed to avoid the bug.
All scratch storage is now available in read/write mode and no data has been lost.
Kind regards
Status : Solved 
start: 2024-12-08T14:38:00Z
end: 2025-01-10T13:00:00Z
Dear users,
We are having problems with the Slurm controller on Baobab and are investigating to find a solution.
For the time being, running jobs are continuing, but no new jobs can start and user commands such as sinfo and squeue aren’t working.
We’ll keep you informed. Thank you for your understanding.
Kind regards
Status : Solved 
start: 2024-12-18T10:00:00Z
end: 2024-12-19T10:00:00Z