Current issues on Baobab and Yggdrasil

Dear all,

One of the metadata servers for the BeeGFS file system storing the $HOME directories crashed at around 2020-09-23T14:04:00Z. Service was restored at around 2020-09-23T19:06:00Z.

During this time, some of the files in your $HOME were unavailable. If you tried to log in or access the files, you got this message:
Could not chdir to home directory /home/users/x/<USERNAME>: Communication error on send
or, if you were already connected, this one:
Communication error on send

If you had jobs running, they probably failed and need to be re-submitted to Slurm.
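
If you are not sure which of your jobs were affected, something along these lines may help. It is only a sketch: it assumes GNU `date` is available on the login node and that `sacct` interprets times in the node's local time zone, so the UTC timestamps above are converted first. The function name and the chosen state filter are illustrative, not an official procedure.

```shell
# Outage window from this announcement (UTC), converted to local time.
# Assumes GNU date; sacct is assumed to expect local timestamps.
outage_start_local=$(date -d "2020-09-23T14:04:00Z" +%Y-%m-%dT%H:%M:%S)
outage_end_local=$(date -d "2020-09-23T19:06:00Z" +%Y-%m-%dT%H:%M:%S)

# Hypothetical helper: list your own jobs that ended in a failure state
# during the outage window. -X shows one line per job (no job steps).
list_failed_jobs_during_outage() {
    sacct -X -u "$USER" \
        -S "$outage_start_local" -E "$outage_end_local" \
        --state=FAILED,NODE_FAIL \
        --format=JobID,JobName,State,ExitCode,Start,End
}
```

Calling `list_failed_jobs_during_outage` on a login node should print the matching jobs, which you can then re-submit with `sbatch` and your usual job script. Jobs that ended in another state (for example `CANCELLED` or `TIMEOUT`) would need to be added to the `--state` list.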

We are sorry for the inconvenience.

All the best,

HPC team

Edit: as a side effect, all the nodes were put in the drain state, which prevented new jobs from being processed. This has also been fixed as of 2020-09-23T22:00:00Z.
