Dear all,
One of the metadata servers for the BeeGFS file system storing the $HOME
directories crashed at around 2020-09-23T14:04:00Z. The service was restored at around 2020-09-23T19:06:00Z.
During this time, some of the files in your $HOME
were unavailable, resulting in this message if you tried to log in or access the files:
Could not chdir to home directory /home/users/x/<USERNAME>: Communication error on send
or this one, if you were already connected:
Communication error on send
If you had jobs running, they probably failed and need to be resubmitted to Slurm.
We are sorry for the inconvenience.
All the best,
HPC team
Edit: as a side effect, all the nodes were put in the drain state, which prevented new jobs from being processed. This has also been fixed as of 2020-09-23T22:00:00Z.