Not able to access a directory

I am running code on Baobab. Today, all of a sudden, I was unable to cd into the directory in my account where my code is. When I run ls in my home directory, I get the following message:
ls: cannot access “directory name”: Communication error on send.

When I try to download some files from this directory on Baobab to my local machine, I get the following error: Communication error on send.

Can you advise me on what to do? I am in real panic mode, since I don't have a copy of my code anywhere else.

Thank you,
Azadeh


I have the same issue. Two of my collaborators have it as well.

As a consolation, I suspect the data is just fine: I happen to have one shell that was already in my home directory, and from there I can still view and even edit files. Oddly, though, I cannot run “cd $PWD” in it.

Regards

Volodymyr

Thanks, so we just have to wait, but the files should be safe.


That’s just my guess, but it seems so.

It is indeed unfortunate, and it seems to be quite a critical issue, likely affecting all users and potentially breaking quite a few jobs.
But who knows.

V.

Hi there,

One of the beegfs-meta servers segfaulted. I have restarted it (further investigation tomorrow during working hours), and everything has been back to normal since 21:06. Sorry for the inconvenience.

Two notes:

  1. It was a software error, so the files were safe.
    Moreover, since this concerned the ${HOME} space, there are daily backups of everything except the ${HOME}/scratch symlink.
  2. Not all users were affected, only those accessing files stored on the crashed server.
    However, there is no way to know in advance who will be affected, because ${HOME} (as well as ${SCRATCH}) is on BeeGFS, a distributed filesystem.

Again, sorry for the inconvenience.

Thx, bye,
Luca


So all three users in our group had bad luck! It happens!

I confirm that all looks good for us now.

Thanks a lot!

Volodymyr

Dear @Volodymyr.Savchenko:

BeeGFS is a distributed storage system: users' files are spread across various servers, and big files are stored in chunks across several servers. This means that as soon as you try to access a file, or a chunk of a file, that lives on the faulty server, you'll run into the issue. Sooner or later, everyone accesses all of the Baobab storage servers.
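
To illustrate why "sooner or later everyone" touches every server, here is a toy model in Python. It is only a sketch under stated assumptions: the server names, the 8-server pool, the 512 KiB chunk size and the stripe width of 4 are all hypothetical values, and the real placement is decided by BeeGFS itself, not by hashing the path. The crash in this thread was on a metadata server, whose entries are also spread over several machines; the sketch models only the chunk placement described above.

```python
import hashlib
import math

# Hypothetical layout: every value here is an illustrative assumption,
# not Baobab's actual configuration.
TARGETS = [f"storage{i:02d}" for i in range(1, 9)]  # 8 made-up servers
CHUNK_SIZE = 512 * 1024  # assumed chunk size in bytes
STRIPE_WIDTH = 4         # assumed number of servers a file is striped over


def targets_for_file(path: str, size: int) -> set[str]:
    """Return the servers holding chunks of `path` in this toy model.

    The starting server is derived from a hash of the path, then chunks
    are placed round-robin over STRIPE_WIDTH consecutive servers.
    """
    start = int(hashlib.sha256(path.encode()).hexdigest(), 16) % len(TARGETS)
    n_chunks = max(1, math.ceil(size / CHUNK_SIZE))
    width = min(n_chunks, STRIPE_WIDTH)
    return {TARGETS[(start + k) % len(TARGETS)] for k in range(width)}


def targets_for_user(files: dict[str, int]) -> set[str]:
    """Union of all servers touched by a user's files."""
    touched: set[str] = set()
    for path, size in files.items():
        touched |= targets_for_file(path, size)
    return touched


if __name__ == "__main__":
    small = {"/home/user/notes.txt": 2_000}
    print(len(targets_for_user(small)))  # 1: a tiny file fits in one chunk
    big = {"/home/user/data.h5": 10 * 1024**3}
    print(len(targets_for_user(big)))    # 4: striped over STRIPE_WIDTH servers
    many = {f"/home/user/run{i}/out.dat": 5 * 1024**2 for i in range(50)}
    print(len(targets_for_user(many)))   # almost certainly all 8 servers
```

Even in this simplified model, a single large file already depends on several servers, and a user with a few dozen files ends up depending on essentially all of them, which is why there is no way to predict in advance who a single server failure will hit.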

Best


Thank you for the clarification and for sorting out the issue. All looks good now.

Azadeh