I/O errors on Yggdrasil scratch

Primary information

Username: coppinp
Cluster: Yggdrasil

Description

I am unable to create any new file on Yggdrasil scratch. My colleagues are experiencing the same issue.

Steps to Reproduce

cd /srv/beegfs/scratch/users/c/coppinp
touch test_file

Expected Result

Create dummy file

Actual Result

Produces the following error:

touch: setting times of ‘test_file’: Remote I/O error
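
For anyone who wants to check whether their own scratch directory is affected, the reproduction steps above could be wrapped in a small helper. This is only a sketch: the function name `check_writable` and the temporary file name are hypothetical, and the script reports whether a file can be created, not why it fails.

```shell
#!/bin/sh
# Hypothetical helper (not an official HPC tool): try to create a
# temporary file in the given directory, report the result, clean up.
check_writable() {
    dir="$1"
    tmp="$dir/.write_test_$$"    # $$ = PID of the current shell, to avoid name clashes
    if touch "$tmp" 2>/dev/null; then
        rm -f "$tmp"
        echo "OK: $dir is writable"
        return 0
    else
        # touch may have created the file before failing (e.g. while setting times)
        rm -f "$tmp" 2>/dev/null
        echo "FAIL: cannot create files in $dir"
        return 1
    fi
}
```

It would be called with the directory to test, for example `check_writable /srv/beegfs/scratch/users/c/coppinp`.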

I was just writing a report myself, but you were faster.
The problem also affects reading files and happens for other users, too.

The last few times this happened, the file system was restarted. But looking at the list of issues, the same problem keeps reappearing, and the time between occurrences seems to be getting shorter, so it happens more and more often.

Hence, we need a more robust solution for the future than restarting the file system every time. HPC team, please investigate this and find a permanent solution.

Dear users,

Thanks for reporting this issue. After checking, the scratch2 server was in an error state due to a failed disk. The server is working again after a reboot. I will open a ticket with the provider to report this hardware problem.

Best regards,

Unfortunately, the I/O errors are back again.

Dear Matthias,

A major hardware problem occurred on the Yggdrasil scratch.
For now, I have just restarted the file system.

You can now work as usual. I will continue analyzing the logs to trace the source of the problem.

Best regards,

Hello,

I wanted to let you know that I keep getting “Communication error on send” errors on Yggdrasil, although this post says that the issue was resolved.

Thank you for your help,

Paola

Hello,

You’re right, we had another issue, this time on the scratch1 server. I have just updated the post.

Best regards,

Hello,

Thank you for your answer.

I wanted to let you know that, although the issue seemed resolved over the past few days, I am again getting errors such as “OSError: [Errno 121] Remote I/O error”.

Best regards,

Paola

Dear @Paola.Malsot

This time the error wasn’t caused by hardware, but by users having too many files on the scratch space: New scratch policy : quota on number of files

Best

Yann
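
To get a rough idea of how many files you keep on scratch, something like the following could be used. It is only a sketch: the function name `count_files` is hypothetical, it counts regular files only, and how the quota is actually computed is defined by the policy post linked above, so the number may differ from what the quota system reports. File names containing newlines would be counted more than once.

```shell
#!/bin/sh
# Hypothetical helper: count regular files below a directory.
# Uses only standard find/wc; the count is an estimate, not the official quota figure.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '   # tr strips padding some wc versions add
}
```

For example, `count_files /srv/beegfs/scratch/users/c/coppinp` would print the number of regular files in that directory tree. On a large directory this walk can take a while and adds metadata load, so it is best not run repeatedly.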

3 posts were split to a new topic: Issue with storage