Bus errors on cpu144?

User: briel
Cluster: yggdrasil

I’ve been running a large job array and have been receiving bus errors for jobs that ran on cpu144.
All other jobs seem to run fine.

These jobs fail within the first 10 minutes after starting.
I am reading files from isilon and writing to my home directory.

Is there a specific issue with this CPU?
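
For anyone wanting to check whether failures cluster on a single node, here is a minimal sketch. It assumes you already have your job records as (job ID, node, state) tuples; the function name `failures_by_node`, the state strings, and the records themselves (including cpu143 and cpu145) are made up for illustration and are not real accounting output.

```python
from collections import Counter


def failures_by_node(records):
    """Return {node: (failed, total)} for (job_id, node, state) records."""
    total = Counter()
    failed = Counter()
    for _job_id, node, state in records:
        total[node] += 1
        # Assumption: any state other than "COMPLETED" counts as a failure.
        if state != "COMPLETED":
            failed[node] += 1
    return {node: (failed[node], total[node]) for node in total}


# Hypothetical records for illustration only.
records = [
    ("1001_1", "cpu143", "COMPLETED"),
    ("1001_2", "cpu144", "FAILED"),
    ("1001_3", "cpu144", "FAILED"),
    ("1001_4", "cpu145", "COMPLETED"),
]

summary = failures_by_node(records)
for node, (bad, n) in sorted(summary.items()):
    print(f"{node}: {bad}/{n} failed")
```

If nearly all failures land on one node while the same job array runs cleanly elsewhere, that points to the node rather than to the job itself.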

Dear @Max.Briel, thanks for the notification. Indeed, there was a memory error on the compute node. We are checking.



cpu144 still produces bus errors from time to time. I got one a few hours ago.

@Matthias.Kruckow thanks for the notification, we’ll replace the memory. The node is in drain until further notice.