Current issues on Baobab and Yggdrasil

Hi,

The storage on Baobab was unresponsive today, 2021-05-02T22:00:00Z. This affected the user experience, for example when using the login node, as well as running jobs that use the storage.

The cause was a user who submitted a job array in which every job performed intensive IO operations, such as gzip and gunzip on big files. This is clearly something that should be avoided in compute jobs, especially when it is done dozens of times in parallel.

Best practice:

  1. Use the scratch space for temporary files. At least this limits the impact on other users.
  2. Use the local scratch on each node, or even memory (tmpfs).
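
As a rough illustration of point 2, here is a minimal Python sketch that compresses a file inside a node-local working directory and only writes the final result back to the shared storage. The function name `compress_via_local_scratch` and the `scratch_root` parameter are hypothetical; the actual local scratch or tmpfs path on a given node is an assumption you need to check for your cluster. When `scratch_root` is left unset, Python falls back to `TMPDIR` or the system temporary directory.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_via_local_scratch(src, dest_dir, scratch_root=None):
    """Gzip `src` in a node-local directory, then move the result to `dest_dir`."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    # Assumption: scratch_root points at node-local storage or tmpfs.
    # With None, tempfile uses TMPDIR or the system temp directory.
    with tempfile.TemporaryDirectory(dir=scratch_root) as work:
        local_in = Path(work) / src.name
        local_out = Path(work) / (src.name + ".gz")
        # One sequential read from the shared storage.
        shutil.copy2(src, local_in)
        # The compression itself reads and writes only local files.
        with open(local_in, "rb") as fin, gzip.open(local_out, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        dest_dir.mkdir(parents=True, exist_ok=True)
        final = dest_dir / local_out.name
        # One sequential write back to the shared storage.
        shutil.move(str(local_out), str(final))
    # The temporary directory and the local input copy are removed here.
    return final
```

The idea is that the shared storage sees one read and one write per file, while the many small IO operations of the compression stay on the node. The same pattern applies to gunzip or any other tool that produces intermediate files.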

Thanks for your help and understanding.