Hi,
The storage on Baobab was unresponsive today (2021-05-02T22:00:00Z). This degrades the experience for everyone: working on the login node becomes slow, and so do running jobs that use the storage.
The cause was a user who submitted a job array in which every job performed intensive IO operations, such as gzip and gunzip of large files. This is clearly something to avoid in compute jobs, especially when it is done dozens of times in parallel.
Best practices:
- use the scratch space for temporary files. This at least limits the impact on other users.
- better still, use the local scratch available on each node, or even memory (tmpfs).
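As an illustration of the second practice, here is a minimal Python sketch, not an official Baobab recipe. It reads a compressed input once from shared storage, decompresses and processes it in node-local temporary space, and writes only the small final result back. The environment variable LOCAL_SCRATCH and the function name process_in_local_scratch are hypothetical; check the cluster documentation for the actual local scratch path on the nodes.

```python
import gzip
import os
import shutil
import tempfile
from pathlib import Path


def process_in_local_scratch(src_gz: Path, dest_dir: Path) -> Path:
    """Decompress src_gz in node-local temporary space, work on it there,
    and copy only the final result back to shared storage.

    Assumption: LOCAL_SCRATCH is a HYPOTHETICAL environment variable pointing
    at node-local disk or tmpfs (e.g. /dev/shm). If it is unset, we fall back
    to Python's default temporary directory, which honours TMPDIR.
    """
    local_root = os.environ.get("LOCAL_SCRATCH", tempfile.gettempdir())
    # TemporaryDirectory deletes the working files when the block exits,
    # so nothing is left behind on the node.
    with tempfile.TemporaryDirectory(dir=local_root) as workdir:
        work = Path(workdir)
        raw = work / src_gz.stem  # "data.txt.gz" -> "data.txt"
        # One sequential read from shared storage; the decompressed
        # (large) data is written only to local storage.
        with gzip.open(src_gz, "rb") as fin, open(raw, "wb") as fout:
            shutil.copyfileobj(fin, fout)

        # Placeholder for the real work: here we just count lines.
        with open(raw, "rb") as f:
            n_lines = sum(1 for _ in f)

        result = work / (raw.name + ".count")
        result.write_text(f"{n_lines}\n")

        # A single small write back to the shared filesystem.
        dest_dir.mkdir(parents=True, exist_ok=True)
        final = dest_dir / result.name
        shutil.copy2(result, final)
    return final


if __name__ == "__main__":
    import sys

    print(process_in_local_scratch(Path(sys.argv[1]), Path(sys.argv[2])))
```

The design choice is simply to keep the heavy intermediate IO (the decompressed data) off the shared storage, so that a job array running many copies of this in parallel hits node-local disks or memory instead.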
Thanks for your help and understanding.