Jobs running slowly on BAOBAB

Hi everyone,

My jobs have been running very slowly, I would say since the I/O issue was fixed. From time to time the jobs run at normal speed, but (once they are done?) they stay in the CG (Completing) state until all the other jobs have finished. Once all the jobs are done, the CG jobs seem to start running again at normal speed. This is hard to verify because the jobs take so long to finish.

I did not see anything in the log/elog files. However, I noticed that when I submit jobs, many of them can end up on the same node. Currently I have 89 jobs on cpu291, some of them in the CG state.
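For anyone who wants to check the same thing, here is a minimal sketch of how such a per-node count could be obtained. It assumes the standard Slurm `squeue` options `-h` (no header), `-u` (user) and `-o "%N %t"` (node list and compact state); the function names `count_jobs` and `query_squeue` are made up for illustration and are not part of any Baobab tooling.

```python
import subprocess
from collections import Counter


def count_jobs(squeue_text):
    """Count jobs per (node list, state) from `squeue -h -o "%N %t"` output."""
    counts = Counter()
    for line in squeue_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Pending jobs have an empty node list, so only the state remains.
            continue
        node, state = parts
        counts[(node, state)] += 1
    return counts


def query_squeue(user):
    """Run squeue for one user (hypothetical helper; requires Slurm on PATH)."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%N %t"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a login node, `count_jobs(query_squeue("erobyn"))` would return a counter keyed by node and state, for example `("cpu291", "CG")`. A job spanning several nodes is counted once under its full node list rather than once per node.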

I might be making a mistake in the job submission, but I was able to submit jobs one month ago with the exact same script without any issue.

I should also mention that my jobs read data (ROOT files) from BeeGFS. The issue could come from there.

Relevant directories:

My home directory: /home/users/e/erobyn/
The data directory: /srv/beegfs/scratch/groups/dpnc/ams/zhen/amsd64n/iss_B1130P7_ecal/
Job submission script: /home/users/e/erobyn/analysis/subjob/baobab_slurm_analysis6.sh

I have been through the recent posts about slow jobs, but none of them gave me a solution.

Thanks in advance for your help!

Cheers,
Erwan

Hi, thanks for the notification. There is indeed an issue with the storage on Baobab: the nodes are accessing the storage servers over TCP instead of RDMA, which is faster. We are investigating.

Hi, this is now fixed. See "Current issues on Baobab and Yggdrasil".

Ok, thank you very much!