Scratch access very slow sometimes

Hello

Although most of the time there is no problem, sometimes reads from the scratch partition run at ~1-2 MB/s (e.g. right now).

Not only is this a problem for data access, but the 1 GB Singularity image also takes 10 minutes to start (usually it's instantaneous).

this is how I test the speed:

find $PWD/scratch/data/integral/scw/ -name 'isgri_ev*' -exec dd if='{}' of=/dev/null \;

(or doing the same on the image)

Is there an easy way to monitor the current I/O load on the partition?

Also, is there a good way to cache some large files (like the image) across the compute nodes?
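For example, I imagine something along these lines at the start of each job (just a sketch; I am assuming here that $TMPDIR points at node-local disk, and the final command is a placeholder):

# copy the image from scratch to node-local storage once, then run from the local copy
LOCAL_IMG="$TMPDIR/image.sif"
test -f "$LOCAL_IMG" || cp /srv/beegfs/scratch/users/s/savchenk/singularity/spiacs-detection-9c73c79-5f75a0a.sif "$LOCAL_IMG"
singularity exec "$LOCAL_IMG" my_pipeline_command

But this only helps within one job on one node, hence the question about something shared across nodes.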

Thanks!

Volodymyr

Dear Volodymyr,

your post was a good opportunity to write a quick post about storage on Baobab that may be interesting for other users.

As you may understand from my post, it’s a bad idea to perform benchmarks yourself.
According to your post, you are reading the content of 178121 files to determine the performance of the storage. Please don’t.

If you really want to check the performance, there is a tool provided by BeeGFS, beegfs-ctl, but it's up to you to figure out how to use it, as it's not intended for end users.
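For reference, here is a rough sketch of the kind of live statistics it can show; I am assuming the usual --serverstats and --clientstats modes are available in the installed BeeGFS version, so please check beegfs-ctl --help first:

# per-storage-server I/O statistics, refreshed every 5 seconds
beegfs-ctl --serverstats --nodetype=storage --interval=5

# per-client request statistics, to see which clients generate the most load
beegfs-ctl --clientstats --nodetype=storage --interval=5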

Best

Dear Yann,

thanks a lot for the post! I am not sure it answered my question though.

I am most certainly not reading all of those files: there is no need to wait for the command to complete, as it can be interrupted after a few reads. It's meant as an effectively open-ended command; I never let it run beyond the first several files, since that is enough to assess the performance.
Sorry, I understand now that this was unclear.
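To make this concrete, the same check can be bounded to a handful of files rather than interrupted by hand. This is just a sketch of what I mean, along the same lines as above:

# read only the first 5 matching files and stop; dd prints the throughput of each read
find $PWD/scratch/data/integral/scw/ -name 'isgri_ev*' | head -n 5 | xargs -I{} dd if={} of=/dev/null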

So my actual question was whether there is any monitoring a user can currently access to see whether the disk is under load, and even whether my own jobs happen to be loading it.
At the point when I wrote the post, the problem was certainly very acute: it effectively blocked my work (and I myself had only a few jobs running at that time).
How often does this happen? Has it been reflected in some user-accessible log?

Such a report would seem preferable to manipulating beegfs-ctl myself, since I would not really want to set up my own monitoring, or anything else that runs permanently on the HPC cluster.
I could look into adding something with beegfs-ctl, though, if that's the best option.

By the way, in this case my "find" was not the issue (though I do understand that in some situations it can be), since the problem was also very clear when simply reading a Singularity image from a known location.

Cheers

Volodymyr

Dear Volodymyr,

just to be clear: I'm also pretty sure that you weren't the cause of the slowness of the scratch space, no worries. I understand it's an issue when it's that slow. We had two internal issues that may have been the cause of the poor performance. It could indeed be interesting for users to have access to a performance graph for the storage. We'll think about it; we need to figure out the right metrics and graph them.

Best

Indeed, this only happened during about one day, so it's not really a big concern at all if it was a rare internal issue.
It's just that I cannot be entirely sure there were no other, possibly short-term, episodes, hence the question.
Good to know it might become available!

I think it is happening again now, although to a somewhat smaller degree.

Here is a simple test, which should not involve much metadata load:

dd if=/srv/beegfs/scratch/users/s/savchenk/singularity/spiacs-detection-9c73c79-5f75a0a.sif bs=1M count=100 of=/dev/null

And what was the result?

It was about 1 MB/s.

I also added a small test (a ~10 MB read with dd) at the beginning of every job. It does not increase the load substantially, since it's a few percent of what the job reads anyway.
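Concretely, it is little more than this at the top of the job script (a sketch; the log location is a placeholder for what I actually use):

# time a small read from scratch at job start and append dd's throughput line to a log
{ date -Is; dd if=/srv/beegfs/scratch/users/s/savchenk/singularity/spiacs-detection-9c73c79-5f75a0a.sif bs=1M count=10 of=/dev/null 2>&1 | tail -n 1; } >> $HOME/scratch_io_log.txt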

I understand and fully appreciate your point about users not creating unnecessary load with independent monitoring, but this is basically just statistics on my own jobs, so I suppose that's OK.

Here is how it evolved over yesterday: min, max, and average in each time bin. I am not sure about the one upward outlier; maybe it got cached somehow.

Edit: the size of each dot corresponds to the number of reads (i.e. the number of my jobs in that time bin).
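If the outlier is indeed client-side caching, the same test can also be run with direct I/O to bypass the page cache (a sketch; I have not checked that O_DIRECT behaves well on this BeeGFS mount):

dd if=/srv/beegfs/scratch/users/s/savchenk/singularity/spiacs-detection-9c73c79-5f75a0a.sif bs=1M count=10 of=/dev/null iflag=direct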

Hi there,

Please remember to remove any "personal" stats as soon as HDF5 profiling is activated (cf. SLURM - monitor resources during job).
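(For reference, once it is enabled, the workflow should be roughly the following; this is only a sketch, assuming the acct_gather_profile/hdf5 plugin gets configured with task profiling:)

# submit with task-level profiling enabled
sbatch --profile=task job.sh

# after the job, merge the per-node profile files into a single HDF5 file for inspection
sh5util -j <jobid> -o job_profile.h5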

Thx, bye,
Luca

Hi Luca,

Sorry, I do not understand: what if part of my job output is metadata, like the total time spent?
Can I not make a summary from the metadata of my jobs?

Or do you mean I should not record this metadata in the job? It does not cost extra resources.

Thanks!

Volodymyr