Storage best practices and job array limitation

Dear HPC Users,

Recently, we noticed a heavy load on the home beegfs that impacted all users. It was caused by some jobs working directly in the home directory.

We would like to remind you of the storage best practices to avoid another incident:

The scratch:
Use the scratch to store the temporary data for your jobs

  • Local scratch:
    It is usually fast, with no network overhead. It is also more efficient at handling a large number of small files.
    You would typically use it for temporary files that are generated while your job is running and that do not need to be accessible once the job has finished.

  • beegfs scratch:
    The scratch directory allows you to store any data that is not unique or that can be regenerated. Please use it to store any file that doesn’t need a backup. You will typically use it as storage when your application writes temporary data to disk. We thank you for your cooperation.
    It is also acceptable to store there, for instance, a large dataset that you use as input data for your computation.
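
As a rough illustration of the scratch workflow described above, a job script might do all of its temporary I/O in a per-job directory and copy only the final results to the shared scratch. This is a minimal sketch, not an official template: the variables `LOCAL_SCRATCH` and `RESULTS_DIR`, their default values, and the file names `tmp.dat` and `result.txt` are assumptions made for illustration. Adapt them to the actual scratch locations on the cluster.

```shell
#!/bin/bash
#SBATCH --job-name=scratch_demo
#SBATCH --time=00:10:00

set -euo pipefail

# Assumption: LOCAL_SCRATCH points at the node-local scratch; falls back to /tmp.
LOCAL_SCRATCH="${LOCAL_SCRATCH:-${TMPDIR:-/tmp}}"
# Assumption: RESULTS_DIR is a directory on the shared (beegfs) scratch.
RESULTS_DIR="${RESULTS_DIR:-$PWD/results}"

# Per-job working directory so that concurrent jobs do not collide.
WORKDIR="$(mktemp -d "$LOCAL_SCRATCH/job_${SLURM_JOB_ID:-manual}_XXXXXX")"
# Remove the temporary files when the script exits, even on error.
trap 'rm -rf "$WORKDIR"' EXIT

cd "$WORKDIR"

# Placeholder for the real computation: all temporary I/O happens here.
echo "intermediate data" > tmp.dat
tr 'a-z' 'A-Z' < tmp.dat > result.txt

# Copy only what must survive the job to the shared scratch.
mkdir -p "$RESULTS_DIR"
cp result.txt "$RESULTS_DIR/"
```

With this layout the home directory only holds the script itself, and no job I/O goes through it.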

The home:
In your home directory you can store any file needed for running your jobs: data, code, software, etc.

If jobs are running and doing I/O directly in the home beegfs, this will cause a huge slowdown of the storage, impacting all home users. We strongly recommend that you follow the best practices to ensure the best experience for everyone on the cluster.

Limiting simultaneous job array tasks:
The more jobs run at once, the more I/O there can be. We invite you to limit the number of array tasks that run simultaneously by adding:

#SBATCH --array=1-100%4 to limit the array to 4 tasks running at the same time
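
For instance, a complete array job script could look like the sketch below. Only the `#SBATCH --array` syntax and the `SLURM_ARRAY_TASK_ID` variable come from Slurm itself. The list file `inputs.txt`, the `INPUT_LIST` variable and the `pick_input` helper are hypothetical names used for illustration, and the final `echo` stands in for the real processing step.

```shell
#!/bin/bash
# 100 array tasks, at most 4 of them running at the same time.
#SBATCH --array=1-100%4

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task; default to 1
# so that the script can also be tried outside of Slurm.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

# Assumption: inputs.txt (hypothetical name) lists one input file per line.
INPUT_LIST="${INPUT_LIST:-inputs.txt}"

# Hypothetical helper: print line number $2 (1-based) of the list file $1.
pick_input() {
    sed -n "${2}p" "$1"
}

if [ -f "$INPUT_LIST" ]; then
    INPUT_FILE="$(pick_input "$INPUT_LIST" "$TASK_ID")"
    # Placeholder for the real processing step.
    echo "Task ${TASK_ID} processing ${INPUT_FILE}"
else
    echo "Input list ${INPUT_LIST} not found" >&2
fi
```

Each task then handles a single input, and the `%4` suffix keeps the number of tasks hitting the storage at once under control.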

The documentation has been updated (see the job array section).


This may be a weird question, but I want to know whether my setup follows the best practices or not.

My job script and code are located under HOME. They read files, write files, and write logs on Scratch but through symlinks. I.e. my directory looks like this:

[drozd@login2.baobab dataExtraction_202205]$ ll
-rw-r--r-- 1 drozd private_dpnc 26159 May  6 15:16 list_of_files.txt
lrwxr-xr-x 1 drozd private_dpnc    74 May 11 14:52 output -> /srv/beegfs/scratch/users/d/drozd/[...]
lrwxr-xr-x 1 drozd private_dpnc    74 May 11 14:52 logs -> /srv/beegfs/scratch/users/d/drozd/[...]
lrwxr-xr-x 1 drozd private_dpnc    77 May  6 15:01 input_files -> /srv/beegfs/scratch/groups/dpnc/[...]
-rw-r--r-- 1 drozd private_dpnc  3565 May 11 15:08
-rw-r--r-- 1 drozd private_dpnc   411 May 11 15:11

with output, logs and input_files being, as you can see, symlinks to Scratch but located in HOME.

Also, since the working directory is on HOME, does that affect HOME performance?