Scratch disk: FileNotFoundError when file exists

Hello, I’m reading and writing some files on the scratch disk, and some of my jobs fail with a FileNotFoundError even though the file clearly exists at the correct location.

I tried moving my files to my home directory and I don’t seem to get this error anymore. Could it be related to the scratch disk being too full? I tried deleting some of my old files, but that doesn’t solve the problem.
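
For reference, one way to check how full the scratch space actually is, using only standard Linux tools (nothing cluster-specific is assumed here):

df -h "${SCRATCH}"      # free/used space on the filesystem backing the scratch path
du -sh "${SCRATCH}"/*   # size of each top-level item in the scratch area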

Thanks for your help!

So I realized that the files not being found was actually a memory problem: I had requested too little memory for my job. Below is an example of my submit script configuration:

#!/usr/bin/env bash
#SBATCH --time=0-00:10:00
#SBATCH --partition=debug-cpu,private-dpnc-cpu,shared-cpu,public-cpu
#SBATCH --mem=60G
#SBATCH --output=log_txt/slurm-%J.out
#SBATCH --job-name='run_GRooTrackerVtx'

With this configuration, I have the problem that sometimes (it seems random), files stored on the ${SCRATCH} disk are not found. This doesn’t happen if I run the exact same script but read and write files on the ${HOME} disk instead of ${SCRATCH}. Does reading from/writing to ${SCRATCH} take more resources than reading from/writing to ${HOME}?
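
For what it’s worth, one way to check whether a job really hit its memory limit is through Slurm accounting, assuming sacct is available on the cluster (<jobid> below is a placeholder):

# Compare peak memory (MaxRSS) with the requested memory (ReqMem) for a finished job
sacct -j <jobid> --format=JobID,JobName,State,ReqMem,MaxRSS,Elapsed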

In order to keep reading/writing on ${SCRATCH}, my fix to this problem was to request more memory for my jobs, e.g.:

#!/usr/bin/env bash
#SBATCH --time=0-00:10:00
#SBATCH --partition=debug-cpu,private-dpnc-cpu,shared-cpu,public-cpu
#SBATCH --mem=80G
#SBATCH --output=log_txt/slurm-%J.out
#SBATCH --job-name='run_GRooTrackerVtx'

This works but causes a new problem: the jobs are held longer in the queue with reason (BadConstraints). After a while, the jobs do start, but it takes much longer than usual. Is there a workaround or good practice for this case?
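
For context, the pending reason can be inspected with standard Slurm commands (again, <jobid> is a placeholder):

# Show job id, partition, state and the scheduler's reason for a pending job
squeue -j <jobid> -o "%.18i %.9P %.8T %r"
# Full job record, including the requested partitions, memory and constraints
scontrol show job <jobid>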

Thank you!

Hi @Stephanie.Bron,

Can you share a jobid that had the issue please?

On which cluster do you submit your jobs?

No, reading from/writing to ${SCRATCH} does not take more resources than reading from/writing to ${HOME}.

Can we see the rest of your sbatch script, please?