Hello,
I submitted ~300 jobs to mono-shared-EL7, mono-EL7 and dpnc-EL7. Each job needs around 5 hours; apart from the job names, all other sbatch script options are left at their defaults. The jobs have been stuck in a pending state for more than a day.
Is this a problem on my side or a common issue?
Example of a job in partition mono-EL7: job ID 29517960
This job has a time limit of 04:00:00 (4 hours). To reduce the wait time, you should submit this job to the mono-shared-EL7 partition, which is much bigger. Its time limit is 12 hours, which suits your needs.
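To check whether a job's time limit fits within a partition's maximum, it helps to convert Slurm time strings to seconds. The sketch below is a hypothetical helper (`slurm_time_to_seconds` and `fits` are not Slurm tools); it handles the time formats accepted by sbatch's `--time` option and compares against the 12-hour limit of mono-shared-EL7 mentioned in this reply.

```python
def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm time string to seconds.

    Accepted forms (as documented for sbatch --time): "minutes",
    "minutes:seconds", "hours:minutes:seconds", "days-hours",
    "days-hours:minutes", "days-hours:minutes:seconds".
    """
    days = 0
    if "-" in spec:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError(f"unsupported time format: {spec!r}")
        # After "days-", fields are hours[:minutes[:seconds]].
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:    # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:  # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unsupported time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def fits(job_limit: str, partition_max: str) -> bool:
    """True if the job's time limit does not exceed the partition maximum."""
    return slurm_time_to_seconds(job_limit) <= slurm_time_to_seconds(partition_max)


if __name__ == "__main__":
    # ~5 h jobs against mono-shared-EL7's 12 h limit.
    print(fits("05:00:00", "12:00:00"))
```

The parsing is deliberately strict: anything outside the documented shapes raises `ValueError` rather than being guessed at.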
Example of a job in partition dpnc-EL7: job ID 29517960
This job has a time limit of 05:00:00 (5 hours). To reduce the wait time, you should submit this job to the mono-shared-EL7 partition, which is much bigger. Since you belong to the dpnc group, you can even specify both partitions as a comma-separated list; Slurm will then start the job in whichever partition allows the earliest start.
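As an illustration, here is a minimal sketch that builds such an sbatch command for many jobs. `--partition`, `--time` and `--job-name` are real sbatch options; the script name `job.sh`, the job names and the helper `build_sbatch_cmd` are hypothetical, and `--time=05:00:00` simply reflects the ~5 hours mentioned in the question. The code only prints the commands; submitting them is left commented out.

```python
import shlex


def build_sbatch_cmd(script, job_name, partitions, time_limit):
    """Build an sbatch argument list targeting several partitions at once.

    Slurm accepts a comma-separated partition list and runs the job in
    whichever listed partition offers the earliest start.
    """
    return [
        "sbatch",
        f"--partition={','.join(partitions)}",
        f"--time={time_limit}",
        f"--job-name={job_name}",
        script,
    ]


if __name__ == "__main__":
    for i in range(3):  # e.g. the first few of the ~300 jobs
        cmd = build_sbatch_cmd(
            "job.sh", f"my-job-{i:03d}", ["dpnc-EL7", "mono-shared-EL7"], "05:00:00"
        )
        print(shlex.join(cmd))
        # To actually submit: subprocess.run(cmd, check=True)
```

Passing an argument list (rather than one shell string) avoids quoting problems if job names ever contain spaces.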
Example of a job in partition mono-shared-EL7: job ID 29517730
This partition is a good choice: this job has a time limit of 08:00:00 (8 hours), which is within the partition's 12-hour limit.
This job has a priority of 5069.
The issue in this case is that many jobs with a higher priority, each requesting 16 cores, are ahead of it in the queue.
Slurm isn’t able to examine every job in the pending queue. Currently, Slurm only takes into account queued jobs that would start within 28 days. This may seem like a lot, but when a huge number of jobs is queued, the expected start time of some of them can be further away than that. I’ve just doubled this value (to 56 days), and I see that many of your jobs are no longer in the queue. Did they start? I’ll keep this parameter at 56 days for now to see whether it helps Slurm schedule the jobs better. I also advise you to submit your jobs to mono-shared-EL7.
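To make the last two points more concrete, here is a toy model, not Slurm's actual scheduler (which also does backfilling, per-node placement and multifactor priorities). Jobs are planned in priority order, each starting at the earliest time enough cores are free, and only jobs whose planned start falls inside a look-ahead window count as "considered". The cluster size (32 cores), the competing jobs' priority (6000), run time (8 h) and count (200), and the names `Job`, `plan_start_times` and `considered` are all hypothetical; only the 16-core requests, the priority 5069 and the 28/56-day windows come from this reply.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    cores: int
    hours: float  # time limit, used here as the planned run time


def plan_start_times(jobs, total_cores):
    """Toy planner: walk the queue in priority order and give each job the
    earliest time at which enough cores are free, never letting a job start
    before a higher-priority one (i.e. no backfill)."""
    running = []  # min-heap of (end_time, cores) for already-planned jobs
    free = total_cores
    now = 0.0
    starts = {}
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.cores > total_cores:
            raise ValueError(f"{job.name} needs more cores than the cluster has")
        # Advance time to the next planned completion until this job fits.
        while free < job.cores:
            end, cores = heapq.heappop(running)
            now = max(now, end)
            free += cores
        starts[job.name] = now
        heapq.heappush(running, (now + job.hours, job.cores))
        free -= job.cores
    return starts


def considered(starts, window_days):
    """Names of jobs whose planned start lies inside the look-ahead window."""
    return {name for name, t in starts.items() if t < window_days * 24}


if __name__ == "__main__":
    queue = [Job(f"big-{i}", 6000, 16, 8.0) for i in range(200)]
    queue.append(Job("mine", 5069, 1, 5.0))
    starts = plan_start_times(queue, total_cores=32)
    print("planned start (hours):", starts["mine"])
    print("in 28-day window:", "mine" in considered(starts, 28))
    print("in 56-day window:", "mine" in considered(starts, 56))
```

With these made-up numbers, the 1-core job is planned to start after 800 hours (about 33 days): outside a 28-day window but inside a 56-day one, which is consistent with more pending jobs being taken into account once the window was widened.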