Hi,
I have submitted a few jobs to dpt-EL7. I see that the jobs are estimated to start in 5 to 7 days! I have never had such a long queue time. Can you give me some information about this?
Thank you,
Azadeh
Hi,
I gave a quick explanation and some references on the subject in the following forum thread: Question about mono-EL7 and shared-EL7 partitions usage.
But in short, the estimated wait time is very pessimistic because it is based on the maximum time each job is declared to take. The default is the maximum time limit of the partition, which is 4 days (and more in some partitions), so when no time limit is given that value is used for the prediction. To be more precise, a job with a maximum run time of 4 days is scheduled as if it will take the whole 4 days, and other jobs are scheduled to run after it has finished. However, if the job only takes 10 hours, the other jobs can start sooner.
In short, the inaccurate wait time is due to overestimated time limits. The tighter your time limits are, the more accurate the scheduling is and the sooner your jobs will start. Of course, your time limit should always be larger than what you actually need, to avoid having your job killed for exceeding it.
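To make this concrete, here is a minimal sketch of the effect in Python. It is a toy model, not the real scheduler: it assumes a single node that runs jobs strictly one after another, and the job names and durations are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    time_limit_h: float  # requested time limit (or the partition default), in hours
    actual_h: float      # how long the job really runs, in hours


def start_times(queue, use_limit):
    """Return {job name: start hour} for jobs run one after another on one node."""
    clock = 0.0
    starts = {}
    for job in queue:
        starts[job.name] = clock
        # The estimate advances by the time limit, reality by the actual run time.
        clock += job.time_limit_h if use_limit else job.actual_h
    return starts


# Hypothetical queue: two jobs left at a 4-day limit that really take 10 hours.
queue = [
    Job("a", time_limit_h=96, actual_h=10),
    Job("b", time_limit_h=96, actual_h=10),
    Job("mine", time_limit_h=10, actual_h=8),
]

estimated = start_times(queue, use_limit=True)
actual = start_times(queue, use_limit=False)
print(f"estimated start of 'mine': {estimated['mine'] / 24:.1f} days")  # 8.0 days
print(f"actual start of 'mine': {actual['mine']:.0f} hours")            # 20 hours
```

In this toy queue the estimate says 8 days while the job would really start after 20 hours, only because the two jobs ahead of it kept the 4-day limit.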
Hi Pablo,
Thank you for the explanation. Previously, when I was submitting jobs on dept-EL7 requesting 5-10 hours, the queuing time was significantly shorter, and several of my jobs would start running simultaneously. This week, the queue time is much longer, and I have only one job running at a time. Is there a particular reason for this change? Do you have a suggestion for how to shorten the queue time?
Thanks,
Azadeh
If the jobs and the configuration are the same, the reason may be that other users are using more resources.
One suggestion I have to reduce the waiting time is to submit to as many partitions as you have access to and that are compatible with your job, especially the shared partition. This increases the chance of finding a node to be scheduled on. The second suggestion, if possible, is to keep the time limit and resource usage (memory, for example) as tight as possible. The smaller your job is, the higher the probability of finding a hole to fit it in.
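As an illustration of both suggestions, here is a small Python sketch that only builds a submission command and prints it. It assumes the cluster uses Slurm and its `sbatch` options `--partition` (which accepts a comma-separated list), `--time` and `--mem`. The script name `job.sh` and the time and memory values are placeholders, and whether shared-EL7 is suitable for your jobs is something you need to check.

```python
import shlex


def build_sbatch_command(script, partitions, time_limit, mem):
    """Build an sbatch command line as a list; nothing is submitted here."""
    return [
        "sbatch",
        f"--partition={','.join(partitions)}",  # comma-separated list of partitions
        f"--time={time_limit}",                 # time limit, e.g. HH:MM:SS
        f"--mem={mem}",                         # memory per node
        script,
    ]


cmd = build_sbatch_command(
    script="job.sh",                       # hypothetical script name
    partitions=["dpt-EL7", "shared-EL7"],  # partitions mentioned in this thread
    time_limit="10:00:00",                 # placeholder: a bit above the expected run time
    mem="4G",                              # placeholder value
)
print(shlex.join(cmd))
# sbatch --partition=dpt-EL7,shared-EL7 --time=10:00:00 --mem=4G job.sh
```

Listing several partitions lets the scheduler start the job in whichever one has room first, and the explicit time and memory requests keep the job small enough to fit into gaps.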