Automatic Slurm partition selection

thanks, very useful!

Maybe a strange question, but I was wondering how much I, as a user, should be directly concerned with the selection of the partition?

If I define the required job resources, and my goal is obviously to get my job scheduled as soon as possible, is there not always one definitive choice of partition, which could just as well be made without my direct involvement?

I saw that someone has even already implemented a script to make such a selection (it does not work directly here, but it gives the idea).

Hi there,

Please do not hijack threads (cf. [howto] check overall partition usage ); I have moved your reply to a new topic.

Well, I would say that it depends on user preferences. Myself, as a molecular biologist, I would NEVER want it, given that the partition is IMHO one of the analysis parameters, and thus should be humanly defined.

Thank you for the notice. I guess the easiest way forward would be to adapt that script and then provide a HowTo here.

Thx, bye,

I guess my point was that I do not see what additional analysis configuration the partition selection can provide that cannot be derived from the resource request, under the assumption that I want my analysis executed as soon as possible.

Could you please clarify what this use case is?
Perhaps it is indeed specific to the kind of tasks you have in molecular biology?


I am asking because I want to know if there is something important I am missing when I select the partition myself. For my cases, automatic selection based on the resource requirements always seems sufficient. In fact, the biggest impact a custom choice can make is negative: a misjudged partition can result in a very long wait time (or even a complete mismatch with the allowed resources, meaning the job is never scheduled).

I could imagine that if there were priority scheduling in some partitions, with limited overall quotas (as discussed at the last lunch meeting), there would be some user choice beyond the job's resources. That way, the partition choice would express an additional variable: the "importance" of the job for the user.
But I did not think anything like that is already in use on the cluster?

In fact, I have previously adapted a version of that script for myself. If useful, I could publish it in a fork.


I realized the debug- partitions have a bit of this special meaning: presumably, they are not meant to be used for "real" jobs, even when that is technically possible within their restrictions, so that they always remain available. On the other hand, since they are bound to remain available, maybe they can be used for certain peculiar kinds of real jobs?

Sorry, I thought it was directly following from the post.
If you disagree, and if you have more to comment, clearly it is worth moving it!
Luckily there is an efficient technical way to do it.

A little trick that could be useful is that you can specify more than one partition with the -p flag. I use it to specify a global shared partition together with any private one I have access to that has higher priority but is of the same kind as the global shared one.

I'm not sure an automatic script is a good idea without some limitation to avoid selecting specialized nodes when they are not needed. For example, it is entirely possible to use a GPU partition without requesting a GPU, and you may even see no queue, since the bottleneck should be GPUs rather than CPU cores; however, you may also block a person requesting a GPU because no CPU cores are available anymore. The same may happen with specialized nodes like bigmem.
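For reference, the multi-partition trick above might look like this in a batch script. The partition names ("private", "shared") and resource numbers here are placeholders, not actual cluster configuration; check `sinfo` for the real partition names:

```shell
#!/bin/bash
# Hypothetical sketch: listing several partitions lets Slurm start the job
# on whichever listed partition can schedule it first.
# "private" and "shared" are placeholder partition names.
#SBATCH --partition=private,shared
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=01:00:00

srun ./my_analysis
```

The same comma-separated list can be passed on the command line as `sbatch -p private,shared job.sh`.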

Actually, I was implying that ideally it could be up to a scheduler (user-side, like this script, or, even better, Slurm itself) to decide the partition, at least in some cases, based on the resource request. So if specifying multiple partitions does just that, it's perfect.
But it remains unclear what the recommended default to allow this should be (should it be a list of partitions?) and what the specific reasons not to do it are.

But the amount of memory and CPU required is specified in the job description.
It says in the manual that:
“If you need more than 10GB RAM per core, you should try to use a bigmem node.”
Sounds like something easily scriptable. Even better, to avoid user-side scheduling, Slurm could take care of it.
This would prevent mistakes in manual selection.
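A minimal sketch of such a rule, assuming the 10GB-per-core threshold from the manual quote above; the partition names "standard" and "bigmem" are placeholders for illustration:

```shell
# Hypothetical sketch: derive the partition from the memory request alone.
# The 10 GB/core threshold is taken from the manual quote; "standard" and
# "bigmem" are assumed partition names, not real cluster config.
pick_partition() {
  local mem_per_core_gb=$1
  if [ "$mem_per_core_gb" -gt 10 ]; then
    echo "bigmem"
  else
    echo "standard"
  fi
}

# Example: a job requesting 32 GB per core would be routed to bigmem.
pick_partition 32
```

A user-side wrapper could compute this from the requested --mem-per-cpu and pass the result to sbatch -p.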

And sometimes, perhaps, there are no memory-hungry jobs for a while. Then, would it not be efficient for everybody to execute non-memory-hungry jobs on the bigmem nodes? If selected by the scheduler, it could appropriately reduce their priority.

Same for the GPU requests.

So perhaps, as you say, putting some limitation in place to avoid selecting specialized nodes when not needed could be achieved purely by the scheduler, since what is needed can be sufficiently defined in the job request.

Perhaps my reasoning misses something important. I was motivated by the discussion at the last meeting about how some users have to wait a long time while others do not.

There is another circumstantial benefit of a user-side script: it helps one understand which changes to the job would make it schedule faster.