Sbatch and srun

This is a quick post to explain the implications of submitting a job from within an sbatch script. I'm posting this because I received a question by email related to this topic.

Let's say we have a master script master.sh that calls a slave script slave.sh.

The master script does almost nothing except launch the slave script(s).

master.sh =>
#!/bin/sh

#SBATCH --cpus-per-task=2

./slave.sh  # no srun in front of it if you have srun in the slave.sh script!
slave.sh => 
#!/bin/sh

#SBATCH --cpus-per-task=4

srun echo $(hostname)

We submit master script using sbatch:

[sagon@master sruninsidesbatch] $ sbatch master.sh
Submitted batch job 49500150

[sagon@master sruninsidesbatch] $ ls -la slurm-49500150.out
-rw-r--r-- 1 sagon unige 1535 Aug 20 10:13 slurm-49500150.out

In this case, the slave script runs inside the master job, and the srun inside it creates a job step. Once the step finishes, the master job finishes as well and releases the resources. The sbatch pragma #SBATCH --cpus-per-task=4 isn't used, as we aren't submitting the slave script with sbatch.
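To see the steps that ran inside the job, you can query the accounting database (a quick sketch, assuming accounting is enabled on the cluster):

sacct -j 49500150 --format=JobID,JobName

You should typically get one line for the job itself, one for the batch step and one for the srun step, all sharing the same job ID.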

A job step is executed inside the resources allocated to the master job. It is impossible to request more resources from within a job step than the job was allocated.

For example, we try to use 4 CPUs in a job that requested only 2 CPUs.

slave.sh =>
#!/bin/sh
srun -c 4 echo $(hostname)
[sagon@master sruninsidesbatch] $ sbatch master.sh
Submitted batch job 49500168

[sagon@master sruninsidesbatch] $ cat slurm-49500168.out

srun: Job step's --cpus-per-task value exceeds that of job (4 > 2). Job step may never run.
srun: error: Unable to create step for job 49500168: More processors requested than permitted

We can also submit the slave with sbatch from the master script:

master.sh =>
#!/bin/sh

#SBATCH --cpus-per-task=2

sbatch slave.sh
slave.sh =>
#!/bin/sh

#SBATCH --cpus-per-task=4

srun -c 4 echo $(hostname)
[sagon@master sruninsidesbatch] $ sbatch master.sh
Submitted batch job 49500173

[sagon@master sruninsidesbatch] $ cat slurm-49500173.out
Submitted batch job 49500174

As you can see, this results in two separate jobs with two distinct resource allocations. The second job allocated 4 CPUs, and the first one 2 CPUs.
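To double-check this, you can ask the accounting database for the number of CPUs allocated to each job (a sketch, using the job IDs from above):

sacct -j 49500173,49500174 --format=JobID,AllocCPUS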

If you need to have a master job controlling some slave jobs, it is better not to use SLURM for the master.

Other resources:

multiprog
heterogeneous jobs

Dear Yann,
thank you for your reply. What you mention is very clear.
In my case though, master submits slave to slurm with sbatch.
Similarly, slave uses sbatch to submit some jobs and srun for some other jobs. Which is why I was kind of surprised by the results.
Probably I did not explain it so well. Should I write a more elaborate code example?
Best, NR

Hi,

Why not. But I think in your case it is better to have the “master” job outside of Slurm. If you really need to launch a job from another job, use sbatch. In this case it is your duty to track the job number and to perform an action when the job is finished.
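A minimal sketch of what that tracking could look like in the master script (slave.sh as above; post_processing.sh is just a hypothetical follow-up script):

jobid=$(sbatch --parsable slave.sh)                        # --parsable makes sbatch print only the job id
sbatch --dependency=afterok:${jobid} post_processing.sh    # runs only once the slave job finished successfully
# or poll the queue until the slave job has left it
while squeue -h -j "${jobid}" 2>/dev/null | grep -q . ; do sleep 30; done
echo "job ${jobid} finished"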

Using Slurm, you can specify a script to be called once your job is finished using an epilog.

Example:

The script which will be executed at the end:

[sagon@login2 epilog] $ cat aa
#!/bin/sh

echo Hi, I finished the job toto > res

Submit a job with the epilog parameter set:

[sagon@login2 epilog] $ sbatch --wrap "srun --epilog aa hostname"
Submitted batch job 49523053

Once the job is finished:

[sagon@login2 epilog] $ ls
aa  res  slurm-49523053.out

:warning: the epilog script must be a short script; if not, the node will be set to drain mode by Slurm and we'll be angry. Well, not that angry, don't worry.

More about prolog/epilog: http://baobabmaster.unige.ch/slurmdoc/prolog_epilog.html

Let’s say my “slave” looks somewhat like this:

"""
EXAMPLE OF SLAVE SCRIPT
it is launched in a specific folder and does all the necessary calculations there
"""
import subprocess as sp
import os
import shutil as sh
import sys
from my_custom_functions import *

os.chdir(sys.argv[1]) # let's go into the folder

with open("geometry.xyz", "r") as f:  # let's read our molecular geometry
    geom = f.read()

write_input1(**kwargs)  # let's write our first input
sp.run("sbatch {} submit_qchem.sh file1.in".format(sbatch_options), shell=True)  # submit input 1 as its own job with sbatch

write_input2(**kwargs)  # let's write our second input
sp.run("sbatch {} submit_qchem.sh file2.in".format(sbatch_options), shell=True)  # submit input 2 as its own job with sbatch

diff = 1
energies = []
electron_density = get_guess()
n = 0
while diff > 1e-9:
    write_input3(electron_density, geom, **kwargs)
    sp.run("srun {} qchem file{}.in".format(slurm_options, n), shell=True)  # run this iteration inside the current allocation with srun
    electron_density, energy = get_data_from_ouput("file{}.out".format(n))
    energies.append(energy)
    if len(energies) > 1:
        diff = abs(energies[-1] - energies[-2])
    n += 1

and my master something like this:

"""
EXAMPLE MASTER SCRIPT
launches slave for different folders
"""
folders = ["folder1", "folder2", "folder3"]

for folder in folders:
    sp.run("sbatch {} slave.py {}".format(slurm_options, folder))
"""

Since I am always submitting the jobs to Slurm, I don't fully get why the error is raised.