Understanding Slurm memory allocation

Hi,

I’m not sure how to interpret memory requests in Slurm, and how memory is handled when using multiple CPUs per task. I’ve been submitting this kind of sbatch file:

#SBATCH --ntasks 3
#SBATCH --cpus-per-task 8
#SBATCH --partition shared-cpu
#SBATCH --mem-per-cpu=500

What does this mean? That I’m asking for 500MB per task (so 3x500MB = 1.5GB total), or 500MB per core in use (so 3x8x500MB = 12GB)?

My job got killed because it went out of memory:
Start                AveCPU    State        MaxRSS  JobID         NodeList       ReqMem
2021-07-09T11:54:38            OUT_OF_ME+           47964958      node[221-223]  0.49Gc
2021-07-09T11:54:38  00:00:00  OUT_OF_ME+   0.01G   47964958.ba+  node221        0.49Gc
2021-07-09T11:54:38  00:00:00  OUT_OF_ME+   0.00G   47964958.ex+  node[221-223]  0.49Gc
2021-07-09T11:54:39  02:15:19  OUT_OF_ME+   3.63G   47964958.0    node[221-223]  0.49Gc
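
Accounting output with these columns can be pulled with an sacct query along these lines (a sketch; the exact command originally used is not shown here):

sacct -j 47964958 --format=Start,AveCPU,State,MaxRSS,JobID,NodeList,ReqMem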

Thanks in advance to anyone who can help me clarify how memory is counted,

Cheers,
Quentin

Hi there,

You are asking for 500MB per core (NB, a “CPU” in Slurm terms); if you want to request memory independently of the number of CPUs, you should use the --mem=${GB} option (cf. hpc:slurm [eResearch Doc]).
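
For example, here is a minimal sketch of the same job shape requesting memory per node rather than per CPU (the 4000MB value is only illustrative, not a recommendation):

# NB: the 4000MB value below is just an example, pick it from your real usage
#SBATCH --ntasks 3
#SBATCH --cpus-per-task 8
#SBATCH --partition shared-cpu
#SBATCH --mem=4000

With --mem the value applies to each allocated node, independently of how many CPUs you use there, whereas --mem-per-cpu gets multiplied by the number of CPUs allocated on the node. This also matches the ReqMem column in your accounting output: the “c” suffix in 0.49Gc means the request is counted per core, while with --mem you would see an “n” (per node) suffix instead.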

NB, you can check the situation via scontrol; see in the output below the difference between MinMemoryCPU and MinMemoryNode:

capello@login2:~$ salloc -n1 -c2 --partition=debug-cpu --mem-per-cpu=6000
salloc: Granted job allocation 47966867
salloc: Waiting for resource configuration
salloc: Nodes node001 are ready for job
capello@node001:~$ scontrol show Job=47966867 | grep -E 'Mem'
   MinCPUsNode=2 MinMemoryCPU=6000M MinTmpDiskNode=0
capello@node001:~$ exit
salloc: Relinquishing job allocation 47966867
capello@login2:~$ salloc -n1 -c2 --partition=debug-cpu --mem=6000
salloc: Granted job allocation 47966869
salloc: Waiting for resource configuration
salloc: Nodes node001 are ready for job
capello@node001:~$ scontrol show Job=47966869 | grep -E 'Mem'
   MinCPUsNode=2 MinMemoryNode=6000M MinTmpDiskNode=0
capello@node001:~$ exit
salloc: Relinquishing job allocation 47966869
capello@login2:~$ 

Thx, bye,
Luca

Ok, I’m starting to understand…

So, in my example above, if I asked for “500MB” per “Slurm CPU” and I asked for 8 CPUs per task, then each task should be allocated 8x500MB = 4GB, is that right?

Then it seems that the job was killed when the maximum reported memory was 3.63GB. Is this just some conversion between memory units, so that the memory actually exceeded 4GB, or am I missing something?

Thanks!

Slurm uses polling to check memory usage, and thus may miss a sudden increase; this is also why the reported MaxRSS (3.63GB) can be lower than the limit that was actually hit, since it is only the last sampled value. However, your process was killed by the Linux kernel (cgroups), which enforces the limit almost in real time.
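
If you want to see the exact limit the kernel enforces, you can peek at the memory cgroup from inside an allocation. This is only a sketch assuming cgroup v1 and Slurm’s memory cgroup plugin (ConstrainRAMSpace); the path differs under cgroup v2 or other site configurations:

# path below assumes cgroup v1 with Slurm's memory cgroup plugin; adjust for your site
salloc -n3 -c8 --partition=shared-cpu --mem-per-cpu=500
srun -n1 cat /sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes

The reported value is the memory allocated to the job on that node (here, the number of allocated CPUs on the node times 500MB), and exceeding it is what triggers the kernel’s OOM kill.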

Ok thanks! Now everything makes sense.