I’m not sure I understand how memory requests work in Slurm, and how they are handled when using multiple CPUs per task. I’ve been submitting this kind of sbatch file:
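(For illustration, a minimal sketch of the kind of script I mean; the job name and command are placeholders, but the memory and CPU directives match the values I discuss below.)

```bash
#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH --ntasks=1                # one task
#SBATCH --cpus-per-task=8         # 8 CPUs (cores) for that task
#SBATCH --mem-per-cpu=500         # 500 MB per allocated CPU
#SBATCH --time=01:00:00           # illustrative walltime

srun ./my_program                 # placeholder command
```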
You are asking for 500MB per core (NB: a CPU in Slurm terms). If you want to request memory independently of the number of CPUs, you should use the --mem=${GB} option instead (cf. hpc:slurm [eResearch Doc]).
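For example, the two request styles look like this (values are illustrative):

```bash
# Memory scales with the CPU count: 8 x 500 MB = 4 GB for the job
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=500

# Memory is requested per node, independent of the CPU count
#SBATCH --cpus-per-task=8
#SBATCH --mem=4G
```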
NB: you can check the situation via scontrol; see in the output below the difference between MinMemoryCPU and MinMemoryNode:
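Something along these lines (an illustrative excerpt from `scontrol show job <jobid>`; job IDs and values are made up):

```
# Job submitted with --mem-per-cpu=500: memory accounted per CPU
$ scontrol show job 123456 | grep MinMemory
   MinCPUsNode=8 MinMemoryCPU=500M MinTmpDiskNode=0

# Job submitted with --mem=4G: memory accounted per node
$ scontrol show job 123457 | grep MinMemory
   MinCPUsNode=8 MinMemoryNode=4G MinTmpDiskNode=0
```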
So in my example above, if I asked for 500MB per “Slurm CPU” and requested 8 CPUs per task, then each task should be allocated 8 x 500MB = 4GB, is that right?
Yet it seems the job was cancelled when its reported maximum memory usage was 3.63GB. Is this just a conversion between memory units, meaning the actual usage was really above 4GB, or am I missing something?
Slurm checks memory usage by polling, so it may miss a sudden increase (which is why the reported maximum can stay below the limit). Your process, however, was killed by the Linux kernel’s cgroup mechanism, which enforces the limit almost in real time.
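If you want to compare the polled peak usage against what you requested, something like this should do it (the job ID is a placeholder):

```bash
# Polled peak resident memory (MaxRSS) vs. requested memory (ReqMem), per job step
sacct -j 123456 --format=JobID,ReqMem,MaxRSS,State,ExitCode
```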