Out-of-memory error with Slurm

Dear all,
I’m trying to run a high-resolution climate simulation (it worked with a lower spatial resolution).
I get this error:
slurmstepd: error: Detected 1 oom-kill event(s) in step 24388146.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

I tried:
#SBATCH --mem-per-cpu=3000
or
#SBATCH --mem=3000
But nothing works…

Do you have any pointers?
Thank you!

You can try a higher value for --mem-per-cpu or --mem, and increase it as much as needed.
You can also request a memory size that is deliberately too big, ssh to the node where the job is running, and check the memory usage with htop. Note that requesting more memory limits which nodes can be used, since the job will only start once the memory requirement can be fulfilled.
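
For example, a rough sketch of that (the 16G value is just a placeholder, and node001 is only an example node name):

# In the batch script: request a deliberately generous amount of memory
#SBATCH --mem=16G

# While the job runs: find its node, connect to it and watch the memory usage
squeue -u $USER        # the NODELIST column shows the node, e.g. node001
ssh node001            # example node name, use the one reported by squeue
htop                   # check the RES column of your process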

Hello, please show your full sbatch script.

You can check here how to determine how much memory was used by your job.

https://baobab.unige.ch/enduser/src/enduser/submit.html#memory-and-cpu-usage
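
For a job that has already finished, something along these lines should work (using the job ID from your error message as an example):

# Peak memory used (MaxRSS) versus memory requested (ReqMem), per step
sacct -j 24388146 --format=JobID,State,ReqMem,MaxRSS,Elapsed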

That’s right: by default each job gets 3 GB per core, which is exactly what Emeline was requesting.
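
If you want to check that default yourself, something like this should show it (assuming the cluster sets a per-CPU default):

# Default memory per allocated CPU, in MB
scontrol show config | grep -i DefMemPerCPU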

Thanks for your quick response!

Here’s my submit.sh script, which I launch as “sbatch submit.sh”

#!/bin/bash
#
#SBATCH --job-name=Aquaplanet
#SBATCH --output=aqua.txt
#
#SBATCH --partition=debug-EL7
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:14:00
#SBATCH --mem-per-cpu=3000

env | grep SLURM
data_dir=/home/bolmonte/scratch/20191209_Formation_LMDZ
ulimit -Ss unlimited
./gcm_96x95x39_phylmd_seq_orch.e

I re-launched it so I could run sstat on it (it only runs for a very short time):

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          24412661 debug-EL7 Aquaplan bolmonte  R       0:02      1 node001
$ sstat --format=AveCPU,MaxRSS,JobID,NodeList -j 24412661
    AveCPU     MaxRSS        JobID             Nodelist
---------- ---------- ------------ --------------------
 00:01.000    498412K 24412661.ba+              node001

I also executed sreport:

$ sreport job sizesbyaccount user=bolmonte PrintJobCount start=2019-01-01 end=2019-12-31
--------------------------------------------------------------------------------
Job Sizes 2019-01-01T00:00:00 - 2019-12-10T16:59:59 (29696400 secs)
Units are in number of jobs ran
--------------------------------------------------------------------------------
  Cluster   Account     0-49 CPUs   50-249 CPUs  250-499 CPUs  500-999 CPUs  >= 1000 CPUs % of cluster
--------- --------- ------------- ------------- ------------- ------------- ------------- ------------
   baobab      root           533             0             0             0             0      100.00%

The error I get is still:

/var/spool/slurmd/job24412661/slurm_script: line 16: 119022 Killed                  ./gcm_96x95x39_phylmd_seq_orch.e
slurmstepd: error: Detected 1 oom-kill event(s) in step 24412661.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Thanks again for your help!
Emeline

Try increasing --mem-per-cpu to something like 16000, and if that’s not enough, increase it to 32000 or even 64000. Once the job no longer crashes, check the real amount of memory it used in the accounting report.
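
A rough sketch of that progression, one value at a time:

#SBATCH --mem-per-cpu=16000   # if this still gets OOM-killed, try 32000, then 64000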

You could also get this message if there are errors in your code, for example a division error.

Hello

To request memory you can indeed use --mem-per-cpu, but that is more useful when you use more than one CPU, which is not your case. In your case it’s better to use --mem=XG, which doesn’t depend on the number of allocated CPUs. As I said, the default value of X is 3. As Pablo suggested, try increasing this value (for example 6G) and relaunch the job. If it doesn’t crash, check the real memory consumption and set the value to the maximum RSS you reached plus a good margin (1G for example).
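
A sketch of that workflow (the 6G request and the final value are only illustrative):

# 1) In submit.sh, request a per-job memory limit instead of a per-CPU one
#SBATCH --mem=6G

# 2) Once the job has finished, check the real peak memory (MaxRSS)
sacct -j <jobid> --format=JobID,MaxRSS,ReqMem,State

# 3) Resubmit with MaxRSS plus a margin, e.g. ~4 GB used -> request 5G
#SBATCH --mem=5G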

You should (unless you have a good reason not to) prefix this command with srun, so that the program runs as its own job step and Slurm can track it properly:

srun ./gcm_96x95x39_phylmd_seq_orch.e

Thank you for your help!

Increasing --mem did the trick:

bolmonte@login2:~/LMDZ_Formation/LMDZ2019/modipsl/modeles/LMDZ/AQUAPLANET_highres$ squ
                     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                  24445021 debug-EL7 Aquaplan bolmonte  R       0:17      1 node001
bolmonte@login2:~/LMDZ_Formation/LMDZ2019/modipsl/modeles/LMDZ/AQUAPLANET_highres$ sstat --format=AveCPU,MaxRSS,JobID,NodeList -j 24445021
            AveCPU     MaxRSS        JobID             Nodelist
        ---------- ---------- ------------ --------------------
         00:15.000   4227064K 24445021.ba+              node001

I also added srun before ./gcm, and somehow that decreased the reported memory usage a lot:

bolmonte@login2:~/LMDZ_Formation/LMDZ2019/modipsl/modeles/LMDZ/AQUAPLANET_highres$ squ
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          24445171 debug-EL7 Aquaplan bolmonte  R       3:24      1 node001
bolmonte@login2:~/LMDZ_Formation/LMDZ2019/modipsl/modeles/LMDZ/AQUAPLANET_highres$ sstat --format=AveCPU,MaxRSS,JobID,NodeList -j 24445171
    AveCPU     MaxRSS        JobID             Nodelist
---------- ---------- ------------ --------------------
 00:00.000      8160K 24445171.ba+              node001

So in the end, I guess I don’t need to increase the memory allocation anymore… Is that normal?
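
(Or could it be that with srun the program runs in its own job step, so sstat on the .batch step no longer sees it? I suppose something like this would list all the steps:)

# List every step of the job, not only the batch step
sstat --allsteps --format=AveCPU,MaxRSS,JobID,NodeList -j 24445171
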
Thank you for your patience!