CPU/tasks for Python multiprocessing - Memory error

Dear community,

In my computations, I use Python multiprocessing to run for loops on multiple CPUs. After all CPUs have finished their loops, I merge the generated data, process it, and run further calculations (all in the same script). My question is similar to the one from @Oriel.Kiss:

How do I set up the batch file for the cluster correctly?
What should the number of CPUs per task be, and what the number of tasks?

Using a job array does not work for me since only parts of my Python script use multiprocessing (i.e., multiple CPUs) while other parts run on only one core.

I already tried the following, but the job was killed on the cluster because it used too much memory (“Some of your processes may have been killed by the cgroup out-of-memory handler”):
#SBATCH --cpus-per-task=22
#SBATCH --ntasks=1
However, my scripts had already worked on a 24-core office computer with less memory. Thus, I assume I am simply using the wrong batch settings for the cluster.
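
For completeness, a stripped-down sketch of the kind of batch file I mean (the script name and time limit are placeholders, not my actual values):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=22
#SBATCH --time=01:00:00        # placeholder time limit
# note: no --mem / --mem-per-cpu request here, so the partition's default memory allocation applies

python my_script.py            # placeholder script name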

Thanks a lot in advance for your help!

Hi @Nik.Zielonka,

Please, could you provide your entire sbatch file (and any extra information about your job)?

What I understand: each loop generates data, and when all the loops are done, you merge everything to start another calculation. Right?

Are the loops independent of each other?


As I don't know precisely what you do, the first thing that comes to mind is to run each loop in a job array, along the lines of the sketch below. When your data generation is done, you can run the final part in another job.
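
As a rough sketch (the script name, array range, and resource values are placeholders, since I don't know your actual setup), each array task would run one loop iteration:

#!/bin/bash
#SBATCH --array=0-21           # one array task per loop, placeholder range
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1      # each loop runs on a single core in this scheme
#SBATCH --mem-per-cpu=4G       # placeholder memory request

# pass the array index to your script so it knows which loop to run
python generate_data.py ${SLURM_ARRAY_TASK_ID}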

Maybe the job dependency option matches your need:

# a single job can depend on an array job:
# it will start executing when all array tasks have finished
# (--parsable makes sbatch print only the job ID, so it can be reused below)
jid=$(sbatch --parsable collect_data.sh)
sbatch --dependency=afterany:$jid analyze_data.sh
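
Combined with the job array idea above, the whole workflow could look roughly like this (script names and the array range are placeholders):

# submit the data-generation loops as a job array
jid=$(sbatch --parsable --array=0-21 generate_data.sh)

# the merge/analysis job starts only once every array task has finished
sbatch --dependency=afterany:${jid} analyze_data.sh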