Python for loop parallelism

Dear HPC community,

I was wondering what the best way is to parallelize a Python for loop on the Yggdrasil HPC. The iterations are independent of each other and I cannot easily vectorize them, so I thought about the multiprocessing library. My question is: what is the most efficient way to allocate resources on the cluster — increase the number of cpus per task, or the number of tasks?
I was thinking about something like this:

from multiprocessing import Pool

pool = Pool(size)
jobs = []
results = []
for i in range(size):
    jobs.append(pool.apply_async(do_something, args=(...,)))
for job in jobs:
    results.append(job.get())
Would that be efficient in terms of resources? I am not really experienced in parallel computing and I do not want to do something wrong and monopolize resources uselessly, so I am reaching out for help.

Also, this for loop is at the center of my algorithm, so if I can parallelize it across multiple cores, they will be almost always busy.

Many thanks

Hi Oriel,

As you said the iterations are independent, I suggest removing the loop and instead using one core per iteration: replace each iteration with a job array index.
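A minimal sketch of that idea as a Slurm batch script, assuming a hypothetical process_one.py that takes the iteration index as its argument (the array range and resource values are placeholders to adapt):

```shell
#!/bin/bash
#SBATCH --job-name=par_loop
#SBATCH --array=0-99           # one array task per loop iteration
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00

# Each array task runs one independent iteration;
# Slurm sets SLURM_ARRAY_TASK_ID to the array index.
srun python process_one.py "$SLURM_ARRAY_TASK_ID"
```

Submitted once with sbatch, this starts 100 single-core jobs that the scheduler can place independently, instead of one job holding many cores.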


Hi Yann,
Thanks for your answer.
As I understood it, a job array just creates a bunch of separate jobs, possibly with different parameters, right? What I am looking for is a way to speed up a recurring for loop inside one big job. When I tried what I proposed in my original post on my laptop, it used the different cores to parallelize the processes as much as it could. On the cluster, however, it seems to run but does nothing. Maybe it is because I requested fewer cpus-per-task than the number of iterations of my for loop.
Or maybe I have misunderstood job arrays?
Thanks in advance for any suggestions

Hi, it seems I didn't answer your last post. As your post was cited in another one, time for an update :)

The information I am missing from your use case is the expected pool size (your for loop size).
Do you also have a sequential part?

What were the number of cores requested and the pool size? If you didn't specify the pool size, Pool tries to use all the CPUs of the node, and that is an issue if you didn't allocate the whole node.
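One way to avoid that default (a sketch, with a hypothetical do_something worker standing in for the real computation) is to size the pool from the Slurm allocation instead of letting Pool fall back to os.cpu_count(), which counts every core on the node, allocated or not:

```python
import os
from multiprocessing import Pool

def do_something(i):
    # hypothetical worker: replace with the real per-iteration computation
    return i * i

if __name__ == "__main__":
    # Match the pool size to the Slurm allocation rather than the whole node;
    # fall back to 1 worker when run outside a Slurm job.
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", 1))
    with Pool(n_workers) as pool:
        results = pool.map(do_something, range(100))
    print(results[:5])
```

With --cpus-per-task=8, for example, the script starts exactly 8 workers and never oversubscribes a shared node.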

If I may add my example: my for loop has 2200 iterations and I manually set my pool size to 22 in my script. This means I use 22 cores, each running the loop body about 100 times. After all loops (i.e., the pooling) finish, I merge the outputs from all 22 workers in the same script and perform further calculations, where I again use pooling with for loops.
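That pattern might look like the following sketch (the worker and the merge step are illustrative placeholders, not the actual script):

```python
from multiprocessing import Pool

def run_iteration(i):
    # hypothetical per-iteration work; each call is independent
    return i % 7

def merge(partials):
    # hypothetical merge step performed after the pool has finished
    return sum(partials)

if __name__ == "__main__":
    n_iterations = 2200
    pool_size = 22  # 22 workers share the 2200 iterations, ~100 each
    with Pool(pool_size) as pool:
        partials = pool.map(run_iteration, range(n_iterations))
    total = merge(partials)
    print(total)
```

pool.map hands iterations to whichever worker is free, so the 22 cores stay busy until the whole range is consumed, and the merge runs sequentially afterwards in the same script.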


Have you solved this? And if yes, can you share a code?