Parallelization and for loops in R


I have some problems with parallelization and the time it takes to run a job.

I’m using R, and I call the function lass0 on a 155x11 matrix X. This function has a “parallel” option.

If I do:

install.packages("readxl"    , repos = "")
install.packages("doParallel", repos = "")
library(lass0)  # load the package that provides lass0()
fit.lass0 = lass0(X, y, parallel = TRUE)

It takes 2 to 3 minutes.

Then I need to do it 50 times, with a small change to X each time, so I write:


for (i in 1:50) {
  Xi = ...  # small change on X
  fit.lass0.i = lass0(Xi, y, parallel = TRUE)
}

It takes more than 10 hours (I have tried with for (i in 1:10) and it also takes more than 10 hours…). I think I don’t understand something about the parallelization.

I use the mono-shared-EL7 partition, with ntasks=1 and cpus-per-task=20.

Thanks for your help.


edit: added code blocks for readability

I don’t know much about parallel workflows in R. However, you can try to ssh into the node and check the CPU usage of your job, to verify whether it is really using the 20 cores you allocated to it.
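For reference, a rough sketch of that check (the job ID and node name below are placeholders; yours will differ):

```shell
# List your running jobs; the NODELIST column shows where each job runs
squeue --me

# ssh to the node reported by squeue (node name is hypothetical here),
# then watch per-process CPU usage; in top, press "1" to see each core
ssh node042
top -u $USER
```

If the R process hovers around 100% CPU instead of roughly 2000% with 20 allocated cores, the parallel backend is not actually using them.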


I suggest that you don’t use detectCores, as it detects all the cores of the compute node, even if Slurm restricts you to a subset of them, and you’ll get poor performance.
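A minimal sketch of the alternative: read the core count Slurm actually granted from the `SLURM_CPUS_PER_TASK` environment variable and size the worker pool from that. This assumes lass0’s `parallel` option picks up a registered foreach/doParallel backend; check the package documentation to confirm.

```r
## Size the worker pool from Slurm's allocation, not detectCores().
## detectCores() reports every core on the node, including ones
## your job was never given.
library(doParallel)

ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
cl <- makeCluster(ncores)
registerDoParallel(cl)

fit.lass0 <- lass0(X, y, parallel = TRUE)  # your call, unchanged

stopCluster(cl)
```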

Please have a look at the example we provide here.

If you are happy with the execution time of one instance, the best you can do is use a Slurm job array to launch 50 instances of your job.
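A hypothetical submission script for that approach (the script name `my_lass0.R` and the resource values are placeholders; adjust them to your setup):

```shell
#!/bin/bash
#SBATCH --partition=mono-shared-EL7
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --array=1-50

# Each array task runs independently and gets its own
# SLURM_ARRAY_TASK_ID (here 1..50); pass it to the R script so it
# can apply the i-th "small change" to X.
srun Rscript my_lass0.R "$SLURM_ARRAY_TASK_ID"
```

On the R side, `i <- as.integer(commandArgs(trailingOnly = TRUE)[1])` recovers the index, so the 50 fits run concurrently instead of one after another in a loop.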

Be sure to do some benchmarking first: launch your instance with 2, 4, 8, and 20 cores and measure the time. If performance isn’t better with 4 cores than with 2, stick with 2 cores, and so on.
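The benchmark can be sketched directly in R with `system.time`, again under the assumption that lass0 uses a registered doParallel backend:

```r
## Rough timing sketch: fit once per worker count and print the elapsed
## time, to see where adding cores stops paying off.
library(doParallel)

for (ncores in c(2, 4, 8, 20)) {
  cl <- makeCluster(ncores)
  registerDoParallel(cl)
  elapsed <- system.time(lass0(X, y, parallel = TRUE))["elapsed"]
  stopCluster(cl)
  cat(ncores, "cores:", elapsed, "seconds\n")
}
```

With a matrix as small as 155x11, the per-worker communication overhead can easily outweigh the computation, so don’t be surprised if the sweet spot is a small core count.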