I cannot figure out how to get any benefit from multithreaded processing in R on Baobab.
I have a big, slow sequential function that has to be repeated with different inputs, and I use the function mclapply to apply it over all the inputs. You can specify how many cores you want to use.
It works nicely on my desktop, where the reduction in computation time is almost linear in the number of threads (up to the number of cores). On Baobab, nothing is faster than using a single thread.
I request the proper number of cores from Slurm (--cpus-per-task), and I have also tried the --exclusive flag and setting the CPU affinity. I have tried the debug nodes interactively to see what happens, and it appears that the compute resources saturate immediately, even when I ask mclapply for only 2 threads.
Has anybody had the same kind of issue? Any suggestions?
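For context, a minimal sketch of the pattern being described (slow_fn and the inputs are placeholders, not the actual analysis code): the core count allocated by Slurm is read from the environment and passed to mclapply.

```r
library(parallel)

# read the core count Slurm actually allocated (--cpus-per-task);
# Sys.getenv() returns a string, so it must be converted to an integer
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# stand-in for the real slow sequential function
slow_fn <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs  <- 1:32
results <- mclapply(inputs, slow_fn, mc.cores = n_cores)
```

Outside a Slurm job the environment variable is unset, so the sketch falls back to a single worker.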
If by “multithreading” you mean Simultaneous multithreading - Wikipedia , then there is no hope for you: SMT is (and has always been) disabled on Baobab, since there is a risk of “process” contamination if two users share the same core.
NB: SMT is enabled on purpose on the login* nodes so you can prototype/test your analysis more quickly.
Care to share the code, at least so we can run some tests?
If I got it right, you want to parallelize your sequential analysis over different inputs, right?
By multithreading here I just meant that I want my process to spawn multiple threads and run each one on one of the cores I requested from Slurm.
I will try the examples and the parLapply function. However, it looks very similar to mclapply, which also comes from the parallel package. Do you know if there is any difference between the two?
OK, thank you for the clarification; in HPC terms it is called parallel computing, and I guess mclapply stands for multi-core lapply.
My R knowledge is still limited, so I cannot clearly answer your question. I can simply say that mclapply is part of the parallel package (cf. R: Support for Parallel Computation in R ) and, since it relies on forking, it is not available on Windows.
OTOH, I can point you to some upstream documentation:
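As a rough illustration of the practical difference (a sketch, not taken from the thread): mclapply forks the running R process, so workers inherit the parent's workspace for free, while parLapply runs on an explicitly created cluster of fresh worker processes.

```r
library(parallel)

f      <- function(x) x^2
inputs <- 1:8

# mclapply: fork-based, Unix-only; workers share the parent's
# memory copy-on-write, so no explicit data export is needed
res_fork <- mclapply(inputs, f, mc.cores = 2)

# parLapply: socket-based cluster of fresh R processes (also works on
# Windows); data and loaded packages must be shipped to the workers
cl <- makeCluster(2)
res_sock <- parLapply(cl, inputs, f)
stopCluster(cl)

identical(res_fork, res_sock)  # both approaches give the same results
```

For a heavy workspace, the fork-based variant avoids the serialization cost of shipping objects to each worker, which is one reason their performance can differ.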
Since your job works fine on your desktop, it should work fine on Baobab as well. Can you please show us your sbatch script and/or explain how you launch your code? The symptoms you describe would suggest that you are using only one CPU.
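For reference, a minimal sbatch script for such a job might look like the sketch below (the script name, module name, and limits are assumptions, not taken from the thread):

```shell
#!/bin/sh
#SBATCH --job-name=r-mclapply
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # cores available to the single R task
#SBATCH --time=00:30:00

# load an R module (the exact name depends on the cluster's module tree)
module load R

srun Rscript analysis.R
```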
My code is rather complex, and I don't know if discussing it would help. What I know is that I am effectively getting the --cpus-per-task cores, see the attached screenshot. Here I was asking mclapply for 2 cores (it is the same with parLapply) and requesting --cpus-per-task=8.
On my machine, when I ask mclapply for 2 cores the job runs twice as fast, and so on linearly up to the number of my CPUs. On Baobab, if I request --cpus-per-task=8 and then anything bigger than 1 core for mclapply, it gets stuck. As you can see in the screenshot, 8 processors are completely saturated, whereas I was expecting to saturate only two of them.
Hey Davide,
I ran into a similar issue. Although I did not find the solution to my problem, we can at least rule out the hypothesis that Baobab does not run parallel::mcmapply on multiple cores, as shown in the reproducible example below (run on the debug-EL7 partition with 16 cores). Maybe the code below can be of some help in your case.
```
> library(foreach)
> library(doParallel)
Loading required package: iterators
Loading required package: parallel
>
> # set the number of cores in the sbatch script
> registerDoParallel(cores=Sys.getenv("SLURM_CPUS_PER_TASK"))
>
> # print the number of workers
> getDoParWorkers()
[1] "16"
>
> trials <- 100000
> x <- iris[which(iris[,5] != "setosa"), c(1,5)]
>
>
> # parallel execution
> system.time({
+   r <- foreach(icount(trials), .combine=rbind) %dopar% {
+     ind <- sample(100, 100, replace=TRUE)
+     result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+     coefficients(result1)
+   }
+ })
   user  system elapsed
226.837   2.579  25.433
>
> # sequential execution
> system.time({
+   r <- foreach(icount(trials), .combine=rbind) %do% {
+     ind <- sample(100, 100, replace=TRUE)
+     result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+     coefficients(result1)
+   }
+ })
   user  system elapsed
202.595   0.127 202.769
>
>
> f <- function(i,x){
+   ind <- sample(100, 100, replace=TRUE)
+   result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+   coefficients(result1)
+ }
>
> system.time(
+   mapply(FUN = function(i)f(i,x), seq_len(trials))
+ )
   user  system elapsed
179.318   0.020 179.357
>
> system.time(
+   mcmapply(FUN = function(i)f(i,x), seq_len(trials))
+ )
   user  system elapsed
184.129   0.231  95.026
>
> system.time(
+   mcmapply(FUN = function(i)f(i,x), seq_len(trials), mc.cores=Sys.getenv("SLURM_CPUS_PER_TASK"))
+ )
   user  system elapsed
220.898   3.288  14.573
```
mcmapply indeed spawns worker processes in my case as well, but I don't get any gain in performance (actually everything slows down), whereas on my desktop I do get substantial speedups. Most probably the reason in my case is more intricate and related to something else I do in my code.
I didn't manage to find out why, but I have moved to parallelizing at the Slurm array level, i.e. spawning multiple single-threaded tasks instead of one multi-threaded one, and everything works fine in that case.
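The array approach mentioned above might be sketched like this (the script name and array range are placeholders; each array element becomes an independent single-threaded job):

```shell
#!/bin/sh
#SBATCH --array=1-100          # one independent task per input
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1      # each task is single-threaded
#SBATCH --time=00:30:00

# each array task processes one input, selected via SLURM_ARRAY_TASK_ID
Rscript analysis.R "$SLURM_ARRAY_TASK_ID"
```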
You said you asked mclapply for two CPUs, and according to htop you were in fact using 8 CPUs, right?
So it seems mclapply isn't doing what you expect and is probably trying to use all the CPUs of the server (16 in this case). As you only requested 8 CPUs per task from Slurm, the issue is probably that mclapply is doing a lot of context switching, because it is running more workers than allocated CPUs. In htop you can see that you have 16 instances of R, and this is probably very bad performance-wise.
Maybe you can show us how you limit the number of CPUs given to mclapply?
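One common cause of this pattern (an assumption on my part, since we haven't seen the code): sizing the worker pool with detectCores(), which reports all cores of the node rather than the Slurm allocation.

```r
library(parallel)

# detectCores() sees the whole node (e.g. 16 cores), not the 8 cores
# allocated by --cpus-per-task, so using it oversubscribes the job
all_cores <- detectCores()

# the allocation is in SLURM_CPUS_PER_TASK (unset outside a Slurm job)
alloc     <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA))
n_workers <- if (is.na(alloc)) 1L else alloc

res <- mclapply(1:8, sqrt, mc.cores = n_workers)
```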
But as you said, you split your job into single-threaded tasks, and this is probably the best parallelism you'll ever get :)