I cannot figure out how to get any benefit from multithreaded processing in R on Baobab.
I have a big, slow sequential function that has to be repeated with different inputs, and I use the function mclapply to apply it over all the inputs. You can specify how many cores you want to use.
It works nicely on my desktop, where the reduction in computation time is almost linear in the number of threads (up to the number of cores). On Baobab, nothing is faster than using a single thread.
I request the proper number of cores from Slurm (--cpus-per-task), and I have also tried the --exclusive flag and setting the CPU affinity. I have tried the debug nodes interactively to see what happens, and it appears that the compute resources saturate immediately, even when I ask mclapply for only 2 threads.
Has anybody had the same kind of issue? Any suggestions?
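For context, a minimal sketch of the pattern being described (slow_fn and the inputs are placeholders, not the actual analysis code): the core count allocated by Slurm is read from the environment and passed to mclapply.

```r
library(parallel)

# read the core count Slurm actually allocated (--cpus-per-task);
# Sys.getenv() returns a string, so it must be converted to an integer
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# stand-in for the real slow sequential function
slow_fn <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs  <- 1:32
results <- mclapply(inputs, slow_fn, mc.cores = n_cores)
```

Outside a Slurm job the environment variable is unset, so the sketch falls back to a single worker.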
If by “multithreading” you mean Simultaneous multithreading - Wikipedia , then there is no hope for you: SMT is (and has always been) disabled on Baobab, since there is a risk of “process” contamination if two users share the same core.
NB: SMT is enabled on purpose on the login* nodes so you can prototype/test your analysis more quickly.
Care to share the code, at least so we can run some tests?
If I got it right, you want to parallelize your sequential analysis over different inputs, right?
By multithreading here I just meant that I want my process to spawn multiple threads and run each one on one of the cores I requested from Slurm.
I will try the examples and the parLapply function. However, it looks very similar to mclapply, which also comes from the parallel package. Do you know if there is any difference between the two?
OK, thank you for the clarification; in HPC terms it is called parallel computing, and I guess mclapply stands for multi-core lapply.
My R knowledge is still limited, so I cannot clearly answer your question. I can simply say that mclapply is part of the parallel package (cf. R: Support for Parallel Computation in R ) and, since it relies on forking, it is not available on Windows.
OTOH, I can point you to some upstream documentation:
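As a rough illustration of the practical difference (a sketch, not taken from the thread): mclapply forks the running R process, so workers inherit the parent's workspace for free, while parLapply runs on an explicitly created cluster of fresh worker processes.

```r
library(parallel)

f      <- function(x) x^2
inputs <- 1:8

# mclapply: fork-based, Unix-only; workers share the parent's
# memory copy-on-write, so no explicit data export is needed
res_fork <- mclapply(inputs, f, mc.cores = 2)

# parLapply: socket-based cluster of fresh R processes (also works on
# Windows); data and loaded packages must be shipped to the workers
cl <- makeCluster(2)
res_sock <- parLapply(cl, inputs, f)
stopCluster(cl)

identical(res_fork, res_sock)  # both approaches give the same results
```

For a heavy workspace, the fork-based variant avoids the serialization cost of shipping objects to each worker, which is one reason their performance can differ.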
Since your job works fine on your desktop, it should work fine on Baobab as well. Can you please show us your sbatch script and/or explain how you launch your code? The symptoms you describe would suggest that you are using only one CPU.
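For reference, a minimal sbatch script for such a job might look like the sketch below (the script name, module name, and limits are assumptions, not taken from the thread):

```shell
#!/bin/sh
#SBATCH --job-name=r-mclapply
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # cores available to the single R task
#SBATCH --time=00:30:00

# load an R module (the exact name depends on the cluster's module tree)
module load R

srun Rscript analysis.R
```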
My code is rather complex, and I don't know if discussing it would help. What I know is that I am effectively getting the --cpus-per-task cores, see the attached screenshot. Here I was asking mclapply for 2 cores (it is the same with parLapply) and requesting --cpus-per-task=8.
On my machine, when I ask mclapply for 2 cores the job runs twice as fast, and so on linearly up to the number of my CPUs. On Baobab, if I request --cpus-per-task=8 and then anything bigger than 1 core for mclapply, it gets stuck. As you can see in the screenshot, 8 processors are completely saturated, whereas I was expecting to saturate only two of them.
Hey Davide,
I ran into a similar issue. Although I did not find the solution to my problem, we can at least rule out the hypothesis that Baobab does not run parallel::mcmapply on multiple cores, as shown in the reproducible example below (run on the debug-EL7 partition with 16 cores). Maybe the code below can be of some help in your case.
```
> library(foreach)
> library(doParallel)
Loading required package: iterators
Loading required package: parallel
>
> # set the number of cores in the sbatch script
> registerDoParallel(cores=Sys.getenv("SLURM_CPUS_PER_TASK"))
>
> # print the number of workers
> getDoParWorkers()
[1] "16"
>
> trials <- 100000
> x <- iris[which(iris[,5] != "setosa"), c(1,5)]
>
>
> # parallel execution
> system.time({
+   r <- foreach(icount(trials), .combine=rbind) %dopar% {
+     ind <- sample(100, 100, replace=TRUE)
+     result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+     coefficients(result1)
+   }
+ })
   user  system elapsed
226.837   2.579  25.433
>
> # sequential execution
> system.time({
+   r <- foreach(icount(trials), .combine=rbind) %do% {
+     ind <- sample(100, 100, replace=TRUE)
+     result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+     coefficients(result1)
+   }
+ })
   user  system elapsed
202.595   0.127 202.769
>
>
> f <- function(i,x){
+   ind <- sample(100, 100, replace=TRUE)
+   result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+   coefficients(result1)
+ }
>
> system.time(
+   mapply(FUN = function(i)f(i,x), seq_len(trials))
+ )
   user  system elapsed
179.318   0.020 179.357
>
> system.time(
+   mcmapply(FUN = function(i)f(i,x), seq_len(trials))
+ )
   user  system elapsed
184.129   0.231  95.026
>
> system.time(
+   mcmapply(FUN = function(i)f(i,x), seq_len(trials), mc.cores=Sys.getenv("SLURM_CPUS_PER_TASK"))
+ )
   user  system elapsed
220.898   3.288  14.573
```
mcmapply indeed spawns worker processes in my case as well, but I don't get any gain in performance (actually everything slows down), whereas on my desktop I do get substantial speedups. Most probably the reason in my case is more intricate and related to something else I do in my code.
I didn't manage to find out why, but I have moved to parallelizing at the Slurm array level, i.e. spawning multiple single-threaded tasks instead of one multi-threaded one, and everything works fine in that case.
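The array approach mentioned above might be sketched like this (the script name and array range are placeholders; each array element becomes an independent single-threaded job):

```shell
#!/bin/sh
#SBATCH --array=1-100          # one independent task per input
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1      # each task is single-threaded
#SBATCH --time=00:30:00

# each array task processes one input, selected via SLURM_ARRAY_TASK_ID
Rscript analysis.R "$SLURM_ARRAY_TASK_ID"
```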
You said you asked mclapply for two CPUs, and according to htop you were in fact using 8 CPUs, right?
So it seems mclapply isn't doing what you expect and is probably trying to use all the CPUs of the server (16 in this case). As you only requested 8 CPUs per task from Slurm, the issue is probably that mclapply is doing a lot of context switching, because it is running more workers than allocated CPUs. In htop you can see that you have 16 instances of R, and this is probably very bad performance-wise.
Maybe you can show us how you limit the number of CPUs given to mclapply?
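One common cause of this pattern (an assumption on my part, since we haven't seen the code): sizing the worker pool with detectCores(), which reports all cores of the node rather than the Slurm allocation.

```r
library(parallel)

# detectCores() sees the whole node (e.g. 16 cores), not the 8 cores
# allocated by --cpus-per-task, so using it oversubscribes the job
all_cores <- detectCores()

# the allocation is in SLURM_CPUS_PER_TASK (unset outside a Slurm job)
alloc     <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA))
n_workers <- if (is.na(alloc)) 1L else alloc

res <- mclapply(1:8, sqrt, mc.cores = n_workers)
```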
But as you said, you split your job into single-threaded tasks, and this is probably the best parallelism you'll ever get :)