R parallelisation

Hello!

I’m new to Baobab and I’m having trouble running my R code with the “doParallel” package. I have tried to run my code many times, but every time I get this error:
Error in unserialize(socklist[[n]]) : error reading from connection

Calls: cluster_clara_para_improved … recvOneData → recvOneData.SOCKcluster → unserialize
Execution halted
slurmstepd: error: Detected 27 oom-kill event(s) in StepId=46351858.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

(cluster_clara_para_improved is the name of the function I call.)

My sbatch script is the following:
#!/bin/sh
#SBATCH --job-name=runclara
#SBATCH --error=runclara-error.e%j
#SBATCH --output=runclara.out.o%j
#SBATCH --partition=shared-cpu
#SBATCH --time=03:00:00
#SBATCH --exclusive
#SBATCH --cpus-per-task=15
#SBATCH --mem=50000
Rscript /home/users/t/tochon8/RCode/code_baobab.R | tee stdout.txt

I checked many forums that discuss this problem, but I have no clue how to resolve it.

Thanks for any help!

Louis Tochon

Hi Louis,

As we don’t know your R code, we can only guess.

The oom-kill message means that your job needed more memory than was allocated to it. Check here for details.
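One way to check this, as a rough sketch: Slurm's accounting tool `sacct` can show the peak memory a finished job used (`MaxRSS`) next to what it requested (`ReqMem`). The job ID below is the one from the slurmstepd message above; the selection of columns is only one reasonable choice, and the guard exists so the snippet does nothing harmful on a machine without Slurm.

```sh
# Job ID taken from the slurmstepd message in the first post.
JOBID=46351858

# Compare requested memory (ReqMem) with the peak actually used (MaxRSS).
# Meant to be run on a cluster login node; skipped where sacct is missing.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOBID" --units=G \
          --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
else
    echo "sacct not found: run this on the cluster"
fi
```

Keep in mind that the traceback shows a socket cluster, where every worker is a separate R process holding its own copy of the data it receives, so memory use tends to grow with the number of workers.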

As you are testing, please use the debug-cpu partition. You’ll wait less, and testing is what this partition is for.

Why do you specify the exclusive flag?
Please also specify that you want only one task, as Slurm will otherwise try to create “as many as possible” tasks.
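Put together, the suggestions above might look like the header below. This is only a sketch: the job name, file names, and Rscript path are copied from the original script, `--exclusive` is dropped, and `--ntasks=1` is added. The 15-minute time limit is an assumption about what the debug-cpu partition allows, and the `--mem` value is left unchanged as a placeholder that has to be raised to whatever the job really needs.

```sh
#!/bin/sh
#SBATCH --job-name=runclara
#SBATCH --error=runclara-error.e%j
#SBATCH --output=runclara.out.o%j
#SBATCH --partition=debug-cpu
# Assumed time limit for a short test run; adjust to the partition's maximum.
#SBATCH --time=00:15:00
# One task (one R master process) with 15 cores for its workers.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=15
# Placeholder: the 50000 MB of the original job was not enough.
#SBATCH --mem=50000

# Slurm sets this variable inside the job; it is "unknown" elsewhere.
echo "CPUs granted: ${SLURM_CPUS_PER_TASK:-unknown}"

Rscript /home/users/t/tochon8/RCode/code_baobab.R | tee stdout.txt
```

Starting exactly as many R workers as `SLURM_CPUS_PER_TASK` reports, rather than as many as the node has cores, keeps the job within its allocation.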

Best

Hello Mr. Sagon,

Thank you for all your help, it finally worked. I still have one last question. If I run a “simple” task on my computer, proc.time reports the following computation times:

user : 0.94
system : 0.34
elapsed : 133.66

And on the debug-cpu partition I get:

user : 1745.953
system : 17.782
elapsed : 122.128

Do you know why I get such a small difference between the two elapsed times? (Both runs use 15-core parallel processing.)

Thank you again for your time!
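As a quick sanity check on these numbers, dividing user CPU time by elapsed time gives the average number of cores that were busy during the measurement. The helper name `busy_cores` is made up for this illustration.

```sh
# user CPU seconds / elapsed seconds = average number of busy cores
busy_cores() {
    awk -v user="$1" -v elapsed="$2" 'BEGIN { printf "%.2f\n", user / elapsed }'
}

echo "local:  $(busy_cores 0.94 133.66)"       # prints "local:  0.01"
echo "Baobab: $(busy_cores 1745.953 122.128)"  # prints "Baobab: 14.30"
```

On Baobab roughly 14 cores were busy on average, which fits a 15-worker run. The local value of 0.01 probably means that the workers' CPU time is not included in that measurement at all, so the two "user" figures are not comparable; only the elapsed times are.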

Louis Tochon

Hi,

You are surprised by the difference in “elapsed” time between the run on your computer and the one on Baobab?

  • The CPU frequency may be different.
  • Does your computer have 16 physical cores, or 8 with Hyper-Threading (HT) enabled?
  • this is the CPU we are using for the debug partition.
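The first two points can be checked on each machine with something like the sketch below. It assumes a Linux system with `nproc` (coreutils) and, optionally, `lscpu` (util-linux); other operating systems need different tools.

```sh
# Logical CPUs available to this process (hyper-threads count as CPUs).
LOGICAL=$(nproc)
echo "logical CPUs: $LOGICAL"

# lscpu breaks this down into sockets, cores per socket and threads per
# core, and usually shows the CPU model name as well.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E '^(Model name|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)' || true
fi
```

A "Thread(s) per core" value of 2 indicates that Hyper-Threading is enabled, in which case 16 logical CPUs correspond to only 8 physical cores.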
Best

Yann