Srun error: eio_handle_mainloop

Hello,

Very recently I started encountering the following error, and FileZilla has difficulty retrieving the directory listing while the job is running:

srun: error: eio_handle_mainloop: Abandoning IO 60 secs after job shutdown initiated

My code runs a for loop. After some time the error occurs at some iteration, and the job does not complete.

What I have found on the net is that this error occurs when:
“Slurm is giving up waiting for stdout/stderr to finish. This typically happens when some rank ends early while others are still wanting to write. If you don’t get complete stdout/stderr from the job, please resubmit the job.”
but I do not understand what it means or how to fix it.

This is the sh file I run on yggdrasil:

#!/bin/sh
#SBATCH --cpus-per-task=1
#SBATCH --job-name=smoothed
#SBATCH --ntasks=1
#SBATCH --time=01:30:00
#SBATCH --array=1-9999
#SBATCH --partition=shared-cpu
#SBATCH --mail-type=ALL
#SBATCH --mail-user=younes.boulaguiem@unige.ch

## deps
module load foss/2019b R/3.6.2

## main 
srun Rscript smoothed_HPC.R $SLURM_ARRAY_TASK_ID

I would greatly appreciate your help!

Thanks in advance,
Younes

Hi,

This issue is probably related to high-IO jobs (yours or someone else’s).

Since you are launching about 10k jobs through a job array, it may in fact be your own jobs that produce this high IO. You also mention a for loop in your job: can you elaborate a little on what your jobs are doing? Please describe where the data are read and written, etc.
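If the array itself turns out to be the source of the load, one option is to cap how many array tasks run at the same time. The fragment below is only a sketch: the `%` suffix is the standard Slurm syntax for limiting simultaneously running array tasks, but the value 100 is an arbitrary illustration, not a recommendation for this cluster.

```shell
# Same array range as in the original script, but with at most 100 tasks
# running at once (the limit of 100 is an assumed, illustrative value).
#SBATCH --array=1-9999%100
```

This line would replace the existing `--array` directive in the submission script; the rest of the script stays unchanged.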

You are using an old R version, is there any reason for that?

Best

Thank you for your reply.

I start by loading 3 rdata files (the heaviest is smaller than 1 MB) from my directory on yggdrasil. I have 4 matrices of size 2x10k that I fill in at the end of each iteration of the for loop (10k iterations in total) after doing some computations. At the end of the loop I store the results in an array, which I then save as an rdata file of ~600 KB. Since I am launching 10k jobs using the array, I end up saving 10k rdata files of ~600 KB each on the server.
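As a rough back-of-the-envelope check on those figures, the total written by the whole array can be estimated as below. This is only a sketch: the file size is the approximate ~600 KB quoted above, not a measurement, and the variable names are made up for illustration.

```shell
# Rough estimate of what the whole job array writes to disk.
# Both inputs are approximations taken from this thread, not measurements.
n_tasks=9999       # from --array=1-9999
kb_per_file=600    # approximate size of one saved rdata file, in KB

total_kb=$(( n_tasks * kb_per_file ))
total_mb=$(( total_kb / 1024 ))   # integer division, rounds down

echo "files written: ${n_tasks}"
echo "approx total size: ${total_mb} MB"
```

So the output is on the order of 6 GB spread over roughly 10k small files, in addition to the three input files that every task reads at startup.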

No reason for the old R version except that it’s been working fine. I should probably upgrade, thanks!

Hello,

It seems that using the latest version of R solved the problem.

Cheers
Younes