Slurm submission problem

Hi,

When I submit several Slurm jobs to the “private-dpt-cpu” partition, some of them end up with the status “launch failed requeued held”. Could you tell me what might be causing this and how to fix it?

For reference, here is my batch script:

#!/bin/bash
#SBATCH -J psE_p9s1p1        # Job name
#SBATCH -e ps_p9_s1_kmp1_emcee-err_%j.error  # Standard error file (%j = job ID)
#SBATCH -o ps_p9_s1_kmp1_emcee-out_%j.out    # Standard output file (%j = job ID)
#SBATCH -n 20                       # Number of tasks
#SBATCH -t 00-10:00              # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=4000
#SBATCH -p private-dpt-cpu

module load GCC/7.3.0-2.30  OpenMPI/3.1.1 Valgrind/3.14.0 GSL/2.5 OpenBLAS/0.3.1 FFTW/3.3.8
srun -n 20 cosmosis --mpi  euclid_results/ini_files/ps_kmp1_emcee.ini

Thank you,
Azadeh

Edit: code formatting


Hi there,

Sorry for the delay.

Your problem was probably caused by a difference in Slurm versions, which was fixed last Wednesday the 17th (cf. Current issues on Baobab and Yggdrasil - #32 by Yann.Sagon). Sorry for the inconvenience.

Can you try again to confirm that everything is OK now?
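
If some of the earlier jobs are still sitting in the held state, something like the sketch below might help to find them. It is only an illustration: the helper name `held_job_ids` is made up for this example, and it assumes the reason is printed either with spaces or with underscores, which may differ between Slurm versions.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): reads "jobid|reason" lines on stdin
# and prints the IDs of jobs whose reason is "launch failed requeued held".
# Both the spaced and the underscored spelling of the reason are accepted.
held_job_ids() {
    awk -F'|' '$2 ~ /launch[ _]failed[ _]requeued[ _]held/ { print $1 }'
}

# Intended use on the cluster (assumes squeue and scontrol are on PATH):
#   squeue -h -u "$USER" -o "%i|%r" | held_job_ids
# Each listed job can then be inspected and released by hand:
#   scontrol show job <jobid>
#   scontrol release <jobid>
```

Releasing a job only makes sense once the underlying cause is gone; otherwise it will most likely fail to launch and be held again.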

Thx, bye,
Luca