Hi,
When I submit several Slurm jobs to the “private-dpt-cpu” partition, some of them are shown with the status “launch failed requeued held”. Can you tell me what might be causing this and how to fix it?
For reference, here is my batch script:
#!/bin/bash
#SBATCH -J psE_p9s1p1 #Jobname
#SBATCH -e ps_p9_s1_kmp1_emcee-err_%j.error #Jobname error file
#SBATCH -o ps_p9_s1_kmp1_emcee-out_%j.out #Jobname out file
#SBATCH -n 20 # Number of tasks
#SBATCH -t 00-10:00 # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=4000 # Memory per CPU in MB
#SBATCH -p private-dpt-cpu # Partition
module load GCC/7.3.0-2.30 OpenMPI/3.1.1 Valgrind/3.14.0 GSL/2.5 OpenBLAS/0.3.1 FFTW/3.3.8
srun -n 20 cosmosis --mpi euclid_results/ini_files/ps_kmp1_emcee.ini
Thank you,
Azadeh
edit: code formatting