Hello,
I’m opening this issue because a job that runs fine on one partition fails on another.
In more detail, I cloned the CUTE code (GitHub - damonge/CUTE: Correlation Utilities and Two-point Estimation), loaded the foss/2020b and GSL/2.6 modules, compiled the code directly, and ran the included test on Baobab with this sh file:
#!/bin/bash
#
#SBATCH --job-name="test_CUTE"
#SBATCH --time=0:01:00
#SBATCH --partition debug-cpu
#SBATCH --ntasks 32
#SBATCH --cpus-per-task=1
#SBATCH --error test/test_error.e%j
#SBATCH --output test/test_out.o%j
module load foss/2020b
module load GSL/2.6
srun ./CUTE test/param.ini
It works perfectly fine. However, if I replace the debug-cpu partition with the public-cpu one, the job crashes, reporting that the GSL shared library cannot be found (in the log below, the failing tasks are 8-15 on node031):
/home/users/t/tutusaus/CUTE_original/CUTE/./CUTE: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory
srun: error: node031: tasks 8-15: Exited with exit code 127
srun: First task exited 30s ago
srun: StepId=52471128.0 tasks 0-7,16-31: running
srun: StepId=52471128.0 tasks 8-15: exited abnormally
srun: launch/slurm: _step_signal: Terminating StepId=52471128.0
slurmstepd: error: *** STEP 52471128.0 ON node030 CANCELLED AT 2021-11-25T12:21:47 ***
srun: Job step aborted: Waiting up to 92 seconds for job step to finish.
srun: error: node040: tasks 23-26: Killed
srun: error: node030: tasks 0-7: Killed
srun: error: node032: tasks 16-22: Killed
srun: error: node048: tasks 27-31: Killed
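Since only the tasks on one node report the missing library, a per-node check of what the dynamic loader can resolve might help narrow this down. The following is only a sketch: the function names `filter_missing` and `check_binary` are hypothetical, and it assumes `ldd` and `awk` are available on the compute nodes.

```shell
#!/bin/bash
# Sketch: list shared libraries that the dynamic loader cannot resolve.
# Function names are hypothetical; assumes ldd and awk exist on the node.

# Read `ldd` output on stdin and print the names of unresolved libraries.
filter_missing() {
    awk '/=> not found/ { print $1 }'
}

# Print "<hostname>: <library>" for every library the given binary
# cannot load on this node. Prints nothing when everything resolves.
check_binary() {
    ldd "$1" 2>/dev/null | filter_missing | while read -r lib; do
        echo "$(hostname): $lib"
    done
}
```

One possible use, assuming the functions are saved in a file (here called `check_libs.sh`, a hypothetical name) that ends with the line `check_binary ./CUTE`: run `srun bash check_libs.sh | sort -u` in the batch script after the two `module load` lines. Each task then reports from its own node, and `sort -u` collapses the duplicates, so any node listed is one where `libgsl.so.25` or another library is not visible.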
Any feedback on why it might work on debug-cpu and crash on public-cpu would be highly appreciated. Thanks in advance!