Hi,
I have a problem submitting jobs with the srun command, both manually from the command line and through an sbatch script.
The program we are using is installed at:
/sst1m/sw/prod5/sim_telarray/bin//sim_telarray
The srun command used to submit the job manually is the following:
user$ srun /sst1m/sw/prod5/sim_telarray/bin//sim_telarray -I/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg/ -c /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg//CTA-PROD5-LaPalma-baseline_4LSTs_MAGIC.cfg -DNUM_TELESCOPES=1 -DNO_STEREO_TRIGGER=1 -C min_photons=0 -C min_photoelectrons=0 -C save_photons=3 -C only_triggered_telescopes=1 -C only_triggered_arrays=1 -C random_state=auto -C show=all -C maximum_events=100000 -C maximum_telescopes=1 -C telescope_phi=180 -C telescope_zenith_angle=20 -C asum_threshold=300 -C trigger_current_limit=2000.0 -C nightsky_background=all:0.1076 -C nsb_scaling_factor=2 -C dark_events=0 -C pedestal_events=0 -h /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//hist/dummy100000_asum_threshold_300.hdata -o /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output/dummy100000_asum_threshold_300.simtel.gz /sst1m/data/prod5/corsika/dummy//dummy100000.corsika.gz > /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//log/dummy100000_asum_threshold_300.log
The error message obtained when running the command above was:
srun: job 42071227 queued and waiting for resources
srun: job 42071227 has been allocated resources
slurmstepd: error: couldn't chdir to `/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2': No such file or directory: going to /tmp instead
slurmstepd: error: execve(): /sst1m/sw/prod5/sim_telarray/bin//sim_telarray: No such file or directory
srun: error: node001: task 0: Exited with exit code 2
However, if I leave out srun (launching the same command line directly, without the job manager), the program runs:
user$ /sst1m/sw/prod5/sim_telarray/bin//sim_telarray -I/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg/ -c /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg//CTA-PROD5-LaPalma-baseline_4LSTs_MAGIC.cfg -DNUM_TELESCOPES=1 -DNO_STEREO_TRIGGER=1 -C min_photons=0 -C min_photoelectrons=0 -C save_photons=3 -C only_triggered_telescopes=1 -C only_triggered_arrays=1 -C random_state=auto -C show=all -C maximum_events=100000 -C maximum_telescopes=1 -C telescope_phi=180 -C telescope_zenith_angle=20 -C asum_threshold=300 -C trigger_current_limit=2000.0 -C nightsky_background=all:0.1076 -C nsb_scaling_factor=2 -C dark_events=0 -C pedestal_events=0 -h /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//hist/dummy100000_asum_threshold_300.hdata -o /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output/dummy100000_asum_threshold_300.simtel.gz /sst1m/data/prod5/corsika/dummy//dummy100000.corsika.gz > /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//log/dummy100000_asum_threshold_300.log
This yields the expected output:
Configuration file is '/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg//CTA-PROD5-LaPalma-baseline_4LSTs_MAGIC.cfg'.
Preprocessor is '/sst1m/sw/prod5/sim_telarray/bin//pfp -v -I. -DNUM_TELESCOPES=1 -DNO_STEREO_TRIGGER=1 -DWITH_LOW_GAIN_CHANNEL -DMAX_GAINS=2 -DSIMTEL_VERSION=1593356843 -DSIMTEL_RELEASE=20200628 -I/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg/ -I. -I/sst1m/sw/prod5/sim_telarray/cfg -I/sst1m/sw/prod5/sim_telarray/cfg/common -I/sst1m/sw/prod5/sim_telarray/cfg/hess -I/sst1m/sw/prod5/sim_telarray/cfg/hess2 -I/sst1m/sw/prod5/sim_telarray/cfg/hess3 -I/sst1m/sw/prod5/sim_telarray/cfg/hess5000 -I/sst1m/sw/prod5/sim_telarray/cfg/CTA'.
Read atmospheric transmission data from file atm_trans_2158_1_3_2_0_0_0.1_0.1.dat
Got 800 wavelength intervals for 41 heights starting at 2.158 km
Preprocessor command: /sst1m/sw/prod5/sim_telarray/bin//pfp -v -I. -DNUM_TELESCOPES=1 -DNO_STEREO_TRIGGER=1 -DWITH_LOW_GAIN_CHANNEL -DMAX_GAINS=2 -DSIMTEL_VERSION=1593356843 -DSIMTEL_RELEASE=20200628 -I/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg/ -I. -I/sst1m/sw/prod5/sim_telarray/cfg -I/sst1m/sw/prod5/sim_telarray/cfg/common -I/sst1m/sw/prod5/sim_telarray/cfg/hess -I/sst1m/sw/prod5/sim_telarray/cfg/hess2 -I/sst1m/sw/prod5/sim_telarray/cfg/hess3 -I/sst1m/sw/prod5/sim_telarray/cfg/hess5000 -I/sst1m/sw/prod5/sim_telarray/cfg/CTA -DMAX_GAINS=2 -DTELESCOPE=1 - < /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg//CTA-PROD5-LaPalma-baseline_4LSTs_MAGIC.cfg
Table with 53 rows has been read from file CTA-LST_lightguide_eff_SST1M.dat
Warning: CORSIKA producing only photons in the range 200 to 700 nm
but telescope 1 has sensitivity from 300 to 790 nm.
Extending the range to 200 to 790 nm would imply 1.0191 times bigger bunches.
No such correction is implemented (but could be done unless CEFFIC or CERWLEN are used).
The impact on the signal though is expected to be negligible. No problem.
The task should also be launchable through an sbatch script containing:
#!/bin/bash
#SBATCH --partition=debug-EL7
#SBATCH --time=00:03:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2200 # in MB
#SBATCH --output=/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/run///log/job_sim_telarray_parameter_scan_12.log
#SBATCH --error=/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/run///error/job_sim_telarray_parameter_scan_12.err
srun /sst1m/sw/prod5/sim_telarray/bin//sim_telarray -I/sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg/ -c /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/cfg//CTA-PROD5-LaPalma-baseline_4LSTs_MAGIC.cfg -DNUM_TELESCOPES=1 -DNO_STEREO_TRIGGER=1 -C min_photons=0 -C min_photoelectrons=0 -C save_photons=3 -C only_triggered_telescopes=1 -C only_triggered_arrays=1 -C random_state=auto -C show=all -C maximum_events=100000 -C maximum_telescopes=1 -C telescope_phi=180 -C telescope_zenith_angle=20 -C asum_threshold=300 -C trigger_current_limit=2000.0 -C nightsky_background=all:0.1076 -C nsb_scaling_factor=2 -C dark_events=0 -C pedestal_events=0 -h /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//hist/dummy100000_asum_threshold_300.hdata -o /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output/dummy100000_asum_threshold_300.simtel.gz /sst1m/data/prod5/corsika/dummy//dummy100000.corsika.gz > /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2/output//log/dummy100000_asum_threshold_300.log
This also fails, and we believe the srun command is the cause.
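Since both slurmstepd errors are "No such file or directory", one possible explanation (an assumption, not something confirmed here) is that the /sst1m paths are visible on the machine where the commands are typed but not on the compute node. A minimal sketch of a check for that is given below. The helper name check_paths is hypothetical, the partition name is taken from the sbatch script above, and the two paths are the ones from the failing command.

```shell
#!/bin/bash
# check_paths (hypothetical helper): report whether each given path exists
# on the machine where the function runs, and return 1 if any is missing.
check_paths() {
    local status=0
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "OK      $(uname -n): $p"
        else
            echo "MISSING $(uname -n): $p"
            status=1
        fi
    done
    return "$status"
}

# Paths taken from the failing srun command and the chdir error message.
paths=(
    /sst1m/sw/prod5/sim_telarray/bin/sim_telarray
    /sst1m/data/prod5/simtel/mono-lst-sipm-borofloat-3ns/rate_scan_nsbx2
)

# Run the check on a compute node only when srun is available.
if command -v srun >/dev/null 2>&1; then
    # declare -f prints the function body so the remote bash can define it.
    srun --partition=debug-EL7 --ntasks=1 \
        bash -c "$(declare -f check_paths); check_paths \"\$@\"" _ "${paths[@]}"
fi
```

Calling check_paths "${paths[@]}" directly, without srun, performs the same check on the local machine. If the paths show as OK locally and MISSING through srun, the file system holding /sst1m is probably not mounted on the compute nodes, which would be a question for the cluster administrators rather than a problem with srun itself.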
We would like to know how to solve this issue so we can resume our jobs.
Thanks in advance.