[2023] Current issues on HPC Cluster

[Baobab] 2023-04-28T13:00:00Z

Correction expected on 2023-05-01T22:00:00Z

Primary information

Username: ALL
Cluster: Baobab

Description

Since the maintenance on Baobab, loading a module in a job submitted through Slurm sbatch does not work: the ml command is not found inside the job.
The issue doesn't happen when using salloc.

Steps to Reproduce

(baobab)-[sagon@login2 modules]$ sbatch --wrap "ml GCC/12.2.0; which gcc"
Submitted batch job 582485
(baobab)-[sagon@login2 modules]$ cat slurm-582485.out
/var/spool/slurmd/job582485/slurm_script: line 4: ml: command not found
/usr/bin/gcc

Workaround

  1. Load the module you need on login2 before launching your job

or

  2. Add the following line after all the #SBATCH pragmas in your sbatch script: . /etc/profile.d/modules.sh (yes, the line starts with a dot followed by a space)

or

  3. Change the very first line of your sbatch script to #!/bin/sh -l

Then launch your job.

Example (option 1):

(baobab)-[alberta@login2 ~]$ ml Stata/17
(baobab)-[alberta@login2 ~]$ srun stata-mp -h
srun: job 582114 queued and waiting for resources
srun: job 582114 has been allocated resources

stata-mp:  usage:  stata-mp [-h -q -s -b] ["stata command"]
        where:
            -h          show this display
            -q          suppress logo, initialization messages
            -s          "batch" mode creating .smcl log
            -b          "batch" mode creating .log file
            -rngstream# set rng to mt64s and set rngstream to #;
                          see "help rngstream" for more information;
                          note that there must be no space between
                          "rngstream" and #

        Notes:
            xstata-mp is the command to launch the GUI version of Stata/MP
            stata-mp  is the command to launch the console version of Stata/MP

            -b is better than "stata-mp < filename > filename".

The workaround also works with sbatch:

(baobab)-[alberta@login2 stata]$ sbatch test.sh
Submitted batch job 582125
(baobab)-[alberta@login2 stata]$ ll
total 3
-rw-r--r-- 1 alberta hpc_users  76 Apr 28 19:51 slurm-582125.out
-rw-r--r-- 1 alberta hpc_users  15 Apr 28 17:56 test.do
-rw-r--r-- 1 alberta hpc_users 804 Apr 28 19:51 test.log
-rw-r--r-- 1 alberta hpc_users 130 Apr 28 17:56 test.sh
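
For the workaround that sources /etc/profile.d/modules.sh inside the batch script, a minimal sketch of such a script could look like the one below. It is not the test.sh used above. The job name, the time limit, and the MODULES_INIT and ml_status variables are assumptions added for illustration; only the path /etc/profile.d/modules.sh and the GCC/12.2.0 module come from this post.

```shell
#!/bin/sh
#SBATCH --job-name=module-check   # hypothetical job name
#SBATCH --time=00:05:00           # hypothetical time limit

# Workaround: initialise the module system by hand, after the last #SBATCH line.
# MODULES_INIT is a hypothetical variable; it defaults to the path given in this post.
MODULES_INIT="${MODULES_INIT:-/etc/profile.d/modules.sh}"
if [ -r "$MODULES_INIT" ]; then
    # The leading dot sources the file into the current shell.
    . "$MODULES_INIT"
fi

# Only try to load the module if the ml command is now defined.
if command -v ml >/dev/null 2>&1; then
    ml GCC/12.2.0 || echo "could not load GCC/12.2.0" >&2
    ml_status="available"
else
    ml_status="missing"
    echo "ml is still not defined; check $MODULES_INIT" >&2
fi
echo "ml: $ml_status"

# Show which gcc the job would use.
command -v gcc || echo "gcc not found in PATH" >&2
```

If the workaround is effective, the last line should no longer report /usr/bin/gcc as in the reproduction above. The guards are only there so that the script fails with a readable message instead of "ml: command not found"; the single line . /etc/profile.d/modules.sh is the actual workaround.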

We apologize for any inconvenience caused.

Best Regards,


HPC Team