Using Slurm on a NAS share leads to slurmstepd: error: couldn't chdir to?

Hi @Matthieu.Stigler

Here are my tests. The script on the SMB share (tutu) contains only:

echo toto

Reproducing the current situation (not working):

(baobab)-[alberta@login1 ~]$ cat !$
cat sbatch_test.sh
#!/bin/sh
#SBATCH --job-name test_gio
#SBATCH --cpus-per-task 1
#SBATCH --time 00:05:00
#SBATCH --partition debug-cpu

dbus-launch bash
gio mount smb://isis.unige.ch/nasac/hpc_exchange/backup < .credentials
bash /var/run/user/401775/gvfs/smb-share\:server\=isis.unige.ch\,share\=nasac/hpc_exchange/backup/tutu

(baobab)-[alberta@login1 ~]$ sbatch !$
sbatch sbatch_test.sh
Submitted batch job 18350710
(baobab)-[alberta@login1 ~]$ cat slurm-18350710.out 
Error creating proxy: Could not connect: No such file or directory (g-io-error-quark, 1)
gio: smb://isis.unige.ch/nasac/hpc_exchange/backup: volume doesn’t implement mount
bash: /var/run/user/401775/gvfs/smb-share:server=isis.unige.ch,share=nasac/hpc_exchange/backup/tutu: No such file or directory
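In this run the job kept going after gio mount failed, so the last line only reports "No such file or directory" for the script. A minimal sketch of a guard, assuming the job should instead stop with a clear message when the gvfs mount point never appears. wait_for_path is a hypothetical helper (not part of gio or Slurm) that polls for a path once per second up to a timeout.

```shell
# Hypothetical helper: wait up to LIMIT seconds (default 30) for TARGET to exist.
# Returns 0 as soon as TARGET exists, 1 if the timeout expires first.
wait_for_path() {
    target=$1
    limit=${2:-30}
    waited=0
    while [ ! -e "$target" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "wait_for_path: $target did not appear within ${limit}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example use in the job script (path taken from the error message above):
#   script='/var/run/user/401775/gvfs/smb-share:server=isis.unige.ch,share=nasac/hpc_exchange/backup/tutu'
#   wait_for_path "$script" 30 || exit 1
#   bash "$script"
```

With this guard, a failed mount ends the job with a non-zero exit status and an explicit message instead of a misleading "No such file or directory" from bash.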

Resolution

After analyzing the logs on the compute node, I noticed that dbus-launch had not fully completed its initialization when gio was executed.

To address this, I added a sleep 5 after the dbus-launch command, and it appears to resolve the issue.

The reason is that dbus-launch returns immediately and sets up the D-Bus session environment in the background. As a result, gio can start before the D-Bus session is fully ready, which produces the "Error creating proxy: Could not connect" failure shown in the first run. Adding a short delay (e.g., sleep 5) gives D-Bus enough time to finish initializing, so gio mount works as expected.

#!/bin/sh
#SBATCH --job-name test_gio
#SBATCH --cpus-per-task 1
#SBATCH --time 00:05:00
#SBATCH --partition debug-cpu

dbus-launch bash
sleep 5    # give the D-Bus session time to finish initializing before gio runs
gio mount smb://isis.unige.ch/nasac/hpc_exchange/backup < .credentials
bash /var/run/user/401775/gvfs/smb-share\:server\=isis.unige.ch\,share\=nasac/hpc_exchange/backup/tutu

(baobab)-[alberta@login1 ~]$ !sbat
sbatch sbatch_test.sh
Submitted batch job 18350763
(baobab)-[alberta@login1 ~]$ cat slurm-18350763.out 
Authentication Required
Enter user and password for share “nasac” on “isis.unige.ch”:
User [alberta]: Domain [SAMBA]: Password: 

toto   <------ Result
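A fixed sleep 5 works here, but it may be too short on a busy node and wastes time when D-Bus is ready sooner. A minimal sketch of an alternative, assuming a failed gio mount can simply be re-run once the D-Bus session is up (not verified on baobab). retry is a hypothetical helper that re-runs a command a bounded number of times with a pause between failed attempts.

```shell
# Hypothetical helper: run "$@" up to MAX times, sleeping DELAY seconds between
# failed attempts. Returns 0 on the first success, otherwise the last exit status.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        rc=$?
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: '$*' failed after $attempt attempts" >&2
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example use in the job script, in place of the fixed "sleep 5"
# (up to 6 attempts, 5 s apart):
#   dbus-launch bash
#   retry 6 5 sh -c 'gio mount smb://isis.unige.ch/nasac/hpc_exchange/backup < .credentials' || exit 1
```

Wrapping gio mount in sh -c matters: it makes the < .credentials redirection re-open the file on every attempt, whereas redirecting the stdin of retry itself would let the first attempt consume the credentials and leave later attempts reading an empty stream.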