Job manager Slurm failing to send jobs (Baobab)

Hello David,

We had to dig a bit to find the problem, and I believe your issue is the following.

Your code works when you don’t use srun because the commands are executed on login2 (Baobab).
On login2, /sst1m is mounted when the server boots, from:
nasac-evs1.unige.ch:/s-astro/archive/walter/sst1m
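
You can see the difference yourself with a quick check (the hostnames and output values below are illustrative):

```bash
# Without srun: runs on login2, which still has the boot-time mount
hostname                      # -> login2
findmnt -no SOURCE /sst1m     # -> nasac-evs1.unige.ch:/s-astro/archive/walter/sst1m

# With srun: the same commands run on a compute node instead
srun hostname                 # -> e.g. node001
srun findmnt -no SOURCE /sst1m   # may show nothing, since the autofs mount fails
```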

When you use srun, your code is executed on one or more compute nodes of the cluster. On the nodes, /sst1m is accessed through autofs.

autofs is a service that mounts directories automatically, on an as-needed basis: an auto-mounted path is only mounted when it is actually accessed.
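
For reference, a direct autofs map for such a mount typically looks like the sketch below. The map file names and mount options here are assumptions for illustration, not our actual node configuration:

```bash
# /etc/auto.master: declare a direct map (hypothetical file name)
/-  /etc/auto.direct

# /etc/auto.direct: /sst1m is mounted from the NFS server on first access
/sst1m  -fstype=nfs,rw  nasac-evs1.unige.ch:/s-astro/archive/walter/sst1m
```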

Now, the issue is that someone renamed the folder you are trying to access (note the “_old” suffix after walter):
nasac-evs1.unige.ch:/s-astro/archive/walter_old/sst1m

The path changed on your side, but we were never notified of this change, so our autofs configuration still points to the old path:
nasac-evs1.unige.ch:/s-astro/archive/walter/sst1m

Now, login2 hasn’t been rebooted in some time, and when it last mounted /sst1m the path was still valid. That mount keeps working even after the rename, because NFS identifies the directory by its filehandle (inode) rather than by its name: walter was renamed to walter_old, but the inode stayed the same.
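
This leads to the situation below: a stale-but-working mount on login2 versus a fresh, failing autofs mount attempt on a compute node (the exact error text is an assumption; it could also surface as a “Stale file handle”):

```bash
# login2: the mount established before the rename still resolves
ls /sst1m                     # works

# compute node: autofs tries to mount the old export path and fails
srun ls /sst1m                # ls: cannot access '/sst1m': No such file or directory
```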

You will have to decide internally at the astro department which path we should use, since, as far as I know, the data you are trying to access has been, or is being, migrated to new storage.

We have contacted Walter Roland on this matter.

Let us know as soon as a decision has been made and we will update the autofs configuration accordingly.

In the meantime, if your computations are really urgent, I suggest the following workarounds:
a) Howto access external storage from Baobab - #8 by Silas.Kieser
b) Copy the files your code needs into your $HOME/scratch folder until you can access the shared space again (see the sketch after this list).

I hope this helps!

All the best,

Massimo Brero