Is it possible to run longer interactive jobs in a way that they survive the restart of the login node?
Currently I am using something like
salloc --time 4-00:00:0 --partition private-dpt-cpu,public-cpu --cpus-per-task 30 --mem 150G
inside a tmux session on the login node. Of course, this means the job dies if there is a communication problem or the login node is rebooted. Is there a way to avoid this?
Hi, I'm not sure there is an easy way to do so. Hopefully we won't have to reboot the login node again soon.
Out of curiosity: why do you need an interactive allocation?
Ok, let’s keep our fingers crossed.
In general, I tend to use interactive sessions for ‘one-of-a-kind’ computations or when running new, potentially buggy code. This way I can check whether intermediate results look reasonable and stop if they don’t, checkpoint manually if the computation takes longer than expected, fix smaller mistakes live, and debug much more easily. Once the code is reliable and it is just a matter of running it for many parameters, sbatch is of course the way to go.
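One workaround that is sometimes used for this, sketched here without having been tested on this cluster, is to let a batch job hold the allocation and attach an interactive shell to it. The allocation then belongs to a job managed by Slurm rather than to an salloc process on the login node, so only the attached shell should be lost if the login node goes away. The helper names holder_cmd and attach_cmd and the RUN switch are made up for this illustration; the resource values mirror the salloc call above, and --overlap assumes a reasonably recent Slurm version.

```shell
#!/bin/sh
# Sketch (assumption, not a site-endorsed recipe): hold an allocation with a
# batch job that only sleeps, then attach an interactive shell to that job.

# Print the command that submits the placeholder job.
# --parsable makes sbatch print only the job ID (possibly followed by ";cluster").
holder_cmd() {
    printf '%s\n' "sbatch --parsable --time 4-00:00:00 --partition private-dpt-cpu,public-cpu --cpus-per-task 30 --mem 150G --wrap 'sleep infinity'"
}

# Print the command that opens a shell as a job step inside running job $1.
attach_cmd() {
    printf '%s\n' "srun --jobid=$1 --overlap --pty bash -i"
}

# Only talk to Slurm when RUN=1 is set and sbatch is actually available;
# otherwise the script just defines the two helpers above.
if [ "${RUN:-0}" = 1 ] && command -v sbatch >/dev/null 2>&1; then
    jobid=$(eval "$(holder_cmd)")
    jobid=${jobid%%;*}   # drop the ";cluster" suffix if present
    echo "Holder job: $jobid. Once it is running, attach with:"
    attach_cmd "$jobid"
fi
```

Running the script with RUN=1 submits the placeholder job and prints the attach command; the printed srun line can be reused after a reconnect as long as the job is still running. Anything started inside the attached shell is still tied to that shell, so long computations would probably need to be started in a way that does not depend on it. The attached step may also need its own --cpus-per-task option to see all allocated cores.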