sacct error on Yggdrasil

Hi all,
I wanted to use the command sacct -u conradin to check whether I have running jobs, and I got the following error:

sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm1:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

Is it related to the maintenance on Baobab?
Best,
Raphaël

Hi there,

Yes, it was. It is fixed now:

[capello@login1 ~]$ sacct -u capello
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
[capello@login1 ~]$ 

Sorry for the inconvenience.

Thx, bye,
Luca

Thank you very much.

Hi,
this error seems to be popping up again on Yggdrasil. Tested just now.

(noisepy) [savardg@login1.yggdrasil CCF_25sps_3c_coherence]$ sacct -j 8061735
sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm1:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

Thanks in advance,
Genevieve

Hi, this is fixed! Thanks for letting us know.

Hi, this issue is back again for me on Ygg.

(base) [savardg@login1.yggdrasil ~]$ sacct -j  8163959
sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm1:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

Hi, this is corrected, thanks.

Hi, I’m sorry to report the sacct error is back.

sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:slurm1:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

Arf, you are right! I restarted a server this morning and that killed our ssh tunnel.

Thanks for the monitoring:)

Hi Yann, the error is back. :grimacing: Sorry to keep bugging you with this one. I always use sacct to check the status of completed jobs. I guess other users don’t use it as much, since they don’t seem to notice.

Perhaps there could be a script that routinely checks whether sacct fails and restarts the server if necessary? I'm not sure how feasible that would be, or whether it would affect performance.
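
For illustration, the kind of periodic check suggested above could be sketched roughly as follows. This is only a minimal sketch, not the admins' actual script: the function names, the SACCT_CMD override, and the RESTART_CMD hook are all hypothetical, and what really needs restarting on the cluster side is not something a user can know.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog sketch. sacct_ok, check_and_recover, SACCT_CMD and
# RESTART_CMD are illustrative names, not part of Slurm or of the cluster setup.

# Return 0 if sacct can reach the accounting database, 1 otherwise.
# SACCT_CMD may be overridden (e.g. for testing); it defaults to sacct.
sacct_ok() {
    local out
    out=$(${SACCT_CMD:-sacct} -n -X 2>&1)
    local rc=$?
    # Treat a non-zero exit status or the error text seen in this thread as failure.
    if [ "$rc" -ne 0 ] || printf '%s\n' "$out" | grep -q 'Problem talking to the database'; then
        return 1
    fi
    return 0
}

# Run one check and, on failure, call an optional recovery command.
check_and_recover() {
    if sacct_ok; then
        echo "sacct OK"
    else
        echo "sacct FAILED"
        # RESTART_CMD is a placeholder for whatever re-establishes the connection.
        if [ -n "${RESTART_CMD:-}" ]; then
            $RESTART_CMD
        fi
    fi
}
```

Calling check_and_recover from cron every few minutes would issue one small sacct query per run, so the extra load should be small.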

Hi, this time it is related to the Baobab maintenance… you’ll have to wait a little bit and we’ll reactivate this service. Yes, we already have a script… but it seems it isn’t always working as expected. We’ll try to improve that in the future.

Ok no problem, good to know! Thanks, Yann.

Hi, the error is back.

Hi, sacct is not working :slight_smile:

Good evening,

I guess you know what the problem is :slight_smile:

Yes… Yggdrasil is using login2.baobab to communicate with Baobab and login2 crashed :unamused:
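
Since the error always names slurm1:6819, a quick probe of that port can help tell this connection problem apart from other sacct failures before reporting it. A minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the login node; port_open is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device; gives up after 3 seconds.
port_open() {
    # "$1" and "$2" are passed to the inner shell as $0 (host) and $1 (port).
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (host and port taken from the error message in this thread):
#   port_open slurm1 6819 && echo "database port reachable" || echo "connection refused or timed out"
```

A refused or timed-out probe points to the same broken link between the clusters that is described above, rather than to a problem with the job itself.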

Service restored.