Current issues on Baobab and Yggdrasil

Dear HPC users,


Many of you have already contacted us about this issue.

Every HPC user at UNIGE received hundreds of emails today between 11h00 and ~15h30, with the following subject:

[Yggdrasil] Job XXXXX will never run

This is a mistake and we are very sorry for the inconvenience!

You can safely delete those emails.

A combination of causes (and a hint of Murphy’s law) led to this mass mailing this morning.

It seems that the Slurm update we installed this morning (during the Yggdrasil maintenance) introduced a new “reason” explaining why a job is pending. On top of that, while another Slurm service was not running (because it was being updated), our script that detects pending jobs and notifies their owners was launched… at a very bad time.
The script did not filter out this new “reason”, which triggered the mass mailing.

The script has been updated, everything is now corrected, and this should hopefully not happen again. However, all the emails have already been released from the mail server and there is nothing we can do to stop them.

We understand everyone’s frustration with this flood of emails.
Please understand that these emails left Yggdrasil this morning between 11h02 and 11h05. We no longer have any control over them. We have already contacted UNIGE’s postmaster to ask them to stop whatever can still be stopped.

With the help of the Postmaster, we eventually managed to put an end to this mass mailing.

The emails left Yggdrasil between 11h02 and 11h05 this morning. It is like sending an email by mistake: you cannot just “take it back”. The same thing happened here, but on a very large scale, so the HPC team no longer had any control over them or any way to stop them.

Most of the emails had been stuck in a queue on the mail servers and were released around 15h. From then on, they flooded the mail system for several hours.
Around 17h45, thousands of remaining emails were identified and blocked in the mail queues.
Finally, at 18h18, they were all deleted. You should not have received any more of these emails since then.

We thank all of you for your understanding and we apologize again for the inconvenience.

Massimo Brero
