Several nodes in drain

Hi,
It appears that several nodes are currently in drain/down state.

Regards,
Debajyoti

Dear @Debajyoti.Sengupta

Here is the status of this incident:

The compute nodes are back in production.

Best Regards

The nodes are still in drain/down state.

Dear @Debajyoti.Sengupta

We have identified the root cause of the issue. An InfiniBand switch is experiencing a hardware malfunction, which is interfering with InfiniBand traffic across the entire cluster. The nodes connected to this switch have been temporarily disabled and powered off. We are awaiting replacement hardware to restore these nodes.

We apologize for any inconvenience this may have caused.

Best Regards,

It might be a related issue, but it seems like Scratch is not working at the moment.

Hi @Malte.Algren

It may be related; we’re working on it. We will keep you informed.

Best Regards