I seem to be having general issues with jobs running on baobab this morning.
I’m currently debugging some scripts in a Jupyter notebook on a CPU node and have hit the following issues:
- The port forwarding over ssh sometimes reports a timeout or connection refused
- Processes take a very long time to start up
- Individual calls to the interactive kernel can hang for a very long time
- The kernel stops responding entirely and needs to be rebooted
I thought it might be a disk read issue, but it is painfully slow even when everything is already in memory. Could it be a general I/O slowdown? Or perhaps an effect of sending my commands over ssh port forwarding?
I know I’m not the only one with issues, though. @Matthew.Leigh is having non-interactive jobs fail due to timeouts, even though they were running perfectly fine first thing this morning.