[2025] Current issues on HPC Cluster

Description

Cluster: Bamboo

One of the scratch servers crashed and we had to restart it. Unfortunately, home and scratch share some hardware, so home also had a short outage.


HPC Team

Status : Resolved :green_circle:

start: 2025-08-27T16:15:00Z
end: 2025-08-28T08:00:00Z

Description

Cluster: Baobab

The team managing the storage for the servers performed maintenance this morning and all our admin servers crashed. We are investigating, as this setup is normally fully redundant.

In the meantime, jobs that were already running are probably still running, but Slurm is stopped.

Edit: the service is restored. We will now investigate with the storage team why this happened.


HPC Team

Status : Resolved :green_circle:

start: 2025-09-04T09:07:00Z
end: 2025-09-04T09:50:00Z

Description

Cluster: Baobab

Dear HPC Users,

Due to the update to OpenOnDemand version 4, we are currently experiencing some portability issues. We are working on resolving these and applying the necessary adjustments.

Affected Apps:

  • Stata
  • Matlab

Please feel free to post in HPC Support > HPC issues if you notice any unexpected behavior.

Status: Resolved :green_circle:

start: 2025-09-10T16:00:00Z

Dear Users,

We are experiencing issues with the tmp space on login1.baobab, which is making it difficult to use the services.

An emergency reboot is required.
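
For users who want to see how full a temporary directory is from their own session, here is a minimal sketch. It assumes Python 3 is available on the node and that the temporary space is the system default temp directory (typically /tmp on Linux; the announcement does not name the exact path). The function name `usage_percent` is hypothetical and not part of any cluster tooling.

```python
import shutil
import tempfile


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some virtual filesystems report a zero size; avoid dividing by zero.
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumption: the default temp directory is the one of interest.
    tmp_dir = tempfile.gettempdir()
    print(f"{tmp_dir} is {usage_percent(tmp_dir):.1f}% full")
```

A value close to 100% would be consistent with the kind of symptoms described above, although it does not by itself identify the cause.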

Sorry for the inconvenience,

Status : Resolved :green_circle:

start: 2025-09-30T08:00:00Z
end: 2025-09-30T10:00:00Z

Dear Users,

After reinstalling the GPU nodes on Bamboo, we saw that the / partition was too small to host all the CUDA RPMs plus tmp space. We had to reinstall all the GPU nodes.

Sorry for the inconvenience,

Status : Resolved :green_circle:

start: 2025-10-10T08:00:00Z
end: 2025-10-13T10:00:00Z

Dear Users,

Cluster: Baobab

We are experiencing a technical issue with Baobab OpenOnDemand, which has made the service unavailable. We are working to resolve this as quickly as possible.

Status : Resolved :green_circle:

start: 2025-11-04T10:00:00Z
end: 2025-11-04T15:30:00Z