Node cpu331.baobab has no access to scratch

Primary information

Username: serpolla
Cluster: baobab

Description

I noticed some of my jobs failing while trying to write files on scratch.
They were all running on cpu331.baobab, and indeed, after opening a shell there, I saw that scratch is unavailable on that compute node:

(baobab)-[serpolla@login1 ~]$ srun -p shared-cpu --nodelist=cpu331 --pty bash -i
srun: job 12696695 queued and waiting for resources
srun: job 12696695 has been allocated resources
(baobab)-[serpolla@cpu331 ~]$ ls /srv/beegfs/scratch/
<Empty dir>

So far, excluding just that one node seems to be sufficient, but I do not know whether other nodes are experiencing the same issue.
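For anyone hitting the same thing, here is a minimal sketch of a guard that could go at the top of a job script so the job stops early when scratch looks unmounted, instead of failing later on a write. The function names `scratch_available` and `require_scratch` are hypothetical helpers made up for illustration, and the default path is simply the one from the listing above. Treating an empty directory as "not mounted" is an assumption based on what cpu331 showed.

```shell
# Hypothetical helper: succeeds only if the directory exists and is not empty.
# An empty mount point is treated as "scratch not mounted" (assumption).
scratch_available() {
    local dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Hypothetical helper: prints a diagnostic naming the node and returns
# non-zero when scratch is not usable. The default path is an assumption.
require_scratch() {
    local dir="${1:-/srv/beegfs/scratch}"
    if ! scratch_available "$dir"; then
        echo "scratch not available on $(uname -n): $dir" >&2
        return 1
    fi
}

# Possible use at the top of a job script:
#   require_scratch || exit 1
#
# To keep jobs away from the faulty node in the meantime
# (job.sh is a placeholder name):
#   sbatch --exclude=cpu331 job.sh
```

The error message includes the node name, so a failed job also tells you which node to add to `--exclude`.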

I have the same issue. I have access to scratch on the login node, but when I use ‘salloc’ to get onto a compute node, I no longer have access.

As far as I can see, it only affects some nodes: on gpu017 I don’t have access, but on cpu209 I do.

Malte

Hi all,

I am on it. I will keep you informed as soon as possible.

It seems fine now :+1:

Dear @Malte.Algren and @Andrea.Serpolla, you can follow the issue here: [2024] Current issues on HPC Cluster - #25 by Yann.Sagon
Every node was set to drain, which means every new job should be fine.

Best