Scratch on Yggdrasil nearly full

Primary information

Username: kruckow
Cluster: Yggdrasil

Description

Some of my I/O-intensive requests have received no response for some time, so I was wondering whether there are issues with the storage.

Steps to Reproduce

Checking the storage usage via df -h clearly shows that the scratch filesystem is close to full:

Filesystem                              Size  Used Avail Use% Mounted on
devtmpfs                                 47G     0   47G   0% /dev
tmpfs                                    47G   40M   47G   1% /dev/shm
tmpfs                                    47G  2.6G   44G   6% /run
tmpfs                                    47G     0   47G   0% /sys/fs/cgroup
/dev/sda4                                10G  8.4G  1.6G  85% /
/dev/sda3                              1014M  223M  792M  22% /boot
/dev/sda2                               256M  5.8M  250M   3% /boot/efi
/dev/sda6                               878G   31G  847G   4% /srv
nasac-evs2.unige.ch:/baobab-nfs/         80T   64T   17T  80% /acanas
nasac-evs2.unige.ch:/baobab-nfs-fiteo/   40T   39T  1.7T  96% /acanas-fiteo
admin1:/opt/ebsofts                     3.5T  1.9T  1.7T  53% /opt/ebsofts
admin1:/opt/ebmodules                   3.5T  1.9T  1.7T  53% /opt/ebmodules
admin1:/srv/export/opt/cluster          3.5T  1.9T  1.7T  53% /opt/cluster
beegfs_home                             495T  106T  389T  22% /home
beegfs_scratch                          1.2P  1.2P  1.9T 100% /srv/beegfs/scratch
cvmfs2                                   30G   25G  5.3G  83% /cvmfs/sft.cern.ch
tmpfs                                   9.4G     0  9.4G   0% /run/user/0
cvmfs2                                   30G   25G  5.3G  83% /cvmfs/cvmfs-config.cern.ch
grid06.unige.ch:/mnt/beegfs             535T  531T  3.4T 100% /dpnc/beegfs

My own usage is <10 TB and my group uses <25 TB in total.
Hence, it looks like some users are storing a lot more data there than we are.

I suspect some action is needed in the near future to check:

  1. Whether some job ran out of control and wrote too much data (a rough way to spot the largest directories is sketched below)
  2. Whether some data can be removed
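
As a rough way to see which top-level directories take up the most space, something like the following could be run (the directory layout under scratch is an assumption, and a full scan of a 1.2 PB BeeGFS filesystem can take a long time):

# Summarise per-directory usage under scratch, largest first (path is an assumption)
du -sh /srv/beegfs/scratch/* 2>/dev/null | sort -rh | head -20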

Dear @Matthias.Kruckow

We have contacted our users who store the most data, asking them to remove any unused data.

@all: Each user is responsible for managing their own data. The HPC team only deletes data in emergency situations.
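
A quick way to check how much you store on scratch is du on your own directory; if quota tracking is enabled on the filesystem, beegfs-ctl can also report per-user totals (the exact scratch path and the availability of the quota tooling on the login nodes are assumptions):

# Size of your own scratch directory (path is an assumption)
du -sh /srv/beegfs/scratch/$USER

# Per-user report, if BeeGFS quota tracking is enabled (assumption)
beegfs-ctl --getquota --uid $USER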

Since sending the email, we have already reclaimed more than 80 terabytes of storage:

(yggdrasil)-[root@login1 ~]$ df -t beegfs -h
Filesystem      Size  Used Avail Use% Mounted on
beegfs_home     495T  106T  389T  22% /home
beegfs_scratch  1.2P  1.1P   85T  93% /srv/beegfs/scratch

We invite all users to manage their data responsibly to prevent a full filesystem, which could disrupt other ongoing projects.
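
When cleaning up, it can help to look for large or old files first, for example (the path and the thresholds are only illustrative):

# Files larger than 10 GB under your scratch directory (path is an assumption)
find /srv/beegfs/scratch/$USER -type f -size +10G -exec ls -lh {} +

# Files not modified for more than 180 days
find /srv/beegfs/scratch/$USER -type f -mtime +180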

Thank you for your understanding

Best Regards,