Cvmfs issue after Yggdrasil reboot

Primary information

Username: coppinp
Cluster: Yggdrasil

Description

Jobs fail when they try to load/source certain files on cvmfs.

Steps to Reproduce

source /cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
python -c "from tensorflow.python.tools import module_util as _module_util"

Tested both in jobs and on the login-node. Sometimes it works, sometimes it doesn’t.
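
Since the failure is intermittent, repeating the import a number of times on a given node could show how often it actually fails there. The loop below is only a sketch: `count_failures` is a helper name made up for this post, and 20 attempts is an arbitrary choice.

```shell
# count_failures N CMD [ARGS...]: run CMD N times and print how many runs failed.
# (hypothetical helper, not part of cvmfs or the DAMPE setup)
count_failures() {
    cf_attempts=$1
    shift
    cf_failures=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_attempts" ]; do
        "$@" >/dev/null 2>&1 || cf_failures=$((cf_failures + 1))
        cf_i=$((cf_i + 1))
    done
    echo "$cf_failures"
}

SETUP=/cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh

# Only attempt the real test where the setup script is readable.
if [ -r "$SETUP" ]; then
    . "$SETUP"   # same as `source` in bash
    failed=$(count_failures 20 python -c "from tensorflow.python.tools import module_util as _module_util")
    echo "$(hostname): $failed of 20 imports failed"
fi
```

Running this on the login node and inside a job would give one line per host, which makes it easier to compare nodes.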

Result

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/dampe.cern.ch/centos7/opt/conda_python2.7_tensorflow/lib/python2.7/site-packages/tensorflow/__init__.py", line 101, in <module>
    from tensorflow_core import *
  File "/cvmfs/dampe.cern.ch/centos7/opt/conda_python2.7_tensorflow/lib/python2.7/site-packages/tensorflow_core/__init__.py", line 40, in <module>
    from tensorflow.python.tools import module_util as _module_util
ImportError: No module named tools

Hello @Paul.Coppin

Everything seems to be working fine.

Maybe this is the output of a job that ran during the incident?

Otherwise, could you give me the node where cvmfs is not working?

(yggdrasil)-[alberta@login1 ~]$ salloc
salloc: Pending job allocation 29937644
salloc: job 29937644 queued and waiting for resources
salloc: job 29937644 has been allocated resources
salloc: Granted job allocation 29937644
salloc: Nodes cpu001 are ready for job

(yggdrasil)-[alberta@cpu001 ~]$ source /cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
cpu001.yggdrasil

(yggdrasil)-[alberta@cpu001 ~]$ cvmfs_config probe
Probing /cvmfs/euclid.in2p3.fr... OK
Probing /cvmfs/euclid-dev.in2p3.fr... OK
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/atlas-nightlies.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK

Hi Adrien,

The errors happened between Thursday morning and this morning (~10 am). However, everything does indeed seem to be working fine now, so there is no need to dig deeper unless it happens again. In that case I will report back here.

Purely for info, nodes which produced errors (non-exhaustive list) include:
cpu077,cpu85,cpu88,cpu098,cpu117,cpu133 (+ login node)
I noticed that the result of the test on the login node generally predicts how a job will behave: if cvmfs loads correctly on the login node before submitting, the job will be fine, and conversely jobs fail to start if the test on the login node fails.
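
That observation could be turned into a small guard that only submits when the check passes. This is a rough sketch under assumptions: `run_if_check_passes` is a made-up helper, and `job.sh` in the usage comment is a placeholder for the real batch script.

```shell
# run_if_check_passes CHECK SUBMIT: run the SUBMIT command line only if the
# CHECK command line succeeds. (hypothetical helper name)
run_if_check_passes() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "check failed on $(hostname), not submitting" >&2
        return 1
    fi
}

# Example usage (job.sh is a placeholder):
# SETUP=/cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
# run_if_check_passes \
#     ". $SETUP && python -c 'from tensorflow.python.tools import module_util'" \
#     "sbatch job.sh"
```

The check only says something about the login node at that moment, so it lowers the chance of a failed job rather than ruling it out.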

Cheers, Paul