Primary information
Username: coppinp
Cluster: Yggdrasil
Description
Jobs fail when they try to load/source certain files on cvmfs.
Steps to Reproduce
source /cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
python -c "from tensorflow.python.tools import module_util as _module_util"
Tested both in jobs and on the login-node. Sometimes it works, sometimes it doesn’t.
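Since the failure is intermittent, it may help to quantify it rather than rely on single attempts. The following is a minimal sketch, not part of the original report: `count_failures` is a hypothetical helper that runs any command a given number of times and prints how many attempts failed. It assumes the setup script has already been sourced in the current shell.

```shell
# Hypothetical helper: run a command N times and print the number of failures.
# Usage: count_failures <attempts> <command> [args...]
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (assumes the setup script above was sourced first):
# count_failures 20 python -c "from tensorflow.python.tools import module_util as _module_util"
```

Running this on the login node and inside a job on a specific compute node would give a rough failure rate per node to include in the report.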
Result
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/dampe.cern.ch/centos7/opt/conda_python2.7_tensorflow/lib/python2.7/site-packages/tensorflow/__init__.py", line 101, in <module>
    from tensorflow_core import *
  File "/cvmfs/dampe.cern.ch/centos7/opt/conda_python2.7_tensorflow/lib/python2.7/site-packages/tensorflow_core/__init__.py", line 40, in <module>
    from tensorflow.python.tools import module_util as _module_util
ImportError: No module named tools
Hello @Paul.Coppin
Everything seems to be working fine.
Maybe this is output from your job during the incident?
Or could you give me the node where cvmfs is not working?
(yggdrasil)-[alberta@login1 ~]$ salloc
salloc: Pending job allocation 29937644
salloc: job 29937644 queued and waiting for resources
salloc: job 29937644 has been allocated resources
salloc: Granted job allocation 29937644
salloc: Nodes cpu001 are ready for job
(yggdrasil)-[alberta@cpu001 ~]$ source /cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
cpu001.yggdrasil
(yggdrasil)-[alberta@cpu001 ~]$ cvmfs_config probe
Probing /cvmfs/euclid.in2p3.fr... OK
Probing /cvmfs/euclid-dev.in2p3.fr... OK
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/atlas-nightlies.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Hi Adrien,
The errors happened between Thursday morning and this morning (~10 am). However, everything does indeed seem to be working fine now. So no need to dig deeper unless it happens again! In that case I will report back here.
Purely for info, nodes which produced errors (non-exhaustive list) include:
cpu077,cpu85,cpu88,cpu098,cpu117,cpu133 (+ login node)
I noticed that running the command on the login node before submitting a job generally predicts how the job will go: if cvmfs loads correctly on the login node, the job will be fine, and conversely jobs fail to start if the test on the login node fails.
Cheers, Paul
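The observation above, that the login-node test tends to predict job success, could be turned into a pre-submission guard. This is only a sketch under that assumption: `preflight` is a hypothetical helper and `job.sh` is a placeholder script name, neither of which appears in the thread. The correlation was reported as general, not guaranteed, so a passing check does not rule out a failure on the compute node.

```shell
# Hypothetical helper: run a check command and report whether it passed.
# Returns 0 if the check succeeded, 1 otherwise.
preflight() {
    if "$@" >/dev/null 2>&1; then
        echo "preflight OK"
        return 0
    fi
    echo "preflight FAILED: $*" >&2
    return 1
}

# Example on the login node (job.sh is a placeholder name):
# source /cvmfs/dampe.cern.ch/centos7/etc/setup_conda_python2.7_tensorflow2.1.sh
# preflight python -c "from tensorflow.python.tools import module_util as _module_util" && sbatch job.sh
```

Chaining with `&&` means `sbatch` is only invoked when the import test succeeds on the login node.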