dbus-daemon[154590]: Failed to start message bus: Failed to bind socket "/tmp/eb-240s9w15/dbus-BhXJEhGkTK": No such file or directory
EOF in dbus-launch reading address from bus daemon
This is happening on a cluster node on baobab.
Also, the documentation is not very clear about smb://server_name/share_name. Could you add a typical example to the documentation? I asked for a standard NASAC share and the reply was (in French) "I created the share \isis.unige.ch\nasac\gsem\name..." What are `server_name` and `share_name` here?
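If I understand correctly, the mapping would be something like this (I am guessing the values from the path above, so please correct me if they are wrong):

```
# Assuming the share is \\isis.unige.ch\nasac\gsem\<name>:
#   server_name = isis.unige.ch
#   share_name  = nasac
#   gsem/<name> would then be a sub-directory inside the share

# Userspace mount from a desktop (GNOME/gvfs):
gio mount smb://isis.unige.ch/nasac

# Kernel CIFS mount (requires the cifs module and cifs-utils):
sudo mount -t cifs //isis.unige.ch/nasac /mnt/nasac -o username=<your_username>
```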
If you want to copy the data from NASAC to the cluster, you may want to try mounting the cluster storage on your desktop/server and copying from there. Not very efficient, I must admit :(
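For example, something along these lines from a Linux desktop or server where the NASAC share is already mounted (hostnames and paths below are placeholders, adapt them to your account):

```
# Mount your cluster home on the desktop over SSH (requires sshfs):
mkdir -p ~/baobab_home
sshfs <username>@<baobab-login-node>:/home/<username> ~/baobab_home

# Copy from the locally mounted NASAC share to the cluster:
rsync -av --progress /path/to/mounted/nasac/ ~/baobab_home/nasac_copy/

# Unmount when done:
fusermount -u ~/baobab_home
```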
To briefly summarize, the issue is caused by a Mellanox driver update that disables the CIFS kernel module required for mounting SMB shares.
Here is the mention in the official Mellanox known issues, entry 2657392:
Description: OFED installation caused CIFS to break in RHEL 8.4 and above. A dummy module was added so that CIFS would be disabled after the OFED installation in RHEL 8.4 and above.
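For reference, the breakage can be seen directly on an affected node (a rough sketch; exact paths and messages depend on the installed OFED and kernel versions):

```
# Show which cifs module modprobe would load; after the OFED installation
# this may point to the dummy module rather than the real kernel driver:
modinfo -n cifs
lsmod | grep cifs

# With the dummy module in place, a CIFS mount attempt fails, typically with
# an "unknown filesystem type 'cifs'" or "cifs filesystem not supported" error:
sudo mount -t cifs //isis.unige.ch/nasac /mnt/test -o username=<user>
```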
We tried to apply a workaround based on the very limited information from Mellanox-NVIDIA, but it didn’t work. Mellanox-NVIDIA support has been mostly quiet on this issue.
I’m currently working on another clue, but I can’t guarantee a solution, as dealing with the kernel can be quite tricky.
Thanks for the update on addressing this pernicious issue.
That’s good news for all users of the HPC who also depend upon the NASAC for sharing data across our teams.
I’d like to try your patch, but it’s not clear which HPC email address you would like us to write to (hpc or hpc-admin…?)
More generally: given that this appears to be essential functionality, but you say it has been disabled by Mellanox for good reason, what alternative best practice can you propose for sharing data sustainably?
As the patch could be unstable, we have limited its deployment to the login node to ensure there is no impact on production.
I tried some basic adjustments, but unfortunately they were not successful.
As a workaround has been implemented, we won’t be exploring this fix any further. While I understand the convenience of directly mounting the share in Singularity images on compute nodes, the time investment required to avoid a simple data copy on the cluster is too significant. Furthermore, I was unable to find any relevant documentation on this approach.
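For what it's worth, once the data has been copied to cluster storage, it can still be made visible inside the container with a bind mount at job time, e.g. (paths and image name are placeholders):

```
# Bind the copied data into the container instead of mounting the share inside it:
singularity exec --bind /path/to/scratch/<user>/nasac_copy:/data \
    my_image.sif python /data/run_analysis.py
```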
As an alternative, we’ve discussed converting your CIFS share to NFS. This would allow us to easily mount your share across all compute nodes. Have you checked with the storage team which option would be best?
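As an illustration, once an NFS export exists it can be mounted on every node with a standard fstab entry along these lines (the server name and export path below are placeholders to be confirmed with the storage team):

```
# Example /etc/fstab entry on the compute nodes (placeholder server/path):
<nfs-server>.unige.ch:/export/gsem/<name>  /mnt/nasac  nfs  vers=4.1,ro,noatime  0 0

# Or mounted manually for a quick test:
sudo mount -t nfs -o vers=4.1 <nfs-server>.unige.ch:/export/gsem/<name> /mnt/nasac
```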