Mounting nasac on baobab: dbus-launch bash fails

I tried to mount a new NASAC share on baobab, following the instructions in hpc:storage_on_hpc [eResearch Doc].
I ran:

$ dbus-launch bash

but this triggers the error message:

dbus-daemon[154590]: Failed to start message bus: Failed to bind socket "/tmp/eb-240s9w15/dbus-BhXJEhGkTK": No such file or directory
EOF in dbus-launch reading address from bus daemon

This is happening on a cluster node on baobab.

Also, the documentation is not very clear on smb://server_name/share_name. Could you give a typical example in the documentation? I asked for a standard NASAC share and was told "I created the share \\isis.unige.ch\nasac\gsem\name...". What are `server_name` and `share_name` here?
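In case it helps, here is my guess at the mapping, based on the usual UNC-to-smb:// conversion, and assuming the next step in the doc is a gio mount (which is what dbus-launch is normally needed for). Please correct me if the share name is actually something else:

# from \\isis.unige.ch\nasac\gsem\name... I would read:
#   server_name = isis.unige.ch
#   share_name  = nasac   (gsem\name... being a sub-folder inside the share)
dbus-launch bash
gio mount smb://isis.unige.ch/nasac
# the mounted share should then show up under /run/user/$UID/gvfs/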

Thanks!

Dear @Matthieu.Stigler

this is unfortunately a known issue ([2024] Current issues on HPC Cluster - #15 by Adrien.Albert) that is still not solved.

If you want to copy the data from NASAC to the cluster, you may want to mount the cluster storage on your desktop/server and copy from there. Not very efficient, I must admit :(
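For example, something along these lines from your workstation (the login node name and all paths below are placeholders, adapt them to your setup):

# 1. mount your cluster home on the desktop via sshfs
mkdir -p ~/baobab_home
sshfs your_username@<baobab-login-node>: ~/baobab_home

# 2. copy from the NASAC share, already mounted on the desktop, to the cluster
rsync -av --progress /path/to/local/nasac/mount/ ~/baobab_home/nasac_data/

# 3. unmount when done
fusermount -u ~/baobab_home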

Best

Yann

Hello @Matthieu.Stigler

To briefly summarize, the issue is caused by a Mellanox driver update that disables the CIFS kernel module required for mounting SMB shares.

Here is the relevant entry in the official Mellanox known issues (2657392):

Description: OFED installation caused CIFS to break in RHEL 8.4 and above. A dummy module was added so that CIFS would be disabled after the OFED installation in RHEL 8.4 and above.
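If you want to check this on a node yourself, a couple of read-only commands (no root needed) show which cifs module would be picked up. This is only a diagnostic sketch, not part of any fix:

modinfo -n cifs              # path of the cifs.ko that modprobe would load
modprobe --dry-run -v cifs   # what modprobe would do, without actually loading anything
lsmod | grep cifs            # whether cifs is currently loaded at all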

We tried to apply a workaround based on the very limited information from Mellanox-NVIDIA, but it didn’t work. Mellanox-NVIDIA support has been mostly quiet on this issue.

I’m currently working on another lead, but I can’t guarantee a solution, as dealing with the kernel can be quite tricky.

Thanks for following up on this, I really appreciate it! I hope a workaround can be found, and especially that Mellanox responds to this!

Hi @Matthieu.Stigler

There is little chance of getting an answer from Mellanox, as they consider this outside their scope…

But the good news is :tada: my workaround is working; we still need to test the robustness of this patch.

If you are interested in testing this patch, please send an email to HPC.

PS: Note that the module was deactivated by Mellanox for a good reason, so we’re not immune to unexpected behaviour.

Hi @Adrien.Albert,

Thanks for the update on addressing this pernicious issue.
That’s good news for all users of the HPC who also depend upon the NASAC for sharing data across our teams.
I’d like to try your patch, but it’s not clear which HPC email address you would like us to write to (hpc or hpc-admin…?)

More generally: given that this appears to be essential functionality, but you say it has been disabled by Mellanox for a good reason, what alternative best practice can you propose for sharing data sustainably?

Thanks in advance!
Alexis

Hi All

here is the news about the workaround :wink:

great, thanks a lot Adrien!

Were you able to deploy the patch on the on-demand Docker images too, or is that still an issue? Thanks!

As the patch could be unstable, we have limited its deployment to the login node to ensure there is no impact on production.

I tried some basic adjustments, but unfortunately they were not successful.

As a workaround has been implemented, we won’t be exploring this fix any further. While I understand the convenience of directly mounting the share in Singularity images on compute nodes, the time investment required to avoid a simple data copy on the cluster is too significant. Furthermore, I was unable to find any relevant documentation on this approach.
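To illustrate the simple data-copy approach instead, here is a rough sketch (all paths, the image name and the script are placeholders, and the gvfs path layout assumes the share was mounted with gio on the login node):

# on the login node, where the patched SMB mount works: copy the share once to cluster storage
rsync -av "/run/user/$UID/gvfs/smb-share:server=<server>,share=<share>/" "$HOME/scratch/nasac_copy/"

# in the job script, bind the copied data into the container instead of the share itself
singularity exec --bind "$HOME/scratch/nasac_copy:/data" my_image.sif ./run_analysis.sh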

As an alternative, we’ve discussed converting your CIFS share to NFS. This would allow us to easily mount your share across all compute nodes. Have you checked with the storage team which option would be best?
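Just to illustrate the idea (admin-side, placeholders throughout): an NFS export can be mounted on every compute node with a plain kernel NFS mount, without needing the cifs module at all.

mount -t nfs <nfs-server>:/export/<share_path> /mnt/nasac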