SSH public key not present on compute nodes, while present on login node

Primary information

Username: scheulen
Cluster: Baobab

Description

The SSH key that connects successfully to the login node fails when proxy-jumping to a specific CPU node reserved via Slurm.

I just reserved a CPU node (cpu330) for remote software development via VSCode. When trying to proxy-jump to the compute node (either within VSCode or from a terminal), SSH key authentication fails and a password prompt for $USER@cpu330 appears, which does not accept my usual UniGe password. In the initial session opened on the node after the successful Slurm allocation, no authorized key is found by the /usr/bin/sss_ssh_authorizedkeys script used for authentication, whereas my public key is found by the same script on the login node.

It therefore seems that the key is not accessible on the CPU node. I doubt this is expected behaviour, since I was able to access nodes via keys in the past and I am not aware of an alternative authorisation method (such as Kerberos tokens) for reaching the compute nodes from the login node.

For completeness, an MWE of my .ssh/config is reproduced below:

Host baobab
     HostName                    login1.baobab.hpc.unige.ch
     User                        scheulen
     Compression                 yes
     ForwardX11                  yes
     ForwardAgent                yes
     PubkeyAuthentication        yes
     IdentityFile                ~/.ssh/id_ed25519

Host cpu* gpu*
     HostName                    %h
     User                        scheulen
     UserKnownHostsFile          ~/.ssh/known_hosts.unige
     ProxyJump                   baobab
     Compression                 yes
     ForwardAgent                yes
     PubkeyAuthentication        yes
     IdentityFile                ~/.ssh/id_ed25519
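
With this config, a verbose connection attempt shows which key is offered and where authentication stops. Roughly what I ran to check this (cpu330 being the node allocated at the time):

# Works: the key is offered and accepted by the login node
ssh -v baobab

# Fails: the same key is offered through the proxy jump, but cpu330
# falls back to asking for a password
ssh -v cpu330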
     

Steps to Reproduce

  1. Log on to baobab via ssh baobab
  2. Reserve a CPU node using salloc -n1 --partition=private-dpnc-cpu,shared-cpu --time=8:00:00 --mem=10G -c2
  3. Log on to CPU node via ssh cpuXXX (XXX being the specific node number)
  4. Cry a bit
  5. Try out /usr/bin/sss_ssh_authorizedkeys $USER on both the login node and the CPU node (using the console opened after the Slurm job is granted); a rough sketch of these commands follows this list
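
Roughly, the commands for steps 2 and 5 (XXX left as a placeholder for whichever node Slurm assigns; on Baobab the salloc below dropped me into a shell on the compute node):

# Step 2: reserve a node (run on the login node)
salloc -n1 --partition=private-dpnc-cpu,shared-cpu --time=8:00:00 --mem=10G -c2

# Step 5: compare the key lookup on both machines
/usr/bin/sss_ssh_authorizedkeys $USER    # on login1: prints my public key
/usr/bin/sss_ssh_authorizedkeys $USER    # on cpuXXX: prints nothing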

Expected Result

Connecting directly to the CPU node with the SSH key succeeds, and the key uploaded via https://my-account.unige.ch is shown by sss_ssh_authorizedkeys on both the login node and the compute node.

Actual Result

Upon trying to connect to the CPU node via proxy jump, the SSH key was not accepted (running ssh verbosely shows the key is offered for authentication and works for the login node, but fails for the CPU node). Similarly, /usr/bin/sss_ssh_authorizedkeys $USER shows the key on the login node but not on the CPU node.

I seem to have just solved it by adding the key to ~/.ssh/authorized_keys instead of relying on the key uploaded to my account. If the key uploaded to the UniGe account should also be accessible from the compute nodes, there might still be something wrong, however.
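
For anyone hitting the same issue, a rough sketch of the workaround (run on the login node, assuming the home directory is shared with the compute nodes; the key pasted in is the public key from your local machine):

# Append the local public key to the cluster-side authorized_keys
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat >> ~/.ssh/authorized_keys    # paste the contents of your local id_ed25519.pub, then Ctrl-D
chmod 600 ~/.ssh/authorized_keys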

Hi @Christian.Scheulen

Is this post related to your issue?

Hi @Adrien.Albert,

Indeed, that post was the guideline I initially used to troubleshoot my issue. I was under the (presumably wrong) impression that the public key associated with my UniGe account would also be available on the worker nodes.

Problematically, when trying to copy the SSH key from my computer to the login node via ssh-copy-id, the process failed because the key was reported as already installed there (presumably because I had already uploaded the same public key to my UniGe account). Manually adding the public key to ~/.ssh/authorized_keys did, however, resolve the issue for me.
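
In case it helps anyone, ssh-copy-id also has a forced mode that skips the "key already installed" check, which I suspect is what tripped me up (untested on my side):

# From the local machine; -f forces the copy even if the key appears installed
ssh-copy-id -f -i ~/.ssh/id_ed25519.pub baobab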

In any case, since adding the key manually I have not had any problems using proxy jumps to the worker nodes via the login node.

Best,

Chris


To make it work:

  1. To log in on the cluster we use the sshPublicKey registered in your UniGe account. You need to have the same SSH key in your UniGe account and on your local machine to connect to the login node (the entry point).

  2. Then inside the cluster, we use the traditional way. You need to have the public key from login1.baobab:~/.ssh/id_rsa.pub in your login1.baobab:~/.ssh/authorized_keys (or another key if you have configured your ssh_config accordingly).

EDIT:

  1. Then on the cluster, your ~/.ssh/ must be empty.

Example:

# screen1
(yggdrasil)-[alberta@login1 .ssh]$ ls
(yggdrasil)-[alberta@login1 .ssh]$ 
(yggdrasil)-[alberta@login1 .ssh]$ srun sleep 600
...
# screen2
(yggdrasil)-[alberta@login1 ~]$ sac
          JobID    JobName    Account      User        NodeList   NTasks               Start                 End      State 
--------------- ---------- ---------- --------- --------------- -------- ------------------- ------------------- ---------- 
       38555419      sleep      burgi   alberta          cpu001          2025-03-05T10:11:28             Unknown    RUNNING 
(yggdrasil)-[alberta@login1 ~]$ ssh cpu001
Last login: Wed Mar  5 10:01:02 2025 from login1.yggdrasil
Installed: Thu Jan 30 09:26:01 CET 2025
(yggdrasil)-[alberta@cpu001 ~]$

@Christian.Scheulen

I’ve updated my previous answer. I made a mistake :pray:

Hi @Adrien.Albert,

Apologies for only coming back to this now; I should really activate the forum digest so I don't miss messages. Thanks a lot for the additional information about keeping $HOME/.ssh empty.

Since I have some aliases set up in there for access to some CERN machines, keeping ~/.ssh empty unfortunately does not work for me. However, keeping the SSH keys in ~/.ssh/authorized_keys has been working without issues so far.

Best,

Chris