Hello team,
I attach the image with the error.
I was trying to allocate an interactive job on the private-wesolowski-bigmem partition, and it seemed to be granted, then just failed.
Thanks in advance,
Cristina
The same problem seems to appear on the public-interactive-cpu partition:
salloc: Nodes cpu003 are ready for job
srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
srun: error: slurm_set_addr: Unable to resolve "cpu003"
srun: error: fwd_tree_thread: can't find address for host cpu003, check slurm.conf
srun: error: Task launch for StepId=10034249.0 failed on node cpu003: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 92 seconds for job step to finish.
This might be an undesired consequence of the maintenance…
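In case it helps narrow this down: the srun output shows getaddrinfo() failing on the node name, so one quick check is whether that name resolves at all from the machine where the job was submitted. Here is a minimal sketch in Python, assuming Python 3 is available there. The helper name `resolves` is my own, and it only tests local name resolution; it says nothing about what is in slurm.conf.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        # Same kind of lookup that failed in the srun output.
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # cpu003 is the node named in the srun output; replace it with the
    # node reported by your own salloc.
    for node in ["cpu003"]:
        status = "resolves" if resolves(node) else "does NOT resolve"
        print(f"{node}: {status}")
```

If the node name does not resolve here either, that would suggest the problem is on the name resolution side rather than in the job request itself.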
Quick follow-up, @Cristina.GonzalezEspinoza: things seem to work now on the public-interactive-cpu partition. Maybe that is also the case on your private one, but I have no idea where the problem was coming from.
Best,
Stefano
Hi Stefano,
Yes, indeed, it works now, thanks for the heads-up!!
Cristina
Hi,
I’m not able to reproduce the issue. The cause was probably a glitch after the reinstall of the compute nodes.
Dear HPC team,
Today I experienced an error on Yggdrasil similar to the one described in this post.
Below are the command line used and the error message.
Any idea?
Kind regards,
Julien
(yggdrasil)-[prados@login1 oxa-48]$ salloc --partition=shared-cpu --time=4:00:00 --mem=12G --ntasks=1 --cpus-per-task=4
salloc: Pending job allocation 13923014
salloc: job 13923014 queued and waiting for resources
salloc: job 13923014 has been allocated resources
salloc: Granted job allocation 13923014
salloc: Waiting for resource configuration
salloc: Nodes cpu116 are ready for job
srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
srun: error: slurm_set_addr: Unable to resolve "cpu116"
srun: error: fwd_tree_thread: can't find address for host cpu116, check slurm.conf
srun: error: Task launch for StepId=13923014.interactive failed on node cpu116: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted
salloc: Relinquishing job allocation 13923014
Hi, this is fixed: Current issues on Baobab and Yggdrasil - #96 by Yann.Sagon