Interactive jobs killed prematurely

Adrien.Albert · December 9, 2025, 9:10am

Thank you very much for the details provided.

The issue you are experiencing appears to be similar to the one discussed here: Job timeout despite not hitting the timelimit - #8 by Michael.Sonner.

Based on the logs, the job was ended when the inactivity time limit was reached:

[2025-12-08T14:56:12.460] sched: _slurm_rpc_allocate_resources JobId=6129035 NodeList=(null) usec=5978
[2025-12-08T14:56:13.005] sched: Allocate JobId=6129035 NodeList=cpu089 #CPUs=16 Partition=private-dpnc-cpu
[2025-12-08T15:41:21.003] job_time_limit: inactivity time limit reached for JobId=6129035
[2025-12-08T15:46:20.880] cleanup_completing: JobId=6129035 completion process took 299 seconds
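
For reference, here is a minimal sketch of the timing arithmetic behind those log lines: how long after allocation the inactivity limit was reached, and how long the cleanup took afterwards. The timestamps are copied from the log above; the variable names are only illustrative and not part of any Slurm tooling.

```python
from datetime import datetime

# Timestamp layout used in the log lines above (millisecond precision).
FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Values copied verbatim from the three relevant log lines.
allocated = datetime.strptime("2025-12-08T14:56:13.005", FMT)       # sched: Allocate
inactivity_hit = datetime.strptime("2025-12-08T15:41:21.003", FMT)  # inactivity time limit reached
cleanup_done = datetime.strptime("2025-12-08T15:46:20.880", FMT)    # cleanup_completing

# Time between allocation and the inactivity limit being reached.
elapsed = inactivity_hit - allocated
# Time between the inactivity limit and the end of the completion process.
cleanup = cleanup_done - inactivity_hit

print(f"allocation -> inactivity limit: {elapsed.total_seconds():.3f} s "
      f"({elapsed.total_seconds() / 60:.1f} min)")
print(f"inactivity limit -> cleanup done: {cleanup.total_seconds():.3f} s")
```

So the job had held its allocation for roughly 45 minutes when the inactivity limit was reached, and the gap to the last log line is consistent with the 299 seconds reported by cleanup_completing.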

To clarify, was the job terminated while a process was still running under salloc?

Network issues could cause similar behaviour, so we are also investigating this possibility as part of the ongoing cluster incident: