Hi
your suggestion of using calling slurm inside a runner is indeed an interesting one.
And it’s especially easy to debug, as you say. Though in a way, this makes runner rely on potentially non-repeatable environment which is established by the baobab user, which makes the project more difficult to migrate, scale, or ship, if necessary.
Also I am a bit concerned that it may be a bit much effort for many users. But perhaps it is possible to develop some common scripts to facilitate this.
In addition, it may be hard to maintain live runner, indeed you can not be sure it will be forever active on the login node.
It can be of course just systemd somewhere on a private node.
Or perhaps, since k8s is great just for that - keeping things live and well - your suggestion can be used together with k8s, shared gitlab runner would live in k8s and execute on baobab. It could be interesting to try, not sure.
Or you had in mind yet another option to reliably run long-running runner?
Concerning security implications, it indeed can become tricky, and also it evolved between different versions of k8s and gitlab, but I had in mind a pretty limited case, when kubernetes executor of gitlab runner does not give user permissions to create arbitrary objects in kubernetes. It can be restricted to just run CI from .gitlab-ci yaml, with very restrictive service account (or even none).
I can not say I have the complete knowledge of k8s, but I tried and it appears to be possible to make runner execute CI jobs with service account which has no permissions. It should be also possible to avoid mounting it altogether.
And in different cases, when the user is allowed to access the API, it can be still severely limited, preventing gaining much privileges (e.g. restricting hostPath volumes, analog of docker volume mounting).
Still, since this is indeed delicate, possibility to create own applications (with deployments, ingresses, pvs, etc) is to be reserved to separate, specialized runners, restricted to groups with more experienced and trusted users, with larger rights but still limited to their namespaces with suitable resource limits.
So it would seem that it should not mean that any user can do when they want with the whole node or cluster, unless they are given this access?..
By the way, the physical cluster will run actually vmware and virtual sub-clusters, so it’s possible to isolate cases further. Though this may make balancing sometimes difficult.
Cheers
Volodymyr